PDF // free tool

Turn a scanned, image-only PDF into searchable text.
In your browser.

Last reviewed 2026-06-17

Summary

Drop a scanned PDF, run optical character recognition on every page using Tesseract.js, and download the extracted text. Pro adds a searchable PDF output with an invisible text layer behind the original image. Free covers the first 3 pages as plain text.

  • Tesseract.js engine running in your browser.
  • Plain-text download (free) or searchable PDF (Pro).
  • Per-page progress indicator.
  • First 3 pages free. Pro unlocks unlimited pages.

What is the Nuvenar OCR PDF tool?

Nuvenar OCR PDF is a free, browser-side optical character recognition tool. It renders each page of your PDF using PDF.js, runs the rendered image through Tesseract.js (a WebAssembly port of the open-source Tesseract OCR engine), and outputs either a plain-text file with the recognised text per page (free) or a searchable PDF with an invisible text layer behind the original image (Pro). Your PDF never leaves your device.

When to use OCR PDF

  • Old scanned files: turn cabinet-archive scans into searchable digital records.
  • Photographed documents: turn photos of receipts, business cards, or whiteboards into selectable text.
  • Image-only PDFs from email: photos that were forwarded by email and saved as PDF can be made searchable.
  • Archival projects: bulk-OCR historical document scans so they are searchable in a document-management system.
  • Accessibility: make a scanned document readable by screen readers used by blind and low-vision users.

How OCR PDF works

When you drop a PDF, pdfjs-dist renders each page to a canvas at 2x resolution. The canvas image is passed to Tesseract.js which runs character recognition entirely in WebAssembly inside your browser. On the first run the tool downloads the English traineddata (about 11 MB) from the tessdata CDN; this is cached for subsequent runs. The recognised text is either assembled into a plain-text download or, on Tools Pro, embedded as an invisible text layer behind the original image to produce a searchable PDF.
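
The steps above can be sketched as a small pipeline. This is a minimal sketch, not the tool's actual code: `renderPage` and `recognize` are hypothetical stand-ins for the PDF.js rendering call and the Tesseract.js recognition call, and the page-header format in `assemblePlainText` is an assumption. Only the 2x render scale and the 3-page free limit come from the description.

```typescript
const FREE_PAGE_LIMIT = 3; // free tier: first 3 pages (from the text above)
const RENDER_SCALE = 2;    // pages are rendered at 2x resolution

// Hypothetical dependencies; in a real page these would wrap PDF.js and Tesseract.js.
interface OcrDeps<Image> {
  pageCount: number;
  renderPage: (pageNumber: number, scale: number) => Promise<Image>;
  recognize: (image: Image) => Promise<string>;
  onProgress?: (done: number, total: number) => void;
}

// How many pages get processed for a given tier.
function pagesToProcess(pageCount: number, isPro: boolean): number {
  return isPro ? pageCount : Math.min(pageCount, FREE_PAGE_LIMIT);
}

// Join per-page text into one download; the header format is an assumption.
function assemblePlainText(pageTexts: string[]): string {
  return pageTexts
    .map((text, i) => `--- Page ${i + 1} ---\n${text.trim()}`)
    .join("\n\n");
}

// Render, recognise, and report progress one page at a time.
async function ocrPdf<Image>(deps: OcrDeps<Image>, isPro: boolean): Promise<string> {
  const total = pagesToProcess(deps.pageCount, isPro);
  const pageTexts: string[] = [];
  for (let page = 1; page <= total; page++) {
    const image = await deps.renderPage(page, RENDER_SCALE);
    pageTexts.push(await deps.recognize(image));
    if (deps.onProgress) deps.onProgress(page, total);
  }
  return assemblePlainText(pageTexts);
}
```

Processing pages one after another keeps memory use low and makes a per-page progress indicator straightforward, at the cost of not recognising pages in parallel.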

Searchable PDF vs plain text: which to pick

A searchable PDF is usually more useful. It keeps the original document looking exactly the same (useful for archives, legal records, anything that needs to preserve visual fidelity) and adds the recognised text invisibly behind the image. You can search it, copy text from it, and pass it to screen readers. Plain-text output is useful when you just want the text content to paste into Word or a notes app and do not care about the layout.

OCR PDF vs alternatives

Tool | Browser-side | Watermark on free | Free tier limit | Paid price
Nuvenar OCR PDF | Yes | Text output, no watermark | First 3 pages, text only | £9/mo unlimited + searchable PDF
Adobe Acrobat Pro | No (desktop) | No watermark | Paid only | £15.17/mo + VAT
ABBYY FineReader | No (desktop) | No watermark | 30-day trial | £159 one-off
OnlineOCR.net | No | Free with cap | 15 pages/hour, no signup | $4.95 for 50 pages
Smallpdf OCR | No | No watermark, capped | 2 tasks/day | €9/mo

Privacy and security

OCR is often applied to old paper records, ID documents, contracts, and historical correspondence: the kind of content you would not casually upload to a random web tool. Browser-side processing keeps everything local. The only outbound request is the one-time fetch of the English traineddata file from the public tessdata CDN, which contains no personal data.

Common use cases by profession

  • Solicitors: OCR archived case files for fast keyword search; combine with our redact PDF tool before disclosure.
  • Accountants: OCR scanned bank statements and receipts for digital filing.
  • Researchers and academics: OCR historical document scans for textual analysis.
  • HR teams: OCR old employee records before destruction of paper originals.
  • Healthcare: OCR scanned referral letters and reports into the patient record.
  • Property and conveyancing: OCR scanned title documents and search results.

What is included free vs Tools Pro

Free: first 3 pages of any scanned PDF, plain-text output, no watermark on text. Tools Pro at £9/mo: unlimited pages, searchable-PDF output (with invisible text layer behind the original image), priority support, plus every other Nuvenar PDF tool (merge, compress, unlock) and 25+ business calculators.

Frequently misunderstood things about OCR

  • Myth: all PDFs are searchable. Reality: PDFs created from scans are image-only until OCR is run. You can tell by trying to select text in the viewer.
  • Myth: OCR is 100% accurate. Reality: 95-99% is typical for clear printed text at 300 DPI. Always proof-read for high-stakes uses.
  • Myth: OCR changes the visible document. Reality: searchable-PDF output leaves the original visible scan unchanged and adds an invisible text layer behind it.
  • Myth: OCR works on handwriting. Reality: traditional OCR (Tesseract included) is built for printed text. Handwriting needs specialised cloud OCR.
FAQ

How do I OCR a scanned PDF for free?

Drop the scanned or image-only PDF, click Run OCR, and the tool runs Tesseract.js page by page in your browser. Free tier covers the first 3 pages and outputs a plain-text file. Tools Pro at £9/mo unlocks unlimited pages plus the searchable-PDF output option.

Is this OCR tool really free?

Yes. Free covers the first 3 pages of any scanned PDF and outputs a .txt file. Tools Pro at £9/mo unlocks unlimited pages and the searchable-PDF output. Adobe Acrobat Pro has OCR built in at £15.17/mo plus VAT. ABBYY FineReader is the desktop gold standard at £159 one-off.

What is OCR and when do I need it?

Optical Character Recognition turns an image of text into actual text characters. You need OCR when a PDF was created by scanning paper or photographing a screen and the text is locked inside the image. You can tell because you cannot select or copy the text in the PDF viewer. OCR makes the document searchable and editable.
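
The "can I select the text?" check can also be done in code. PDF.js exposes a page's existing text through `page.getTextContent()`, whose `items` carry a `str` field. The sketch below assumes you have already collected those items per page; `TextItemLike` and both function names are illustrative, not part of any library.

```typescript
// Minimal shape of a PDF.js text item; only `str` is used here.
interface TextItemLike {
  str: string;
}

// A page has real text if any item contains a non-whitespace character.
function pageHasText(items: TextItemLike[]): boolean {
  return items.some((item) => item.str.trim().length > 0);
}

// Return the 1-based numbers of pages that look image-only and need OCR.
function pagesNeedingOcr(pages: TextItemLike[][]): number[] {
  const result: number[] = [];
  pages.forEach((items, index) => {
    if (!pageHasText(items)) result.push(index + 1);
  });
  return result;
}
```

For example, a three-page file where only page 1 has a text layer gives `[2, 3]`, so a mixed document can skip pages that are already searchable.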

What is a searchable PDF and how is it different from the .txt output?

A searchable PDF keeps the visible scan image exactly as it looks, but adds an invisible text layer behind it. The document looks the same but you can now select text, copy passages, search with Ctrl-F, and screen readers can read it. A .txt output extracts just the text and discards the image. Free tier gives .txt, Pro gives both.
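
To line the invisible text up with the scan, each recognised word's position has to be converted from image pixels to PDF points. The sketch below shows that arithmetic under stated assumptions: the word box uses the `x0, y0, x1, y1` pixel layout that Tesseract.js reports (origin at the top-left), the page was rendered at a known scale, and the font size is approximated by the box height. The function name and the `TextPlacement` type are hypothetical.

```typescript
// Word box in rendered-image pixels, origin top-left (Tesseract.js-style bbox).
interface PixelBox {
  x0: number;
  y0: number;
  x1: number;
  y1: number;
}

// Position in PDF points, origin bottom-left.
interface TextPlacement {
  x: number;
  y: number;
  size: number;
}

function toPdfPlacement(box: PixelBox, pageHeightPt: number, renderScale: number): TextPlacement {
  return {
    x: box.x0 / renderScale,
    // Flip the vertical axis: PDF y grows upward, image y grows downward.
    y: pageHeightPt - box.y1 / renderScale,
    // Approximate the font size by the height of the word box.
    size: (box.y1 - box.y0) / renderScale,
  };
}

// A word at pixels (100, 200)-(300, 240) on an 800 pt tall page rendered at 2x:
const placement = toPdfPlacement({ x0: 100, y0: 200, x1: 300, y1: 240 }, 800, 2);
// placement is { x: 50, y: 680, size: 20 }
```

A PDF library can then draw the word at that position without making it visible. With pdf-lib, one option is `drawText` with `opacity: 0`; whether the tool does it this way is not stated on this page.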

Which languages does the OCR support?

English (eng) by default. Tesseract.js supports 100+ languages including Welsh, French, German, Spanish, Italian, Polish, and Arabic, available as additional traineddata files. Only the English traineddata is downloaded on first run. Multi-language support is on the roadmap as a Pro feature.
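
Each language corresponds to a three-letter Tesseract code and a matching traineddata file (for example `cym.traineddata` for Welsh), and Tesseract's convention is to join several codes with `+`. The helper below is a hypothetical illustration of that mapping for the languages named above; it is not a feature of the current tool.

```typescript
// Tesseract language codes for the languages named in the answer above.
const LANGUAGE_CODES: Record<string, string> = {
  English: "eng",
  Welsh: "cym",
  French: "fra",
  German: "deu",
  Spanish: "spa",
  Italian: "ita",
  Polish: "pol",
  Arabic: "ara",
};

// Build a Tesseract-style language string such as "eng+cym".
function toTesseractLangs(names: string[]): string {
  const codes = names.map((name) => {
    const code = LANGUAGE_CODES[name];
    if (!code) throw new Error(`Unsupported language: ${name}`);
    return code;
  });
  return codes.join("+");
}
```

Each extra language means another traineddata download, so a bilingual English and Welsh document would fetch two files instead of one.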

Is my PDF uploaded when I OCR it?

No. PDF rendering uses PDF.js and OCR runs entirely in your browser via Tesseract.js (a WebAssembly port). Your scanned document never leaves your device. The first run downloads the English traineddata (about 11 MB) from the tessdata.projectnaptha.com CDN; the document itself stays local.

How accurate is the OCR?

Tesseract.js typically achieves 95-99% character accuracy on clear, well-scanned printed text at 300 DPI. Accuracy drops on photos of paper, low-resolution scans, complex layouts, handwriting, and skewed pages. For business-grade legal or financial OCR with table recognition, ABBYY FineReader and Adobe Acrobat Pro are still better.
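
To put those figures in context: on a page of 2,000 characters, 99% accuracy still leaves about 20 wrong characters and 95% leaves about 100. If you have a hand-checked transcript of a sample page, you can measure accuracy on your own scans. The sketch below uses edit distance (insertions, deletions, and substitutions), a common way to score OCR output; it is an illustration, not how the figures above were produced.

```typescript
// Levenshtein distance: minimum single-character edits to turn `a` into `b`.
function editDistance(a: string, b: string): number {
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

// Character accuracy in the range 0..1, relative to the reference transcript.
function characterAccuracy(reference: string, ocrOutput: string): number {
  if (reference.length === 0) return ocrOutput.length === 0 ? 1 : 0;
  const errors = editDistance(reference, ocrOutput);
  return Math.max(0, 1 - errors / reference.length);
}

// One misread character ("1" read as "l") in a 10-character string is 90% accuracy.
const sampleAccuracy = characterAccuracy("0123456789", "0l23456789");
```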

How long does OCR take?

Roughly 2-5 seconds per page on a modern laptop for clear A4 text at 300 DPI. Larger pages, higher resolution, and slower devices take longer. The progress indicator shows page-by-page status. First run also downloads the English traineddata (about 11 MB).
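
As a rough planning aid, the 2-5 seconds per page figure scales linearly with page count. The sketch below only restates that arithmetic; it ignores the first-run traineddata download and assumes desktop-class hardware, and the function name is illustrative.

```typescript
// Per-page range quoted above for clear A4 text at 300 DPI on a modern laptop.
const SECONDS_PER_PAGE = { min: 2, max: 5 };

// Estimated total OCR time in seconds for a document of `pages` pages.
function estimateOcrSeconds(pages: number): { min: number; max: number } {
  return {
    min: pages * SECONDS_PER_PAGE.min,
    max: pages * SECONDS_PER_PAGE.max,
  };
}

// A 40-page document: between 80 and 200 seconds.
const fortyPages = estimateOcrSeconds(40);
```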

How is this different from Adobe Acrobat Pro, ABBYY, and online OCR sites?

Adobe Acrobat Pro and ABBYY FineReader are paid desktop tools with higher accuracy and table recognition. Online OCR sites like onlineocr.net and Smallpdf upload your PDF to their servers. This tool runs in your browser, is free for the first 3 pages, and typically reaches 95-99% character accuracy on clear scans.

Does it work on handwritten notes?

Poorly. Tesseract is trained primarily on printed text. Handwritten OCR needs specialised models like Google Cloud Vision or AWS Textract. Free Tesseract.js gives best-effort output you may need to clean up manually.

Can I OCR a password-protected PDF?

No. Unlock it first with our unlock PDF tool. The pdf-lib library that reads the file cannot open encrypted source files.

Does the searchable PDF output have a Nuvenar watermark?

Searchable PDF output is Tools Pro only and has no watermark. The free-tier .txt output has no watermark either (text files cannot carry one). If you generate other Nuvenar PDF outputs on free, those carry the small Nuvenar marker.

Does this work on mobile?

Yes, but OCR is CPU-heavy and slow on mobile. For documents over 5 pages, expect 30+ seconds. Desktop is faster.

Is the free version enough or do I need Tools Pro?

Three pages of plain-text output is enough for one-off short documents. If you OCR multi-page documents regularly, want searchable PDFs (the more useful output for archival and search), or need bulk processing, Tools Pro at £9/mo is the better tier.

OCR more than 3 pages at a time?

Tools Pro at £9/mo unlocks unlimited pages, the searchable-PDF output (more useful for archives), and gets you every other Nuvenar PDF tool plus 25+ business calculators.
