Image to Text Converter

Extract text from images for free using OCR. Supports English, French, Spanish, Chinese, and 7 more languages.

✓ Free ✓ No sign-up ✓ Works in browser


Drop an image to extract text

Screenshots, scanned docs, photos with text

Upload Image

PNG, JPG, WEBP, BMP, TIFF

🔒 Private & Local

OCR runs entirely in your browser. No image ever leaves your device.

🌍 11 Languages

English, French, Spanish, German, Italian, Portuguese, Dutch, Russian, Chinese, Japanese, and Korean; 100+ more loadable on demand.

📋 Edit & Export

Edit the extracted text and download as .txt or copy to clipboard.


How to Use This Tool

1

Upload Your Image

Upload a PNG, JPG, WebP, BMP, or TIFF image containing text. Works on screenshots, photographed documents, scanned pages, and signs.

2

Select Language

Choose the primary language of the text in your image. Selecting the correct language improves recognition accuracy.

3

Copy Extracted Text

The extracted text appears in the editable output panel. Click Copy to copy all text to your clipboard.


Related Tools

Frequently Asked Questions

What languages does the OCR tool support?
English, Spanish, French, German, Italian, Portuguese, Dutch, Russian, Chinese (Simplified), Japanese, and Korean.
How accurate is the text extraction?
For clear, printed text in good lighting, accuracy is typically 95–99%. Handwritten text, very small fonts, or poor image quality reduce accuracy significantly.
Can it extract text from photos taken on a phone?
Yes, as long as the image is in focus, adequately lit, and the text is legible. Slightly angled photos also work — the OCR handles minor perspective distortion.
Can it extract text from tables?
The tool extracts text content from tables but does not preserve the table structure. For table extraction with formatting, a dedicated PDF/document OCR tool gives better results.

About Image to Text Converter

You just got a photo of a whiteboard from a meeting you missed, a screenshot of a spec PDF that a vendor sent as a raster image, or a scan of a vintage recipe your grandmother wrote in 1974 in barely legible handwriting. This tool runs Tesseract.js (the WebAssembly port of Google's Tesseract OCR engine, version 5) entirely in your browser to pull text out of images: no upload, no API key. It supports English out of the box, with trained data for 100+ additional languages loadable on demand, multi-column layout detection, and a preview of the bounding boxes Tesseract identified, so you can tell at a glance when it missed a block.

To set expectations honestly: OCR is strong on clean printed text at 300 DPI, decent on screen screenshots, flaky on low-resolution phone photos shot at an angle, and genuinely bad on stylized fonts or cursive handwriting. A pre-processing toggle (contrast boost, grayscale, deskew) helps on marginal inputs.

When to use this tool

Extracting text from a whiteboard photo

After a planning meeting, someone took a photo of the whiteboard rather than transcribing. Upload, let Tesseract extract the block of text (it handles print reasonably but stumbles on small rushed writing), and you have something grep-able to drop into a Notion doc instead of retyping two pages of decisions.

Getting text out of a scanned invoice

A supplier sent an invoice as a scanned PDF where the pages are raster images with no selectable text. Run each page image through OCR, paste the output into a spreadsheet, and reconcile line items without manual retyping. Accuracy is typically 95%+ on clean 300 DPI scans of standard invoice templates.

Converting a meme or social screenshot into quotable text

You want to reference a quote from a Twitter screenshot or a meme in a blog post, but the text is baked into the image. OCR pulls it out in a few seconds and you get plain text you can copy, cite, and fact-check — more reliable than squinting at the image and retyping.

Reading a legacy scanned book chapter

Google Books blocks downloading and your university library only has the print edition. Photograph the relevant pages, run OCR, and you get a searchable text dump of the chapter. Expect 1–5% character error rate on typical book typography, which is usually fine for skimming but needs proofreading before citation.

Extracting code from a screenshot

A Stack Overflow answer, a tweet, or a textbook showed a code snippet as a screenshot. OCR pulls the code as text; expect to fix a few OCR confusions (l vs 1, O vs 0, rn vs m) but usually ~90% of a ten-line snippet comes through cleanly enough to be faster than retyping.

How it works

  1.

    Tesseract 5 compiled to WebAssembly

    We ship Tesseract.js, the JavaScript port of Google's Tesseract v5 engine compiled to WASM with SIMD enabled. The engine uses an LSTM-based recognizer trained on millions of pages of text; the English model (eng.traineddata, around 4MB) is bundled and loaded on the first OCR request. Additional language models download from a CDN on demand; the full 100+ language set totals a couple hundred MB so we do not pre-bundle all of them.

  2.

    Pre-processing pipeline before recognition

    Before handing the image to Tesseract we apply an optional pre-processing chain: grayscale conversion, adaptive thresholding (Otsu's method) for binarization, and deskew via Hough-line detection if the text angle is off by more than a degree. These steps typically improve accuracy by 10–20% on phone photos; they can occasionally hurt clean screenshots where the color anti-aliasing carries useful signal. The toggle lets you skip pre-processing for already-clean inputs.

  3.

    Layout analysis detects columns and blocks

    Tesseract's page segmentation mode (PSM) controls how it breaks the image into text regions. We default to PSM 3 (automatic page segmentation), which works for most documents; for a single line of text (like a screenshot of a URL) PSM 7 is faster and more accurate; for a single word, PSM 8. The tool exposes these as presets (Document, Line, Word) so you can hint Tesseract for better results on non-standard layouts.
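The worker call in step 1 can be sketched with Tesseract.js's documented `createWorker`/`recognize` API; the function name and dynamic import are illustrative, not the tool's actual internals:

```javascript
// Browser-side OCR with Tesseract.js (v5-style worker API).
// The dynamic import keeps the engine out of the initial page load.
async function recognizeImage(imageFile, lang = 'eng') {
  const { createWorker } = await import('tesseract.js');
  const worker = await createWorker(lang); // fetches traineddata on first use
  const { data } = await worker.recognize(imageFile);
  await worker.terminate(); // free the WASM worker and its memory
  return data.text; // plain extracted text
}
```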
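The binarization in step 2 can be sketched as pure functions over gray levels. This is a simplified illustration of Otsu's method, not the tool's actual internals:

```javascript
// Luma grayscale conversion for one RGBA pixel (Rec. 601 weights).
function toGray(r, g, b) {
  return Math.round(0.299 * r + 0.587 * g + 0.114 * b);
}

// Otsu's method: pick the threshold that maximizes the between-class
// variance of the grayscale histogram, separating "ink" from "paper".
function otsuThreshold(grayValues) {
  const hist = new Array(256).fill(0);
  for (const v of grayValues) hist[v] += 1;
  const total = grayValues.length;
  let sumAll = 0;
  for (let i = 0; i < 256; i++) sumAll += i * hist[i];
  let wB = 0, sumB = 0, bestVar = -1, best = 0;
  for (let t = 0; t < 256; t++) {
    wB += hist[t];             // background weight (pixels <= t)
    if (wB === 0) continue;
    const wF = total - wB;     // foreground weight
    if (wF === 0) break;
    sumB += t * hist[t];
    const meanB = sumB / wB;
    const meanF = (sumAll - sumB) / wF;
    const betweenVar = wB * wF * (meanB - meanF) ** 2;
    if (betweenVar > bestVar) { bestVar = betweenVar; best = t; }
  }
  return best;
}
```

A pixel is then mapped to white when its gray value exceeds the threshold and to black otherwise, giving Tesseract a clean binary image.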
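The presets in step 3 boil down to a small mapping onto Tesseract's documented PSM codes; a hypothetical helper:

```javascript
// Map the tool's presets to Tesseract page segmentation modes (PSM).
// The numeric values are Tesseract's documented PSM codes.
const PRESET_TO_PSM = {
  Document: 3, // fully automatic page segmentation
  Line: 7,     // treat the image as a single text line
  Word: 8,     // treat the image as a single word
};

function psmForPreset(preset) {
  const psm = PRESET_TO_PSM[preset];
  if (psm === undefined) throw new Error(`unknown preset: ${preset}`);
  return psm;
}

// With a Tesseract.js worker the mode is applied via setParameters, e.g.:
//   await worker.setParameters({ tessedit_pageseg_mode: psmForPreset('Line') });
```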

Honest limitations

  • Handwriting accuracy ranges from mediocre (clear block print) to unusable (cursive or rushed script); OCR engines were trained primarily on printed text and handwriting remains a hard open research problem.
  • Stylized, decorative, or heavily-compressed fonts (logos, chrome text effects, low-resolution meme screenshots) produce error rates well above 10% and sometimes fail to recognize words entirely.
  • Non-Latin scripts require downloading separate trained data files (tens of MB per language) and may lag commercial cloud OCR in accuracy, especially for CJK characters and languages with complex ligatures.

Pro tips

300 DPI is the quality threshold — anything less drops accuracy sharply

Tesseract was tuned for scanned printed pages at roughly 300 DPI, which translates to characters that are around 20–30 pixels tall in the image. Below that (say, a phone photo of distant text where characters are 10 pixels tall), the LSTM has too little signal per glyph and error rates spike from 2% to 10%+. If you control the capture, photograph closer and framed tighter on the text rather than distant and cropping later. For a document scanned on a multifunction printer, set the driver's DPI to 300 or 400 rather than the default 150.
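The DPI rule of thumb is easy to check from pixel dimensions; a hypothetical helper, assuming you know the physical width of the captured page:

```javascript
// Effective DPI of a capture, given the image width in pixels and the
// physical width of the page in inches (8.5 for US letter).
function effectiveDpi(pixelWidth, pageWidthInches) {
  return Math.round(pixelWidth / pageWidthInches);
}

// Tesseract's sweet spot starts around 300 DPI.
function meetsOcrThreshold(pixelWidth, pageWidthInches, minDpi = 300) {
  return effectiveDpi(pixelWidth, pageWidthInches) >= minDpi;
}
```

A 2550-pixel-wide scan of a letter page works out to 300 DPI; a 1275-pixel-wide one to 150 DPI, below the threshold.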

English works best; non-Latin scripts vary widely

The bundled English model is well-trained and accurate. Other Latin-script languages (French, Spanish, German, Italian) are nearly as good. Cyrillic, Greek, and Arabic are decent but occasionally miss diacritical marks or ligatures. Chinese, Japanese, and Korean (especially traditional characters) are weaker in Tesseract than in commercial cloud OCR (Google Cloud Vision, AWS Textract). Handwriting recognition in any language is significantly worse than printed text, and cursive or stylized fonts are genuinely hard regardless of language. Benchmark on a small sample before committing to a batch.

Proofread the output, especially digits and similar glyphs

OCR errors cluster on specific confusions — lowercase l and digit 1, uppercase O and digit 0, rn and m, cl and d, 5 and S. These are the glyphs that are genuinely ambiguous without context. After OCR, run a pass where you grep the output for anything that looks wrong and check against the source image. For text destined for structured data (invoice amounts, product codes, phone numbers), always proofread the digits — a single misread in an invoice total can cause real problems downstream.
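A crude way to automate that proofreading pass is a heuristic that surfaces tokens mixing letters and digits, or containing the rn pair; the function name is illustrative and it only flags candidates for a manual check against the source image:

```javascript
// Flag tokens likely to contain classic OCR confusions: letters and digits
// mixed in one token (l/1, O/0, S/5) or the "rn" pair that reads as "m".
// Heuristic only; expect false positives on legitimate words like "corn".
function flagSuspiciousTokens(text) {
  const tokens = text.split(/\s+/).filter(Boolean);
  const mixedLettersAndDigits = /^(?=.*[A-Za-z])(?=.*[0-9])/;
  return tokens.filter((t) => mixedLettersAndDigits.test(t) || t.includes('rn'));
}
```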

Frequently asked questions

Why is the OCR making so many mistakes on my phone photo?

A few common reasons: the photo resolution is too low (characters under 20 pixels tall are hard for Tesseract), the lighting is uneven (shadows create false edges that the engine can mistake for text strokes), the angle is off by more than 5 degrees (the deskew pre-processing handles small angles but not heavy perspective), or the text has stylized or very small fonts that fall outside the training distribution. For best results, photograph directly overhead, fill most of the frame with the text region, use even indirect lighting without shadows, and shoot at the highest resolution your phone offers. A typical phone at 12MP shooting a letter-sized page from 30cm away produces a 300 DPI image that OCRs well.

Can this handle handwriting?

Poorly, honestly. Tesseract was trained primarily on printed text, and handwriting — with its variability in stroke, slant, connectivity, and character shapes — falls outside the training distribution. Clear block-print handwriting (each letter separated, uppercase, consistent size) works at maybe 70–80% accuracy on a good day. Cursive or casual handwriting drops to 30–50% or fails entirely. Dedicated handwriting OCR (Google Cloud Vision's handwriting mode, Microsoft Azure Read API) is significantly better because it uses models specifically trained on handwriting, but no current tool — including commercial — handles cursive reliably.

Which languages are supported?

English ships bundled. Tesseract 5 supports 100+ additional languages via downloadable trained data files, including all major European languages, Cyrillic scripts, Arabic, Hebrew, Chinese (Simplified and Traditional), Japanese, Korean, Thai, Hindi, and many more. To use a non-English language, select it from the language dropdown; the trained data file (typically 10–50MB per language) downloads on first use and caches for the session. Accuracy varies significantly — well-trained languages with large training corpora (French, Spanish, German) are near English quality; smaller languages and complex scripts are weaker.

Is my image sent to a server for OCR?

No. Tesseract.js runs entirely in your browser via WebAssembly. When you upload an image, it is decoded into a canvas, the pixel data is handed to the WASM-compiled Tesseract engine, and recognition happens locally. No network request sends any image data to any server. The only network activity during OCR is the initial download of language trained data files from a CDN (cached after first use) and standard page analytics that never see DOM or image content. This matters because people often OCR confidential documents — invoices with bank details, ID scans, internal meeting notes — where server-side OCR would be a compliance problem.
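The local decode path described above can be sketched with standard browser APIs (createImageBitmap and the 2D canvas API; the function name is illustrative):

```javascript
// Sketch of the local decode path: file -> ImageBitmap -> canvas -> raw pixels.
// Browser-only APIs; nothing here performs a network request with image data.
async function imageToPixels(file) {
  const bitmap = await createImageBitmap(file); // decoded in the browser
  const canvas = document.createElement('canvas');
  canvas.width = bitmap.width;
  canvas.height = bitmap.height;
  const ctx = canvas.getContext('2d');
  ctx.drawImage(bitmap, 0, 0);
  // RGBA bytes handed to the WASM OCR engine; they never leave the device.
  return ctx.getImageData(0, 0, bitmap.width, bitmap.height);
}
```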

How fast is the recognition?

Depends on image size and content density. A typical letter-sized document at 300 DPI (around 2500x3300 pixels) with roughly 500 words of text takes around 4–8 seconds on a modern laptop. A single-line screenshot of a URL takes under a second. A 10-page scanned invoice PDF (processed one page at a time) takes around a minute. The first run of a session is slower because the trained data file has to download and the WASM module warms up; subsequent runs are noticeably faster. For batch processing of hundreds of documents, a server-side OCR pipeline with GPU acceleration will be 10–50x faster, but for ad-hoc work the browser version is usually sufficient.

OCR is typically a stepping stone in a larger document workflow. After extraction you often need to clean the text in a writing tool, count words for a summary, or paste the output into a structured document. For documents that are already PDFs with raster pages, use pdf-to-jpg to extract the pages as images first, then feed them through this tool. If the source was originally a Word document that was flattened to an image and you need editable Word output, pdf-to-word handles searchable PDFs better than OCR-then-reassemble. image-compressor and image-resizer help if the source is too large for Tesseract's working resolution.
