Duplicate Line Remover
Remove duplicate lines from any text. Sort, count, and clean text instantly. Free browser-based tool.
How to Use This Tool
Paste Your Text
Paste your list, CSV data, code, or any line-by-line text content into the input box.
Choose Options
Select options: case-sensitive matching, trim whitespace, sort alphabetically, or show occurrence counts.
Copy Clean Output
The cleaned output with duplicates removed appears instantly. Click Copy to use it.
About Duplicate Line Remover
A data analyst exported a CSV of 12,000 customer email addresses from three marketing platforms, concatenated the files with cat, and now has a list with roughly 30 percent overlap that will trigger Mailchimp's duplicate-import warning and eat into the monthly sending quota. A developer scraped a list of 4,000 URLs from multiple sitemaps and needs the unique set before running a broken-link check. This deduplicator reads your pasted text line by line, preserves the original order of first occurrence, and strips subsequent duplicates, matching the behavior of the classic Unix one-liner awk '!seen[$0]++' but running inside your browser without needing a terminal. It offers case-sensitive and case-insensitive comparison modes (so 'User@Example.com' and 'user@example.com' can be treated as the same or different depending on your downstream system), optional whitespace trimming, and a counter showing how many lines were kept versus removed. Nothing leaves the page, which matters when the list contains customer PII, API keys, or any other data you would not paste into a random third-party website.
When to use this tool
Deduping an email list before import
Three marketing platforms exported 12,000 email addresses with overlap. Paste, set case-insensitive mode so 'Jane@Example.com' and 'jane@example.com' collapse into one entry, then import the cleaned list into Mailchimp or Klaviyo without paying for duplicate contacts counted against your plan quota.
Consolidating scraped URLs from multiple sitemaps
A crawler pulled URLs from three sitemap.xml files plus a homepage link extraction, totaling 4,000 URLs with significant overlap. Dedupe before running Screaming Frog or a broken-link checker to avoid checking the same URL three times and burning through rate limits.
Cleaning up SSH authorized_keys files
You collected SSH public keys from five team members who submitted via chat, Slack, and email. The resulting authorized_keys file has the same key pasted twice from two sources. Dedupe before committing to Ansible config management so audit logs stay clean.
Building a unique word list from a document dump
Extracting unique vocabulary from a 50,000-word translation project corpus. Split the text into lines first (one word per line, done with a text transformation), then dedupe to get the unique token list for a translation memory or glossary.
Removing duplicate error log entries
A production log file contains the same error message repeated 500 times during a 5-minute incident. Dedupe to get the unique error signatures for root-cause analysis, then cross-reference the first occurrence timestamp against your deployment log to identify the triggering change.
How it works
1. Single-pass Set-based deduplication
Input is split on newlines, then iterated once. Each line is added to a JavaScript Set; if the line is already in the Set, it is skipped. The kept lines are joined back with newlines in original order. Time complexity is O(n) and memory is O(unique lines), so a million-line input runs in well under a second on a modern laptop.
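The single pass described above can be sketched in a few lines of JavaScript. This is a minimal illustration of the technique, not the tool's actual source:

```javascript
// Keep the first occurrence of each line, preserving input order.
// The Set gives O(1) membership checks, so the whole pass is O(n).
function dedupeLines(text) {
  const seen = new Set();
  const kept = [];
  for (const line of text.split("\n")) {
    if (!seen.has(line)) {
      seen.add(line);
      kept.push(line);
    }
  }
  return kept.join("\n");
}

// The repeated "a" and "b" are dropped; first-occurrence order survives.
console.log(dedupeLines("a\nb\na\nb\nc")); // "a\nb\nc"
```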
2. Case sensitivity toggle affects comparison, not output
In case-insensitive mode, the Set key is the lowercased version of the line, but the kept output preserves the original casing of the first occurrence. So 'User@Example.com' would be kept and 'user@example.com' skipped if they appear in that order — downstream systems may still validate email addresses case-sensitively even if SMTP treats them as equivalent.
3. Whitespace trimming is applied before comparison when enabled
When the trim option is on, leading and trailing whitespace is stripped before computing the Set key, so 'foo' and ' foo ' collapse to the same entry. The kept output preserves the original line exactly as pasted, whitespace included, which matters if your downstream system requires trailing commas or specific indentation.
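Both options affect only the comparison key, never the stored line. A sketch of how that separation might look (the option names here are hypothetical, chosen for illustration):

```javascript
// Dedupe with optional trimming and case folding applied to the
// comparison key only; output keeps each first occurrence verbatim.
function dedupeWithOptions(text, { trim = false, caseSensitive = true } = {}) {
  const seen = new Set();
  const kept = [];
  for (const line of text.split("\n")) {
    let key = line;
    if (trim) key = key.trim();
    if (!caseSensitive) key = key.toLowerCase();
    if (!seen.has(key)) {
      seen.add(key);
      kept.push(line); // original line, whitespace and casing intact
    }
  }
  return kept.join("\n");
}

console.log(dedupeWithOptions("foo\n foo \nFOO", { trim: true, caseSensitive: false }));
// prints "foo": the first occurrence survives in its original form
```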
Pro tips
Decide whether email case matters for your downstream
The SMTP spec (RFC 5321) says the local part of an email (before @) is technically case-sensitive, but in practice every major provider (Gmail, Outlook, Yahoo, iCloud) treats it as case-insensitive. Postfix, Sendmail, and most ESP platforms match on lowercased addresses. So for 99 percent of real-world email workflows, dedupe case-insensitively. The exception: some corporate mail systems (rare, legacy) and custom business logic in CRMs that keys on exact string match. When in doubt, check your CRM's de-duplication settings before importing.
Sort before dedupe when first-occurrence order does not matter
Our dedupe preserves first-occurrence order, which is the right default for data where order has meaning (log files, timeline entries, URL extraction). For data where order is arbitrary (alphabetical lists, unique word extractions, sorted imports), paste a pre-sorted list or sort separately after deduping to get a canonical alphabetical order. Sorted-then-deduped output is easier to diff between runs and catches accidentally-reintroduced duplicates in future imports more reliably than order-preserved output.
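When order is arbitrary, canonical sorted-and-deduped output is a one-liner in plain JavaScript (a sketch, not the tool's implementation):

```javascript
// Sorted, deduplicated output: stable across runs and easy to diff.
// Spreading a Set removes duplicates; sort() then fixes the order.
function sortedUnique(text) {
  return [...new Set(text.split("\n"))].sort().join("\n");
}

console.log(sortedUnique("pear\napple\npear\nbanana")); // "apple\nbanana\npear"
```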
For huge files, prefer the command line
The browser is fine for a million-line input, but if you are routinely deduping 10-million-line CSVs or gigabyte-scale log files, use the shell. awk '!seen[$0]++' file.txt does the same thing, uses memory proportional to the number of unique lines rather than the whole file, streams through the input without loading it all, and finishes in well under a minute. sort -u file.txt is even faster when order does not matter. A browser tab handling a gigabyte of text will stutter, consume several gigabytes of RAM, and sometimes crash.
Frequently asked questions
Does deduplication preserve the original order of lines?
Yes. The algorithm walks the input top to bottom and keeps the first occurrence of each unique line while discarding subsequent duplicates. The relative order of kept lines matches their order in the input. This matters for log files, timeline data, and any list where position carries meaning. If you want alphabetically sorted output instead, sort the result separately (most editors have a sort-lines command) or paste pre-sorted input. For use cases where order is arbitrary, sorting first has a small performance benefit and makes diffs between runs more stable.
What is the difference between case-sensitive and case-insensitive mode?
In case-sensitive mode (default), 'User' and 'user' are treated as distinct lines and both are kept. In case-insensitive mode, they are treated as duplicates and only the first occurrence is kept, preserving its original casing. Choose based on your downstream system: email providers, DNS names, and most modern databases treat identifiers as case-insensitive, so deduping email lists or domain names case-insensitively makes sense. Unix filenames, URLs with query parameters, and content-comparison use cases are case-sensitive — dedupe preserving case to avoid collapsing distinct entries into one.
Can this handle CSV files with headers or JSON Lines format?
It treats all input as line-delimited text, so a CSV with a header row will dedupe correctly as long as no data row happens to exactly match the header (extremely unlikely). JSON Lines (one JSON object per line) also dedupes correctly: two lines with the same JSON object string will collapse. The limitation is that JSON objects with the same semantic content but different key order ('{"a":1,"b":2}' versus '{"b":2,"a":1}') are treated as different lines because the dedupe is string-based, not structure-aware. For structural JSON dedupe, pre-normalize the JSON with key sorting via the json-formatter tool before pasting here.
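To make JSON Lines dedupe structure-aware rather than string-based, each line can be re-serialized with recursively sorted keys before comparison. A sketch of that normalization step in plain JavaScript:

```javascript
// Re-serialize a JSON value with object keys sorted recursively, so
// '{"b":2,"a":1}' and '{"a":1,"b":2}' produce the identical string.
function canonicalJson(value) {
  if (Array.isArray(value)) {
    return "[" + value.map(canonicalJson).join(",") + "]";
  }
  if (value !== null && typeof value === "object") {
    return "{" + Object.keys(value).sort()
      .map((k) => JSON.stringify(k) + ":" + canonicalJson(value[k]))
      .join(",") + "}";
  }
  return JSON.stringify(value); // string, number, boolean, null
}

// Normalize each JSON Lines row, then any string-based dedupe works.
const rows = ['{"b":2,"a":1}', '{"a":1,"b":2}'];
const normalized = rows.map((r) => canonicalJson(JSON.parse(r)));
console.log(new Set(normalized).size); // 1: both rows now collapse
```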
Why are my 'identical' lines not being removed?
Usually one of: invisible whitespace differences (trailing spaces, tab versus space indentation, BOM characters, non-breaking spaces instead of regular spaces), different line endings (Windows CRLF versus Unix LF), case differences if case-sensitive mode is active, or Unicode normalization differences (é as one character versus e plus combining accent). Turn on whitespace trimming to handle the first two, switch to case-insensitive for the third, and for Unicode issues the only fix is to normalize input through a Unicode NFC normalizer before pasting. The word-counter or text-case-converter tools can sometimes help surface hidden whitespace problems.
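A quick way to check in the browser console whether two "identical" lines really match, and to apply the Unicode NFC fix mentioned above (plain JavaScript, nothing tool-specific):

```javascript
// Two visually identical strings: precomposed é vs. e + combining accent.
const a = "caf\u00e9";   // é as a single code point (U+00E9)
const b = "cafe\u0301";  // e followed by U+0301 combining acute

console.log(a === b);                                   // false: different code points
console.log(a.normalize("NFC") === b.normalize("NFC")); // true after NFC normalization

// Reveal hidden whitespace by serializing the string: escapes become visible.
console.log(JSON.stringify(" foo\t")); // logs " foo\t" with the tab shown as \t
```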
Is the data I paste sent to any server?
No. Deduplication runs entirely in JavaScript inside your browser tab using a native Set data structure. There is no fetch, no server call, no logging of content. The input never leaves the page. This matters when the list contains PII (email addresses, customer names, phone numbers), authentication material (SSH keys, API tokens, passwords), or any data subject to privacy regulations (GDPR, HIPAA, CCPA). You can verify this in your browser's developer tools — the Network tab will show zero activity during deduplication. For maximum paranoia, disconnect from Wi-Fi before pasting; the tool continues to work identically.
Honest limitations
- Deduplication is line-based — two lines with the same content but different whitespace (or different line endings on pasted Windows-vs-Unix files) will be treated as different unless whitespace trimming is on.
- No fuzzy matching — 'user@example.com' and 'user@Example.com' collapse only if case-insensitive mode is on; 'user@example.com' and 'user@examplecom' stay separate.
- Large inputs (over roughly 5 million lines or 500 MB) can freeze the browser tab; for heavy batch processing, use a command-line tool instead.
Deduplication commonly sits in a data-cleaning workflow. The text-case-converter is useful for normalizing case before or after dedupe, especially when downstream systems are mixed case-sensitive and case-insensitive. The word-counter confirms the unique-line count and helps estimate impact (how many duplicates were actually in the original list). For developers cleaning structured data, the json-formatter normalizes object keys before dedupe of JSON Lines files, and the regex-tester builds the pattern for line-filtering steps that often precede deduplication in a pipeline.