xcyteh

Reputation: 67

Notepad++ deleting lines containing duplicate words

I have a .txt document in which each line consists of one word followed by a date.

How can Notepad++ recognize the same word on different lines and delete the duplicate lines?

Upvotes: 5

Views: 35639

Answers (4)

SamYonnou

Reputation: 2068

Assuming the dates can differ between occurrences of the same word and you want to keep the line that appears first in the file, this should work (make sure your file ends with a newline):

  1. Go to the "Replace" dialog (you can do Ctrl+F and go to replace tab).
  2. In the "Search Mode" at the bottom select "Regular expression" (make sure ". matches newline" is not selected).
  3. In the "Find what:" field type (\s*\w+ )(.*\r\n)((.*\r\n)*)\1.*\r\n
  4. In the "Replace with:" field type \1\2\3
  5. Click "Replace" repeatedly until there are no more matches ("Replace All" does not seem to work for this; a better regex for which it does work may exist, but I have not found one).

I've tested this on the file:

testing330     05:09-24/08
whatever     10:55-25/08
testing     15:57-26/08
testing667     19:22-30/08
linux     00:29-31/08
testing330     00:29-31/08
windows     12:25-31/08

And the result was:

testing330     05:09-24/08
whatever     10:55-25/08
testing     15:57-26/08
testing667     19:22-30/08
linux     00:29-31/08
windows     12:25-31/08
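
For comparison, the same "keep the first line for each word" rule can be sketched outside Notepad++ in Python. This is only an illustration, not part of the regex approach above: the function name `drop_repeated_words` is made up for this example, and it assumes the word is the first whitespace-separated field on each line.

```python
def drop_repeated_words(text: str) -> str:
    """Keep only the first line seen for each leading word."""
    seen = set()
    kept = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            kept.append(line)  # leave blank lines untouched
            continue
        word = fields[0]  # assumption: the word is the first field
        if word in seen:
            continue  # a line starting with this word was already kept
        seen.add(word)
        kept.append(line)
    return "\n".join(kept) + "\n"


sample = """testing330     05:09-24/08
whatever     10:55-25/08
testing     15:57-26/08
testing667     19:22-30/08
linux     00:29-31/08
testing330     00:29-31/08
windows     12:25-31/08
"""

print(drop_repeated_words(sample), end="")
```

On the test file above, this drops only the second `testing330` line, matching the result shown.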

Upvotes: 4

lynx_74

Reputation: 1761

You can use EditPlus on Windows or TextWrangler on Mac to sort lines and remove duplicates easily.

After Notepad++ 6.5.2 (free) you can sort lines, or you can install the plugin "TextFX Characters" using the "Plugin Manager".

TextFX includes numerous features to transform selected text. Featuring:

  * Interactive Brace Matching
  * Quote handling
  * Character case alternation
  * Text rewrap
  * Column Lineup
  * Fill Text Down
  * Insert counter text down
  * Text to code conversion
  * Numeric Conversion
  * URI & HTML encoding
  * HTML to text conversion
  * Submit text to W3C
  * Text sorting
  * Ascii Chart
  * Leading whitespace repair
  * Autoclose HTML & braces

Homepage: http://textfx.no-ip.com/textfx/

Upvotes: 2

alexjhart

Reputation: 86

This is not a direct answer to your question, but I found this page because of its title. I was just looking to delete duplicate lines, and I found an easy way to do that here:

  1. Mark all the text (CTRL+A). Click TextFX → Click TextFX Tools → Check +Sort outputs only UNIQUE (at column) lines (if not already checked).
  2. Click TextFX → Click TextFX Tools → Click Sort lines case insensitive (at column)
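
As a rough illustration of what "sort and keep only unique lines" amounts to, here is a minimal Python sketch. It is not TextFX's implementation: the function name is hypothetical, and treating lines that differ only by case as duplicates is an assumption of this sketch. Like the steps above, it compares whole lines, so two lines with the same word but different dates are both kept.

```python
def sort_unique_case_insensitive(lines):
    """Sort lines ignoring case and drop lines that repeat (ignoring case)."""
    unique = {}
    for line in lines:
        # Assumption: lines differing only by case count as duplicates;
        # the first spelling encountered is the one kept.
        unique.setdefault(line.casefold(), line)
    return [unique[key] for key in sorted(unique)]


lines = ["windows", "Linux", "linux", "testing", "windows"]
print(sort_unique_case_insensitive(lines))  # ['Linux', 'testing', 'windows']
```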

Upvotes: 7

optimistictoaster

Reputation: 61

For me personally, here are the steps I follow. Let's assume you have only 1 column of data in column A.

  1. Import the data into Excel.
  2. Sort the data.
  3. Insert a function to check for duplicates. Cell B2 would be: =IF(A2=A1,"Duplicate","")
  4. Select all of column B.
  5. Copy.
  6. Paste special and paste the values.
  7. Sort the data according to column B.
  8. Delete all the ones marked with "Duplicate".
  9. Copy the data back to Notepad++

I thought there was a plugin like this, but can't find it now. Otherwise, this link may help you.
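
The Excel steps above boil down to sorting and then dropping every value equal to the one directly before it, which is what the `=IF(A2=A1,"Duplicate","")` formula flags. A minimal Python sketch of that idea, assuming a single column of values (the function name is made up for this example):

```python
def remove_adjacent_duplicates(values):
    """Sort the values, then drop each one that equals its predecessor."""
    ordered = sorted(values)
    result = []
    for index, value in enumerate(ordered):
        # Same check as =IF(A2=A1,"Duplicate",""): compare with the row above.
        if index > 0 and value == ordered[index - 1]:
            continue
        result.append(value)
    return result


print(remove_adjacent_duplicates(["b", "a", "b", "c", "a"]))  # ['a', 'b', 'c']
```

As with the spreadsheet approach, the original line order is lost because the data is sorted first.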

Upvotes: 1
