cballes

Reputation: 23

Find duplicate words in Notepad++

I have a file that lists a file hierarchy, with each file's CRC32 code:

Folder A\Folder C\File three.txt         56efd95f
Folder A\File one.txt                    b8e1b873
Folder A\Folder B\Folder D\File four.txt 56efd95f
Folder A\Folder B\File two.txt           21e8e9c9

I am using Notepad++ and need a regular expression that finds rows with the same CRC32. In this example I expect it to find lines 1 and 3.

I know that \s[a-zA-Z0-9]{8,8}$ matches the CRC32, but how can I check whether these matches are repeated?

Moreover, if I want to remove everything except the CRC32, why doesn't replacing matches of .*(?!\s[a-zA-Z0-9]{8,8}$) with an empty string give me a clean list of CRC32 codes?
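Why that replacement wipes out whole lines can be reproduced with a short sketch. It uses Python's `re` module rather than Notepad++ itself, on the assumption that this particular pattern behaves the same way in both engines. A negative lookahead consumes nothing; it only vetoes a position. Greedy `.*` first runs to the end of the line, and at that point the CRC32 is already inside the match, so there is nothing left for the lookahead to object to.

```python
import re

# The sample listing from the question (backslashes escaped for Python).
text = (
    "Folder A\\Folder C\\File three.txt         56efd95f\n"
    "Folder A\\File one.txt                    b8e1b873\n"
    "Folder A\\Folder B\\Folder D\\File four.txt 56efd95f\n"
    "Folder A\\Folder B\\File two.txt           21e8e9c9"
)

# The attempted pattern. re.M makes $ match at every line end, as in Notepad++.
attempt = re.compile(r".*(?!\s[a-zA-Z0-9]{8}$)", re.M)

# Greedy .* reaches the end of line 1; what follows there (the line break and
# "Folder A...") is not "\s + 8 alphanumerics + end of line", so the negative
# lookahead succeeds and the whole line, CRC32 included, is the match.
first = attempt.match(text)
print(repr(first.group()))

# Replacing every match with "" therefore leaves only the line breaks.
print(repr(attempt.sub("", text)))  # '\n\n\n'
```

Keeping the code requires the opposite approach: match the whole line but capture the code, then put the capture back in the replacement.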

Upvotes: 2

Views: 10716

Answers (2)

Ryszard Czech

Reputation: 18611

To find the duplicated codes:

(?s)\h([a-zA-Z0-9]{8})$(?=.*\h\1$)


To remove all but the CRC32 codes:

.*\h([a-zA-Z0-9]{8})$

Replace with $1. Then run Edit -> Line Operations -> Sort Lines Lexicographically Ascending, followed by Edit -> Line Operations -> Remove Consecutive Duplicate Lines.
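The replace, sort and remove-duplicates sequence can be sketched in Python for checking the result outside the editor. This is an illustration, not what Notepad++ runs internally: Python's `re` has no `\h`, so `[ \t]` stands in for it, and `itertools.groupby` plays the role of "Remove Consecutive Duplicate Lines". Because sorting puts equal codes next to each other, the same grouping also shows which codes were repeated.

```python
import re
from itertools import groupby

text = (
    "Folder A\\Folder C\\File three.txt         56efd95f\n"
    "Folder A\\File one.txt                    b8e1b873\n"
    "Folder A\\Folder B\\Folder D\\File four.txt 56efd95f\n"
    "Folder A\\Folder B\\File two.txt           21e8e9c9"
)

# Step 1: .*\h([a-zA-Z0-9]{8})$ replaced with the capture; [ \t] replaces \h.
crc_only = re.sub(r".*[ \t]([a-zA-Z0-9]{8})$", r"\1", text, flags=re.M)

# Step 2: sort the lines (Sort Lines Lexicographically Ascending).
lines = sorted(crc_only.splitlines())

# Step 3: collapse runs of equal lines (Remove Consecutive Duplicate Lines).
unique_crcs = [crc for crc, _ in groupby(lines)]

# Bonus: runs longer than one are exactly the repeated CRC32 codes.
repeated = [crc for crc, run in groupby(lines) if len(list(run)) > 1]

print(unique_crcs)  # ['21e8e9c9', '56efd95f', 'b8e1b873']
print(repeated)     # ['56efd95f']
```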

Explanation

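--------------------------------------------------------------------------------
  (?s)                     set flags for this block (with . matching \n)
--------------------------------------------------------------------------------
  \h                       horizontal whitespace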
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    [a-zA-Z0-9]{8}           any character of: 'a' to 'z', 'A' to
                             'Z', '0' to '9' (8 times)
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  $                        end of a line
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    .*                       any character including \n (0 or more times
                             (matching as many as possible))
--------------------------------------------------------------------------------
    \h                       horizontal whitespace
--------------------------------------------------------------------------------
    \1                       what was matched by capture \1
--------------------------------------------------------------------------------
    $                        end of a line
--------------------------------------------------------------------------------
  )                        end of look-ahead
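To try the expression outside Notepad++, here is a Python translation written in verbose mode so the comments follow the explanation above. The flag mapping is an assumption of this sketch: `re.S` stands in for `(?s)`, `re.M` gives the per-line `$` that Notepad++ uses by default, and `[ \t]` replaces `\h`, which Python's `re` does not support. Note that the lookahead only succeeds when a *later* copy exists, so only the earlier occurrence (line 1) is matched, not line 3.

```python
import re

text = (
    "Folder A\\Folder C\\File three.txt         56efd95f\n"
    "Folder A\\File one.txt                    b8e1b873\n"
    "Folder A\\Folder B\\Folder D\\File four.txt 56efd95f\n"
    "Folder A\\Folder B\\File two.txt           21e8e9c9"
)

dup_pattern = re.compile(
    r"""
    [ \t]               # horizontal whitespace (stand-in for \h)
    ([a-zA-Z0-9]{8})    # the 8-character code, captured as group 1
    $                   # end of a line
    (?=                 # look ahead for:
        .*              #   anything, newlines included (re.S)
        [ \t]\1$        #   whitespace, the same code, end of a line
    )
    """,
    re.M | re.S | re.X,
)

# Report the 1-based line number of each match and the duplicated code.
hits = [
    (text.count("\n", 0, m.start()) + 1, m.group(1))
    for m in dup_pattern.finditer(text)
]
print(hits)  # [(1, '56efd95f')]
```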

Upvotes: 1

MonkeyZeus

Reputation: 20737

You can use something like:

/([\da-f]{8}$)(?=.*\1)/gms
  • ([\da-f]{8}$) - find a CRC code
  • (?=.*\1) - make sure the same CRC code appears again later in the file

https://regex101.com/r/fpIOCN/1

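In Notepad++, select the "Regular expression" search mode, enter only the part between the slashes, ([\da-f]{8}$)(?=.*\1), and make sure ". matches newline" is enabled.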
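For comparison, the same pattern in Python, as a sketch that assumes the flags map this way: `m` becomes `re.M`, `s` becomes `re.S`, and `g` corresponds to collecting every match with `findall`. Two points are worth knowing. Like the other answer, it reports only the earlier occurrence of a repeated code. Also, `\1` in the lookahead is not tied to the end of a line, so the same eight characters appearing later inside a file name would also count as a repeat.

```python
import re

text = (
    "Folder A\\Folder C\\File three.txt         56efd95f\n"
    "Folder A\\File one.txt                    b8e1b873\n"
    "Folder A\\Folder B\\Folder D\\File four.txt 56efd95f\n"
    "Folder A\\Folder B\\File two.txt           21e8e9c9"
)

# /([\da-f]{8}$)(?=.*\1)/gms without the delimiters; flags passed separately.
pattern = re.compile(r"([\da-f]{8}$)(?=.*\1)", re.M | re.S)

print(pattern.findall(text))  # ['56efd95f']
```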


Upvotes: 2
