Reputation: 23
I have a file that lists a file hierarchy with the CRC32 code of each file:
Folder A\Folder C\File three.txt 56efd95f
Folder A\File one.txt b8e1b873
Folder A\Folder B\Folder D\File four.txt 56efd95f
Folder A\Folder B\File two.txt 21e8e9c9
I am using Notepad++ and I need a regular expression that finds rows with the same CRC32. In this example I expect to find line 1 and line 3.
I know that I can match the CRC32 with
\s[a-zA-Z0-9]{8,8}$
but how can I check whether these matches are repeated?
Moreover, if I wanted to remove everything but the CRC32, why does the expression
.*(?!\s[a-zA-Z0-9]{8,8}$)
not work when I replace the matches with an empty string to get a clean list of CRC32 codes?
Upvotes: 2
Views: 10716
Reputation: 18611
To find the CRC32 codes that are repeated later in the file:
(?s)\h([a-zA-Z0-9]{8})$(?=.*\h\1$)
See proof.
To remove all but the CRC32 codes:
.*\h([a-zA-Z0-9]{8})$
Replace with $1. See another proof. Then, Edit -> Line Operations -> Sort Lines Lexicographically Ascending and after that Remove Consecutive Duplicate Lines.
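For readers who want to check the replacement outside Notepad++, here is a minimal Python sketch of the same steps (strip everything but the code, sort, drop duplicates). It is an illustration, not part of the Notepad++ workflow: the names `listing`, `crc_only` and `unique_sorted` are made up for this example, and because Python's `re` module has no `\h`, `[ \t]` is used as an assumed stand-in for horizontal whitespace.

```python
import re

# Sample data from the question (raw string so the backslashes stay literal).
listing = r"""Folder A\Folder C\File three.txt 56efd95f
Folder A\File one.txt b8e1b873
Folder A\Folder B\Folder D\File four.txt 56efd95f
Folder A\Folder B\File two.txt 21e8e9c9"""

# Same idea as ".*\h([a-zA-Z0-9]{8})$" replaced with "$1".
# (?m) makes $ match at the end of every line; [ \t] replaces \h (assumption).
crc_only = re.sub(r"(?m)^.*[ \t]([a-zA-Z0-9]{8})$", r"\1", listing)

# Equivalent of "Sort Lines Lexicographically Ascending" followed by
# "Remove Consecutive Duplicate Lines".
unique_sorted = sorted(set(crc_only.splitlines()))

print(crc_only)
print(unique_sorted)
```

Note that this leaves one copy of every code, so it produces a clean list of CRC32 values rather than a list of the duplicated ones.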
Explanation
--------------------------------------------------------------------------------
\h horizontal whitespace
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
[a-zA-Z0-9]{8} any character of: 'a' to 'z', 'A' to
'Z', '0' to '9' (8 times)
--------------------------------------------------------------------------------
) end of \1
--------------------------------------------------------------------------------
$ end of a line
--------------------------------------------------------------------------------
(?= look ahead to see if there is:
--------------------------------------------------------------------------------
.* any character, including \n because of
(?s) (0 or more times (matching the most
amount possible))
--------------------------------------------------------------------------------
\h horizontal whitespace
--------------------------------------------------------------------------------
\1 what was matched by capture \1
--------------------------------------------------------------------------------
$ end of a line
--------------------------------------------------------------------------------
) end of look-ahead
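The lookahead only succeeds when the same code appears again further down, so the pattern flags every occurrence except the last one. A small Python sketch can make that visible. It is a hedged illustration under two assumptions: `[ \t]` replaces `\h` (which Python's `re` does not support), and the `m` flag is added because `$` in Python only matches at line ends in multiline mode. The names `pattern`, `repeated`, `lines_by_crc` and `duplicates` are hypothetical.

```python
import re
from collections import defaultdict

# Sample data from the question (raw string so the backslashes stay literal).
listing = r"""Folder A\Folder C\File three.txt 56efd95f
Folder A\File one.txt b8e1b873
Folder A\Folder B\Folder D\File four.txt 56efd95f
Folder A\Folder B\File two.txt 21e8e9c9"""

# Adapted form of (?s)\h([a-zA-Z0-9]{8})$(?=.*\h\1$).
pattern = re.compile(r"(?ms)[ \t]([a-zA-Z0-9]{8})$(?=.*[ \t]\1$)")

# Codes that occur again later in the text (the last occurrence is not matched).
repeated = [m.group(1) for m in pattern.finditer(listing)]

# To list every line number that shares a code, group the lines by code instead.
lines_by_crc = defaultdict(list)
for number, line in enumerate(listing.splitlines(), start=1):
    crc = line.rsplit(None, 1)[-1]  # last whitespace-separated field
    lines_by_crc[crc].append(number)

duplicates = {crc: nums for crc, nums in lines_by_crc.items() if len(nums) > 1}

print(repeated)
print(duplicates)
```

On the sample data the regex reports the code on line 1 only, while the grouping step reports lines 1 and 3, which is what the question expects.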
Upvotes: 1
Reputation: 20737
You can use something like:
/([\da-f]{8}$)(?=.*\1)/gms
([\da-f]{8}$) - find a CRC code
(?=.*\1) - make sure the CRC code appears again
https://regex101.com/r/fpIOCN/1
In Notepad++, enter the pattern without the / delimiters and the gms flags, and make sure to enable ". matches newline".
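The `gms` flags in the pattern above come from the regex101 notation. As a rough sketch of how they map onto another engine, the Python snippet below uses `finditer` for `g`, `re.M` for `m` and `re.S` for `s`; the variable names are invented for this example.

```python
import re

# Sample data from the question (raw string so the backslashes stay literal).
listing = r"""Folder A\Folder C\File three.txt 56efd95f
Folder A\File one.txt b8e1b873
Folder A\Folder B\Folder D\File four.txt 56efd95f
Folder A\Folder B\File two.txt 21e8e9c9"""

# m -> re.M ($ matches at each line end), s -> re.S (. matches newline).
crc_again = re.compile(r"([\da-f]{8}$)(?=.*\1)", re.M | re.S)

# g -> iterate over all matches instead of stopping at the first one.
found = [m.group(1) for m in crc_again.finditer(listing)]

print(found)
```

Because this pattern does not require whitespace before the code, the lookahead accepts the same eight characters anywhere later in the text, not only at the end of a line.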
Upvotes: 2