Ross Walker

Reputation: 31

How to find all lines with a character occurring more than x times in Notepad++?

I have what most of you will probably consider a trivially easy problem to solve.

I have a list which is formatted like so:

?d?d?d?d?l?d?d?d
?d?d?d?d?l?d?d?l
?d?d?d?d?l?d?d?u
?d?d?d?d?l?d?l?d
?d?d?d?d?l?d?l?l

There are many tens of thousands of lines like this. I would like a regular expression that will select all of the lines that contain more than 5 occurrences of the letter d so they can be removed from the list.

Despite searching extensively, I have not found a solution that works. I've found many ways of searching for occurrences of characters etc. on this and other forums (including spaces and special characters), and have been able to successfully conduct the search on other lists of words but I think the presence of all those question marks screws it up... I can't say for sure though.

I apologise in advance if I somehow missed a post which explains this perfectly, but I have made an effort to find a solution on my own and have just become exasperated with it.

Many thanks in advance for any help provided!

Upvotes: 2

Views: 4916

Answers (2)

NetMage

Reputation: 26917

Use Find and Replace (with Search Mode set to "Regular expression" and ". matches newline" unchecked) and replace lines matching the following with nothing:

^.*(d.*){6}.*\r\n

Explanation:

  • ^ - start at beginning of a line
  • .* - skip 0 or more uninteresting characters but not past end of line
  • (d.*) - find d followed by any uninteresting characters but not past end of line
  • {6} - repeat the preceding group exactly six times, i.e. six occurrences of d, each followed by anything (use {6,} for six or more)
  • .* - match any remaining characters up to end of line
  • \r\n - match end of line sequence (as pointed out by @toto, better as \R)

See https://regex101.com/r/UaYAz4/1
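To check the pattern outside Notepad++, here is a minimal Python sketch that applies it to the five sample lines from the question. It assumes Windows (\r\n) line endings, since the pattern ends in \r\n, and uses re.MULTILINE so that ^ matches at the start of every line; the variable names are only illustrative.

```python
import re

# The five sample lines from the question, joined with Windows line endings.
lines = [
    "?d?d?d?d?l?d?d?d",  # 7 d's
    "?d?d?d?d?l?d?d?l",  # 6 d's
    "?d?d?d?d?l?d?d?u",  # 6 d's
    "?d?d?d?d?l?d?l?d",  # 6 d's
    "?d?d?d?d?l?d?l?l",  # 5 d's
]
text = "\r\n".join(lines) + "\r\n"

# Same pattern as above; MULTILINE makes ^ match after every newline.
pattern = re.compile(r"^.*(d.*){6}.*\r\n", re.MULTILINE)

# Replace every matching line (including its line ending) with nothing.
result = pattern.sub("", text)
print(repr(result))
```

Only the last sample line has five or fewer d's, so it should be the only one left.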

Update based on comments:

^[^d\r\n]*(d[^d\r\n]*){6,}.*\R
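A rough Python equivalent of the updated pattern, as a sketch rather than a drop-in replacement: Python's re module does not support \R, so \r?\n is used here instead, which is an adaptation and not part of the original answer. The function name drop_lines and the sample text are hypothetical.

```python
import re

# Updated pattern, with \R swapped for \r?\n because Python's re has no \R.
# The negated classes [^d\r\n] keep each repetition from running past a d
# or past the end of the line.
PATTERN = re.compile(r"^[^d\r\n]*(?:d[^d\r\n]*){6,}.*\r?\n", re.MULTILINE)


def drop_lines(text: str) -> str:
    """Remove every line that contains six or more occurrences of 'd'."""
    return PATTERN.sub("", text)


# Mixed line endings: the first two lines end in \r\n, the last in \n.
sample = "?d?d?d?d?l?d?d?d\r\n?d?d?d?d?l?d?l?l\r\n?d?d?d?d?l?d?d?u\n"
cleaned = drop_lines(sample)
print(repr(cleaned))
```

Note that a final line without a trailing line break would not be matched by this version, because the pattern requires the line ending.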

Upvotes: 3

3ocene

Reputation: 2210

Without having Notepad++ installed, I would expect ^.*(?:d.*){6,}$ (regex101) to do what you want (the quantifier is {6,} because "more than 5" means six or more):

  • ^ start at the start of the line.
  • .* match any character 0 or more times. (Ignore any characters that aren't d at the start of the line.)
  • (?:d.*){6,} Match the following 6 or more times:
    • d the letter d.
    • .* any character 0 or more times. (Ignore any characters between the ds.)
  • $ match the end of the line.

As NetMage pointed out, this will leave blank lines. To fix this, use \r?\n instead of $. This matches:

  • \r? an optional carriage return, present when the file uses Windows line endings.
  • \n a line break.
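The difference between ending the pattern with $ and with \r?\n can be seen in a short Python sketch. It assumes a quantifier of {6,}, since the question asks for more than five occurrences, and uses re.MULTILINE so that ^ and $ work per line; the sample data is taken from the question.

```python
import re

lines = [
    "?d?d?d?d?l?d?d?d",
    "?d?d?d?d?l?d?d?l",
    "?d?d?d?d?l?d?d?u",
    "?d?d?d?d?l?d?l?d",
    "?d?d?d?d?l?d?l?l",
]
text = "\n".join(lines) + "\n"

# Ending with $ removes the line's content but keeps its line break,
# so every removed line leaves an empty line behind.
with_dollar = re.sub(r"^.*(?:d.*){6,}$", "", text, flags=re.MULTILINE)

# Ending with \r?\n consumes the line break too, so no blank lines remain.
with_newline = re.sub(r"^.*(?:d.*){6,}\r?\n", "", text, flags=re.MULTILINE)

print(repr(with_dollar))
print(repr(with_newline))
```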

Upvotes: 1
