Reputation: 1069
I have text on new lines like so:
tom
tim
john
will
tod
hello
test
ttt
three
I want to delete every third line, so in the example above I want to remove john, hello, and three.
I know this calls for some regex, but I am not the best with it!
What I tried:
Search: ([^\n]*\n?){3} //3 in my head to remove every third
Replace: $1
The others I tried were just attempts with \n\r etc. Again, I'm not the best with regex. I thought the above attempt was kind of close.
Upvotes: 6
Views: 7681
Reputation: 91385
This will delete every third line, even when a line contains more than one word.
Find what: (?:[^\r\n]+\R){2}\K[^\r\n]+(?:\R|\z)
Replace with: LEAVE EMPTY
Explanation:
(?: # start non capture group
[^\r\n]+ # 1 or more non linebreak
\R # any kind of linebreak (i.e. \r, \n, \r\n)
){2} # end group, appears twice (i.e. 2 lines)
\K # forget all we have seen until this position
[^\r\n]+ # 1 or more non linebreak
(?: # start non capture group
\R # any kind of linebreak (i.e. \r, \n, \r\n)
| # OR
\z # end of file
) # end group
Result for given example:
tom
tim
will
tod
test
ttt
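For readers who want to check the same idea outside Notepad++, here is a minimal Python sketch. It is an adaptation, not the pattern above: Python's built-in re module supports neither \K nor \R, so this version assumes a capture group plus an explicit (?:\r\n|\r|\n) alternation instead. The function name is made up for illustration.

```python
import re

# Python's `re` has no \R, so spell out the three common line breaks.
LINEBREAK = r"(?:\r\n|\r|\n)"

# Group 1 holds two complete lines; the third line (and its line break,
# or the end of the text) is matched outside the group and thus dropped.
EVERY_THIRD_LINE = re.compile(
    r"((?:[^\r\n]+" + LINEBREAK + r"){2})[^\r\n]+(?:" + LINEBREAK + r"|\Z)"
)


def delete_every_third_line(text):
    """Keep the first two lines of every group of three (hypothetical helper)."""
    return EVERY_THIRD_LINE.sub(r"\1", text)


sample = "tom\ntim\njohn\nwill\ntod\nhello\ntest\nttt\nthree"
print(delete_every_third_line(sample))
```

Because the third line is matched with [^\r\n]+ rather than \S+, lines that contain spaces are still treated as single lines.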
Upvotes: 10
Reputation: 49
Another way: you can use the ConyEdit plugin to do this. Use the command line cc.dl 3.3 to delete the third line of each group, with 3 lines per group.
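The "third line of each group of 3" idea can also be sketched without any regex. The Python below is only an illustration of that grouping logic; it does not use ConyEdit and makes no claim about how the plugin works internally. The function and parameter names are hypothetical.

```python
def drop_nth_of_each_group(text, group_size=3, nth=3):
    """Remove the nth line (1-based) from every group of group_size lines."""
    # keepends=True preserves each line's original line break.
    lines = text.splitlines(keepends=True)
    kept = [
        line
        for index, line in enumerate(lines)
        if index % group_size != nth - 1
    ]
    return "".join(kept)


sample = "tom\ntim\njohn\nwill\ntod\nhello\ntest\nttt\nthree"
print(drop_nth_of_each_group(sample))
```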
Upvotes: 0
Reputation: 47894
Since the OP says Sahil's answer "worked like a charm", I'll assume the text in Notepad++ ended with a newline character. Otherwise, Sahil's and Toto's answers will fail to match the final set of words.
Sahil's pattern: (.*?)\n(.*?)\n(.*)\n takes 79 steps if the text ends in \n; otherwise it takes 112 steps and fails.
His replacement expression needlessly uses two capture group references.
Toto's pattern: ((?:[^\r\n]+\R){2})[^\r\n]+\R takes 39 steps if the text ends in \n; otherwise it takes 173 steps and fails.
His replacement expression uses one capture group reference.
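The trailing-newline dependency can be reproduced with a small Python sketch. This assumes Python's re module rather than Notepad++'s regex engine, so it only shows whether the final group is matched; it does not reproduce the step counts quoted above. The helper name is made up for illustration.

```python
import re

# Sahil's search pattern, with his replacement written in Python syntax.
SAHIL = re.compile(r"(.*?)\n(.*?)\n(.*)\n")


def apply_sahil(text):
    """Apply Sahil's search/replace pair to text (hypothetical helper)."""
    return SAHIL.sub(r"\1\n\2\n", text)


sample = "tom\ntim\njohn\nwill\ntod\nhello\ntest\nttt\nthree"

# With a trailing newline, all three groups are matched.
with_newline = apply_sahil(sample + "\n")

# Without it, the last group has only two \n characters, so "three" survives.
without_newline = apply_sahil(sample)

print(repr(with_newline))
print(repr(without_newline))
```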
My suggested pattern takes only 25 steps and uses no capture groups. Your text is a series of non-whitespace characters followed by whitespace characters, so the following is the shortest, most accurate pattern, and it provides maximum speed:
\S+\s+\S+\s+\K\S+\s*
This pattern should be paired with an empty replacement.
\S means "non-white-space character"
\s means "white-space character"
+ means one or more of the preceding match
* means zero or more of the preceding match
\K means Keep the match starting from here
The * on the final \s allows the final 3 lines of text to conclude without a trailing newline character. When doing this kind of operation on a big batch of text, it is important to be sure that the replacement is working properly on the whole text and no undesired substrings remain.
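As a rough way to try this outside Notepad++, here is a Python sketch. Python's re module has no \K, so this version assumes a capture group and a \1 replacement instead of an empty replacement; the step count therefore does not carry over. The function name is hypothetical.

```python
import re

# Group 1 keeps the first two whitespace-separated tokens; the third token
# and any whitespace after it fall outside the group and are removed.
EVERY_THIRD_TOKEN = re.compile(r"(\S+\s+\S+\s+)\S+\s*")


def delete_every_third_token(text):
    """Drop every third run of non-whitespace characters (hypothetical helper)."""
    return EVERY_THIRD_TOKEN.sub(r"\1", text)


sample = "tom\ntim\njohn\nwill\ntod\nhello\ntest\nttt\nthree"
print(delete_every_third_token(sample))

# The pattern counts whitespace-separated tokens, not lines, so a line
# containing a space is counted as two items.
print(repr(delete_every_third_token("a b\nc d\ne f")))
```

The second call shows the assumption built into the pattern: it fits the one-word-per-line text in the question, but text with spaces inside lines would need a line-based pattern.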
While I'm sure you've long forgotten about this regex task, it is important that future readers benefit from learning the best way to achieve the desired result.
Upvotes: 1
Reputation: 15141
gedit ubuntu
Search for: (.*?)\n(.*?)\n(.*)\n
Replace with: \1\n\2\n
Upvotes: 7