Reputation: 1297
I have a massive text document (using VS Code) that looks like this and continues in the same pattern for several thousand lines. In essence, we have an integer, a float that always starts with 0.00, and then four blank lines:
468653564
0.0013348548
160919876
0.0015948548
239109587
0.0010948548
190959199
0.0023948548
163220290
0.001348548
How would I format this document to remove the blank lines and the float, so I end up with something that looks like this:
468653564
160919876
239109587
190959199
163220290
This pattern seems to work fine for the first step (0.00.*) and this ^$\n for the second, but is there a way to get it all in one fell swoop?
Upvotes: 3
Views: 4640
Reputation: 47302
To handle multiple regex patterns in one go, simply separate them with the alternation ("or") operator (|):
0\.00.*\n|^$\n
So this essentially says look for 0.00... OR blank lines.
A slightly more efficient pattern might be to anchor at the start of the line and look for any digit \d (without being specific about which one) followed by a period and then additional digits, as it should take fewer steps:
^(\n|\d\.\d+\n)
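As a rough way to check both patterns outside the editor, here is a minimal sketch using Python's re module. The sample text and the names SAMPLE, FLOAT_OR_BLANK, ANCHORED and strip_lines are made up for illustration, and Python's regex engine is not the one VS Code uses, so treat this as a sanity check rather than a guarantee of identical behaviour:

```python
import re

# Hypothetical sample mirroring the question: an integer, a float,
# then four blank lines, repeated.
SAMPLE = (
    "468653564\n0.0013348548\n\n\n\n\n"
    "160919876\n0.0015948548\n\n\n\n\n"
    "239109587\n0.0010948548\n"
)

# The two alternation patterns from this answer. re.MULTILINE makes ^ and $
# match at line boundaries instead of only at the ends of the whole string.
FLOAT_OR_BLANK = re.compile(r"0\.00.*\n|^$\n", re.MULTILINE)
ANCHORED = re.compile(r"^(\n|\d\.\d+\n)", re.MULTILINE)


def strip_lines(pattern: re.Pattern, text: str) -> str:
    """Replace every match with the empty string, like a Replace All."""
    return pattern.sub("", text)


if __name__ == "__main__":
    print(strip_lines(FLOAT_OR_BLANK, SAMPLE), end="")
    print(strip_lines(ANCHORED, SAMPLE), end="")
```

Both patterns need a line break after the float, so a final float line with no trailing newline would be left in place.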
Upvotes: 2
Reputation: 3039
You can make the trailing line breaks optional and match them greedily:
0\.00\d+(\r?\n)*
The star makes the group match "zero or more" times. That way the pattern still matches the last float, which has no line breaks after it at the end of the data, and it also consumes the line breaks you want to remove. The \r is marked optional just to account for the difference between Unix-style and Windows-style line endings. The rest of the pattern is pretty much as written: find a zero, followed by a decimal point, followed by a double zero, followed by one or more (+) digits, followed by the optional line breaks.
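To see how this behaves with both line-ending styles, here is a small sketch in Python. The sample strings and the names FLOAT_AND_BREAKS and remove_floats are hypothetical, and Python's re module stands in for the editor's search, so small engine differences are possible:

```python
import re

# Pattern from this answer: the float, then zero or more LF or CRLF breaks.
FLOAT_AND_BREAKS = re.compile(r"0\.00\d+(\r?\n)*")


def remove_floats(text: str) -> str:
    """Delete each float together with every line break that follows it."""
    return FLOAT_AND_BREAKS.sub("", text)


# Hypothetical samples; the last float deliberately has no trailing newline.
UNIX_SAMPLE = "468653564\n0.0013348548\n\n\n\n\n160919876\n0.0015948548"
WINDOWS_SAMPLE = UNIX_SAMPLE.replace("\n", "\r\n")

if __name__ == "__main__":
    print(repr(remove_floats(UNIX_SAMPLE)))
    print(repr(remove_floats(WINDOWS_SAMPLE)))
```

The line break after each integer is kept because the match only starts at the float, so the integers stay on separate lines.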
Upvotes: 1
Reputation: 371213
One possibility is
^(?!\d{2}).*\n
and replace with the empty string. It matches every line that doesn't start with two digits, which covers both the float lines (a digit followed by a period) and the blank lines.
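A quick way to try the negative lookahead is the following Python sketch. The sample data and the names NOT_TWO_DIGITS and drop_non_integer_lines are invented for illustration, and the editor's regex engine may differ slightly from Python's:

```python
import re

# Pattern from this answer: any line that does not begin with two digits.
# re.MULTILINE lets ^ match at the start of every line.
NOT_TWO_DIGITS = re.compile(r"^(?!\d{2}).*\n", re.MULTILINE)


def drop_non_integer_lines(text: str) -> str:
    """Remove every line whose first two characters are not both digits."""
    return NOT_TWO_DIGITS.sub("", text)


# Hypothetical sample mirroring the question's layout.
SAMPLE = (
    "468653564\n0.0013348548\n\n\n\n\n"
    "160919876\n0.0015948548\n\n\n\n\n"
    "239109587\n0.0010948548\n"
)

if __name__ == "__main__":
    print(drop_non_integer_lines(SAMPLE), end="")
```

One thing to keep in mind with this approach: a single-digit integer on its own line would also be removed, since it does not start with two digits.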
Upvotes: 1