Reputation: 13
First of all, I know how to solve this using two regexes, but I was wondering whether it can be done with just one. Please see this Regex101.com example for the following explanation.
Here's what I'm trying to do: I am given a .csv file, of which one line looks like this:
AAA,AAA,AAA,AAA,some text and a comma here, and there, test,,,,,,,,,,
The AAAs can be any length and any number/character. Those are the first four columns. The next part is
some text and a comma here, and there, test
This string can contain zero or multiple commas. Let's consider this as the fifth column, although technically it isn't right now. The rest is just always 10 commas:
,,,,,,,,,,
The goal is to only remove the commas inside the fifth column and return the whole line back. So from this:
AAA,AAA,AAA,AAA,some text and a comma here, and there, test,,,,,,,,,,
to this, notice the two removed commas:
AAA,AAA,AAA,AAA,some text and a comma here and there test,,,,,,,,,,
Here's how I did it in two steps.
First I get the fifth column using the first capture group with this regex:
(?:.*?,){4}(.*),{10}
Then I just use:
,
to match all commas and replace them with empty strings.
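For reference, the two-step approach described above could look roughly like this in Python. This is only a sketch: the function name `strip_fifth_column_commas` is made up for illustration, and the `^`/`$` anchors are an addition that the pattern in the question does not have.

```python
import re

# Pattern from the question, with ^ and $ anchors added (an assumption)
# so it only matches a whole line of the expected shape.
FIFTH_COLUMN = re.compile(r'^(?:.*?,){4}(.*),{10}$')

def strip_fifth_column_commas(line: str) -> str:
    """Step 1: capture the fifth column. Step 2: delete its commas."""
    match = FIFTH_COLUMN.match(line)
    if match is None:
        return line  # unexpected shape: leave the line untouched
    start, end = match.span(1)                 # where the fifth column sits in the line
    cleaned = re.sub(',', '', match.group(1))  # second regex: remove every comma
    return line[:start] + cleaned + line[end:]

# Sample line from the question; the ten trailing commas are built with "," * 10.
sample = "AAA,AAA,AAA,AAA,some text and a comma here, and there, test" + "," * 10
print(strip_fifth_column_commas(sample))
```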
My guess is that this needs lookahead and lookbehind. I tried a lot of variations, but I was not able to find a solution.
Is there a way to achieve this in one single regex?
Thank you for reading.
Upvotes: 1
Views: 47
Reputation: 22817
The following regex will work for PCRE:
(?:^(?:[^,]+,){4}|\G(?!\A))[^,]+\K,(?!,{9}$)
How it works:
(?:^(?:[^,]+,){4}|\G(?!\A))
match either of the following options:
^(?:[^,]+,){4}
starting from the beginning of the line, match any non-comma character one or more times, then , and match this series exactly 4 times
\G(?!\A)
assert the position at the end of the previous match (the (?!\A) excludes the start of the string, where \G would also match)
[^,]+
match any character except , one or more times
\K
reset the starting point of the match; any previously consumed characters are no longer included in the final match
,
match this character literally
(?!,{9}$)
negative lookahead ensuring what follows is not 9 commas and the end of the line (this prevents the first of the ten trailing commas from being replaced)
In Notepad++, a single Replace All may not catch every comma in a line, but the regex still works. Just keep clicking Replace All until you see the message Replace All: 0 occurrences were replaced.
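If the regex engine at hand lacks \G and \K (Python's standard `re` module is one such engine), a different technique can still do the job in one substitution call: capture the three parts of the line and let a replacement function strip the commas from the middle part. This is a swapped-in approach, not the PCRE pattern above, and the names `LINE_PARTS` and `remove_inner_commas` are hypothetical.

```python
import re

# Group 1: the first four columns (non-empty, as in the answer's [^,]+).
# Group 2: the fifth column, possibly containing commas.
# Group 3: the ten trailing commas.
LINE_PARTS = re.compile(r'^((?:[^,]+,){4})(.*)(,{10})$')

def remove_inner_commas(line: str) -> str:
    """Single re.sub call; the callback removes commas from group 2 only."""
    return LINE_PARTS.sub(
        lambda m: m.group(1) + m.group(2).replace(',', '') + m.group(3),
        line,
    )

# Sample line from the question; the ten trailing commas are built with "," * 10.
sample = "AAA,AAA,AAA,AAA,some text and a comma here, and there, test" + "," * 10
print(remove_inner_commas(sample))
```

Lines that do not match the expected shape are returned unchanged, because `sub` only rewrites what the pattern matches.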
Upvotes: 1