Reputation: 31
I'm trying to find commas followed by space(s) on lines that end with a string containing an underscore, and replace each comma with a new line so that every resulting line ends with the matched underscored string.
Input
ABC, ZYZ John_Doe
HBB Dan_Doe
HHH, BBB, CCC April_May
Desired output
ABC John_Doe
ZYZ John_Doe
HBB Dan_Doe
HHH April_May
BBB April_May
CCC April_May
I'm using Notepad++ with regex. I can replace the comma and space by putting
,\s+
in Find and
\n
in Replace, but I'm stuck on also matching the string containing an underscore on that line and having it end up on each new line.
Any help is much appreciated.
Upvotes: 3
Views: 3947
Reputation: 8043
Run the following find-and-replace N times, where N is the maximum number of commas on any single line:
, ([A-Z]*)( \w*)
Replace with:
$2\r\n$1$2
Here is a useful site for testing.
Example:
Starting from the input text you posted, you'll have to use the regex twice:
ABC, ZYZ John_Doe
HBB Dan_Doe
HHH, BBB, CCC April_May
This will be the result of the first execution:
ABC John_Doe
ZYZ John_Doe
HBB Dan_Doe
HHH, BBB April_May
CCC April_May
And this will be the result of the second:
ABC John_Doe
ZYZ John_Doe
HBB Dan_Doe
HHH April_May
BBB April_May
CCC April_May
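Outside Notepad++, the same "repeat until nothing changes" idea can be sketched in Python. This is only an illustration under some assumptions: the function name split_commas is made up here, Python's re module stands in for Notepad++'s regex engine, and \n is used in place of \r\n.

```python
import re

# Same pattern as above: a comma, a space, an uppercase code,
# then a space and the trailing word (\w also matches the underscore).
PATTERN = re.compile(r", ([A-Z]*)( \w*)")


def split_commas(text: str) -> str:
    """Apply the replacement repeatedly until the text stops changing."""
    while True:
        # \2\n\1\2 mirrors the $2\r\n$1$2 replacement, with \n as line ending.
        new_text = PATTERN.sub(r"\2\n\1\2", text)
        if new_text == text:
            return text
        text = new_text


if __name__ == "__main__":
    sample = "ABC, ZYZ John_Doe\nHBB Dan_Doe\nHHH, BBB, CCC April_May"
    print(split_commas(sample))
```

Each pass removes one comma from every line that still has one, so the loop stops after as many passes as the largest comma count on a line, plus one pass that changes nothing.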
Upvotes: 1
Reputation: 8413
Not sure if this is the best approach, but I came up with the following pattern:
(?:(?:^|\G(?!^)\h*,\h*)([[:alnum:]]+\b)(?=(\h*,\h*[[:alnum:]]+)*\h+([[:alnum:]]+_[[:alnum:]]+\h*$))|(\h+[[:alnum:]]+_[[:alnum:]]+\h*$\R?))
Replace it with:
(?{1}$1 $3\n(?{2}~~:):)
Then run a second replace, finding
^~~
and replacing it with an empty string.
It converts as you desired, working for an arbitrary number of commas. Here's what it does:
(?:^|\G(?!^)\h*,\h*)([[:alnum:]]+\b)(?=(\h*,\h*[[:alnum:]]+)*\h+([[:alnum:]]+_[[:alnum:]]+)\h*$)
This matches the comma separated strings (and the underscored value in a lookahead)
(?:^|\G(?!^)\h*,\h*)
matches the start of the line, or the previous match followed by horizontal spaces, a comma, and horizontal spaces.
([[:alnum:]]+\b)
matches letters/numbers followed by a word boundary and stores them in capturing group 1.
(?=
starts a lookahead, so we don't actually consume anything, just assert and store into capturing groups.
(\h*,\h*[[:alnum:]]+)*
matches the following words, if there are any, and stores the last one in capturing group 2; if there is no following word, capturing group 2 is not matched.
\h+([[:alnum:]]+_[[:alnum:]]+)\h*$
matches the underscored word and captures it into group 3; there are horizontal spaces before it and there might be some after it.
(\h+[[:alnum:]]+_[[:alnum:]]+\h*$\R?))
This matches the underscored word and optionally the following newline, so we can replace it with an empty string.
The replacement
(?{1}$1 $3\n(?{2}~~:):)
checks if the first capturing group is matched (that is, one of the comma-separated words). If so, it inserts this word, a space, the underscored word, a newline, and ~~ if it's not the last one. I needed ~~ to make the \G work correctly in all cases; you could use any string that's unlikely to appear in the content. If the first capturing group is not matched, the replacement will be empty. The second replace with ^~~ is used to finally remove the marker.
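For comparison, here is a rough Python sketch that produces the same result for an arbitrary number of commas. It is not a port of the pattern above: Python's re module has no \G or \h, so this version swaps in a per-line split instead. The names LINE, ALNUM, and expand_line_groups are hypothetical and only used for this illustration.

```python
import re

# Mirrors [[:alnum:]]+ from the pattern above (ASCII letters and digits only).
ALNUM = r"[A-Za-z0-9]+"

# One line = comma-separated words, horizontal space, underscored word.
LINE = re.compile(
    rf"^(?P<keys>{ALNUM}(?:[ \t]*,[ \t]*{ALNUM})*)"
    rf"[ \t]+(?P<name>{ALNUM}_{ALNUM})[ \t]*$"
)


def expand_line_groups(text: str) -> str:
    """Emit one 'word underscored_word' line per comma-separated word."""
    out = []
    for line in text.splitlines():
        match = LINE.match(line)
        if match is None:
            # Lines that do not fit the expected shape are kept unchanged.
            out.append(line)
            continue
        for key in re.split(r"[ \t]*,[ \t]*", match.group("keys")):
            out.append(f"{key} {match.group('name')}")
    return "\n".join(out)


if __name__ == "__main__":
    sample = "ABC, ZYZ John_Doe\nHBB Dan_Doe\nHHH, BBB, CCC April_May"
    print(expand_line_groups(sample))
```

Because each line is handled on its own, no placeholder such as ~~ is needed in this variant.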
Upvotes: 1
Reputation: 3055
This regex works when a line has exactly one comma.
Find:
^([A-Z]{3}), ([A-Z]{3}) ([A-Z][a-z].*)
Replace with: \1 \3\n\2 \3
This one works when a line has two commas.
Find: ^([A-Z]{3}), ([A-Z]{3}), ([A-Z]{3}) ([A-Z][a-z].*)
Replace with: \1 \4\n\2 \4\n\3 \4
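As a quick way to check both patterns together, they can be tried in Python. This is a sketch only: the name apply_fixed_patterns is invented for the example, re.MULTILINE is needed so that ^ matches at every line start, and Python's re module is assumed to behave like Notepad++ for these particular patterns.

```python
import re

# (pattern, replacement) pairs taken from the two find/replace steps above.
RULES = [
    (re.compile(r"^([A-Z]{3}), ([A-Z]{3}) ([A-Z][a-z].*)", re.MULTILINE),
     r"\1 \3\n\2 \3"),
    (re.compile(r"^([A-Z]{3}), ([A-Z]{3}), ([A-Z]{3}) ([A-Z][a-z].*)", re.MULTILINE),
     r"\1 \4\n\2 \4\n\3 \4"),
]


def apply_fixed_patterns(text: str) -> str:
    """Run the one-comma rule, then the two-comma rule, once each."""
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    sample = "ABC, ZYZ John_Doe\nHBB Dan_Doe\nHHH, BBB, CCC April_May"
    print(apply_fixed_patterns(sample))
```

Note that both patterns assume three-letter uppercase codes and a name that starts with an uppercase letter followed by a lowercase one; lines with three or more commas would need a further pattern.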
Upvotes: 2