I have a .csv file:
ce3520, 3, 555420
ce3520, 3, 555561
vs5020, Manual, 554548
vs5020, Ad, 1000382766
vs5021, Manual, 554549
vs5021, Ad, 1000382773
vs5023, Manual, 554550
vs5023, Ad, 1000382793
What I need:
ce3520, 3, 555420, 3, 555561
vs5020, Manual, 554548, Ad, 1000382766
vs5021, Manual, 554549, Ad, 1000382773
vs5023, Manual, 554550, Ad, 1000382793
So basically: take the code from each line (the characters up to the first ","), compare it with the next line's code, and if they match, replace both lines with the code followed by the two groups. Something like:
ID (.*?)\
ID (.*?)\
replace with:
ID \1 \2
I'm not very familiar with regular expressions, so I'm seeking help. Is this even possible?
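For comparison, the procedure described above (compare each line's code with the next line's code, then join the two lines) can also be done without a regex. This is a minimal Python sketch, not part of the original question: the function name merge_pairs is hypothetical, and it assumes the file's lines are already in a list (for example via splitlines()).

```python
def merge_pairs(lines):
    """Join each line with the next one when both start with the same code
    (the text before the first comma); unmatched lines are kept as they are."""
    merged = []
    i = 0
    while i < len(lines):
        code, sep, _ = lines[i].partition(",")
        if sep and i + 1 < len(lines):
            next_code, next_sep, next_rest = lines[i + 1].partition(",")
            if next_sep and next_code == code:
                # Keep the whole first line, append the second line minus its code.
                merged.append(lines[i] + "," + next_rest)
                i += 2
                continue
        merged.append(lines[i])
        i += 1
    return merged


rows = [
    "ce3520, 3, 555420",
    "ce3520, 3, 555561",
    "vs5020, Manual, 554548",
    "vs5020, Ad, 1000382766",
]
for row in merge_pairs(rows):
    print(row)
```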
Whether it's possible depends on the regex engine you use. Python and C#, for example, have different engines, so they have different syntax and different sets of capturing features.
Within the limits of C# syntax, I failed to create the required expression.
The expression for Python is not perfect:
(((?P<firstString>(?P<word>^.*?),.*?) {2,3}(?P<secondString>\n^(?P=word)(?P<secondStringWithoutWord>.*))))
It splits the file into matches that contain the named groups 'firstString' and 'secondStringWithoutWord'. Those strings need to be concatenated in code, but it works.
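A minimal sketch of "concatenating in code" with Python's re module, using the expression above verbatim. Note that its " {2,3}" part only matches when each first line ends with two or three spaces; the sample text below assumes such trailing spaces (an assumption, not something shown in the question), and rstrip() removes the ones left on the second line.

```python
import re

# Expression copied from the answer; " {2,3}" expects 2-3 trailing spaces
# before each newline (assumed present in the sample text below).
PATTERN = re.compile(
    r"(((?P<firstString>(?P<word>^.*?),.*?) {2,3}"
    r"(?P<secondString>\n^(?P=word)(?P<secondStringWithoutWord>.*))))",
    re.MULTILINE,  # lets ^ match at the start of every line
)

text = (
    "ce3520, 3, 555420  \n"
    "ce3520, 3, 555561  \n"
    "vs5020, Manual, 554548  \n"
    "vs5020, Ad, 1000382766  \n"
)

# Concatenate the two named groups of every match in code.
merged = [
    m.group("firstString") + m.group("secondStringWithoutWord").rstrip()
    for m in PATTERN.finditer(text)
]
for line in merged:
    print(line)
```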
It's even easier: just take the previous expression and add this substitution expression:
\g<firstString>\g<secondStringWithoutWord>
So the result is exactly what you need.
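The substitution variant can be sketched with re.sub, where \g<name> refers to a named group. As in the expression itself, the sample assumes each line ends with two trailing spaces (an assumption needed for " {2,3}" to match); those spaces survive on the merged lines, so they are stripped afterwards.

```python
import re

PATTERN = re.compile(
    r"(((?P<firstString>(?P<word>^.*?),.*?) {2,3}"
    r"(?P<secondString>\n^(?P=word)(?P<secondStringWithoutWord>.*))))",
    re.MULTILINE,
)

# Sample lines with two trailing spaces each (assumed, see above).
text = (
    "ce3520, 3, 555420  \n"
    "ce3520, 3, 555561  \n"
    "vs5020, Manual, 554548  \n"
    "vs5020, Ad, 1000382766  \n"
)

# Replace each two-line match with firstString + secondStringWithoutWord.
result = PATTERN.sub(r"\g<firstString>\g<secondStringWithoutWord>", text)

# Drop the trailing spaces that remain at the end of each merged line.
cleaned = [line.rstrip() for line in result.splitlines()]
print("\n".join(cleaned))
```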
You could match:
^([^,\n]+)(,.+)\n\1
Explanation
^ : Start of string (with the multiline flag, the start of each line)
([^,\n]+) : Capture group 1, match 1+ chars other than a , or a newline
(,.+) : Capture group 2, match a comma and the rest of the line
\n\1 : Match a newline and a backreference to group 1

And replace with the 2 capture groups, often notated as $1$2 or \1\2

See a regex demo.
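In Python the same pattern and replacement can be applied with re.sub; re.MULTILINE is needed so that ^ matches at the start of every line rather than only at the start of the whole text. A minimal sketch using the data from the question:

```python
import re

text = (
    "ce3520, 3, 555420\n"
    "ce3520, 3, 555561\n"
    "vs5020, Manual, 554548\n"
    "vs5020, Ad, 1000382766\n"
    "vs5021, Manual, 554549\n"
    "vs5021, Ad, 1000382773\n"
    "vs5023, Manual, 554550\n"
    "vs5023, Ad, 1000382793\n"
)

# Group 1 = code, group 2 = rest of the first line; \n\1 consumes the newline
# and the repeated code, so the second line's remainder follows directly.
result = re.sub(r"^([^,\n]+)(,.+)\n\1", r"\1\2", text, flags=re.MULTILINE)
print(result, end="")
```

This prints the same four lines as the output below.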
Output
ce3520, 3, 555420, 3, 555561
vs5020, Manual, 554548, Ad, 1000382766
vs5021, Manual, 554549, Ad, 1000382773
vs5023, Manual, 554550, Ad, 1000382793