freewheeler

Reputation: 1356

RegEx - remove line break if lines start with same string

I have a .csv file:

ce3520, 3, 555420   
ce3520, 3, 555561   
vs5020, Manual, 554548  
vs5020, Ad, 1000382766  
vs5021, Manual, 554549  
vs5021, Ad, 1000382773  
vs5023, Manual, 554550  
vs5023, Ad, 1000382793  

What I need:

ce3520, 3, 555420, 3, 555561   
vs5020, Manual, 554548, Ad, 1000382766  
vs5021, Manual, 554549, Ad, 1000382773  
vs5023, Manual, 554550, Ad, 1000382793  

So basically: take the code from each line (the characters up to the first ","), compare it with the next line's code, and if they match, replace both lines with the code plus the two remaining groups. Like:

ID (.*?)\ 
ID (.*?)\

replace with:

ID \1 \2

I'm not that familiar with regular expressions, so I'm seeking help. Is that even possible?

Upvotes: 1

Views: 65

Answers (2)

Leonid Pavlov

Reputation: 809

Match function

Whether this is possible depends on the regex engine you use. Python and C#, for example, have different engines, so they differ in syntax and in the capturing tools they offer.

With the syntax available in C#, I failed to create the required expression.

The expression for Python is not perfect:

(((?P<firstString>(?P<word>^.*?),.*?) {2,3}(?P<secondString>\n^(?P=word)(?P<secondStringWithoutWord>.*))))

It splits the file into matches that contain the named groups 'firstString' and 'secondStringWithoutWord'. Those strings need to be concatenated in code, but it works.

Substitution function

This is easier: just take the previous expression and add a substitution expression:

\g<firstString>\g<secondStringWithoutWord>

The result is exactly what you need:

regex 101 snippet
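A minimal Python sketch of the substitution above, assuming the standard `re` module with the multiline flag (so `^` matches at the start of every line) and sample rows that keep the two or three trailing spaces from the question, which the ` {2,3}` part of the expression relies on. The names `PATTERN`, `REPLACEMENT`, `merge_pairs`, and `sample` are made up for illustration.

```python
import re

# Expression and substitution taken unchanged from the answer above.
PATTERN = re.compile(
    r"(((?P<firstString>(?P<word>^.*?),.*?) {2,3}"
    r"(?P<secondString>\n^(?P=word)(?P<secondStringWithoutWord>.*))))",
    re.MULTILINE,  # assumption: ^ must match at each line start
)
REPLACEMENT = r"\g<firstString>\g<secondStringWithoutWord>"


def merge_pairs(text: str) -> str:
    """Join each line with the following line when both start with the same code."""
    return PATTERN.sub(REPLACEMENT, text)


# Sample rows copied from the question, including their trailing spaces.
sample = (
    "ce3520, 3, 555420   \n"
    "ce3520, 3, 555561   \n"
    "vs5020, Manual, 554548  \n"
    "vs5020, Ad, 1000382766  \n"
)

print(merge_pairs(sample))
```

If the real file does not have two or three trailing spaces on the first line of each pair, this expression will not match, which is probably part of why it is described as not perfect.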

Upvotes: 1

The fourth bird

Reputation: 163457

You could match:

^([^,\n]+)(,.+)\n\1

Explanation

  • ^ Start of the line (with the multiline flag enabled; otherwise only the start of the string)
  • ([^,\n]+) Capture group 1, match 1+ chars other than a , or a newline
  • (,.+) Capture group 2, match a comma and the rest of the line
  • \n\1 Match a newline and a backreference to group 1

Replace with the two capture groups, often written as $1$2 or \1\2.

See a regex demo.

Output

ce3520, 3, 555420, 3, 555561
vs5020, Manual, 554548, Ad, 1000382766
vs5021, Manual, 554549, Ad, 1000382773
vs5023, Manual, 554550, Ad, 1000382793
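For anyone applying this from a script, here is a small sketch in Python, assuming the `re` module and input lines without trailing whitespace. The names `PAIR`, `join_pairs`, and `rows` are hypothetical.

```python
import re

# Pattern from the answer; re.MULTILINE lets ^ match at the start of every line.
PAIR = re.compile(r"^([^,\n]+)(,.+)\n\1", re.MULTILINE)


def join_pairs(text: str) -> str:
    """Keep groups 1 and 2; the newline and the repeated code are dropped."""
    return PAIR.sub(r"\1\2", text)


rows = "\n".join([
    "ce3520, 3, 555420",
    "ce3520, 3, 555561",
    "vs5020, Manual, 554548",
    "vs5020, Ad, 1000382766",
])

print(join_pairs(rows))
```

Two caveats worth checking against real data: trailing spaces on the first line of a pair end up in the middle of the joined line, so stripping lines first may be needed, and `\1` only checks that the next line begins with the same characters, so a code like `ce3520` would also pair with `ce35201`. Appending `(?=,)` after `\1` is one possible way to tighten that.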

Upvotes: 2
