Reputation: 914

How to find the Nth entry in a comma-separated list and keep the corresponding row?

I would like to extract some rows from a large .txt file:

MYNAME, 2017-03-01, John Wayne, H\
MYNAME, 2017-01-01, Brian Wayne,P\
MYNAME, 2017-02-01, Brian Duffe, TR\
MYNAME, 2017-03-01, Iggor Miller, R\

From this file, I would like to keep only the rows for people whose surname starts with W:

MYNAME, 2017-03-01, John Wayne, H\
MYNAME, 2017-01-01, Brian Wayne,P\

What I have tried did not work as expected:

 /(?:[^\,]*\,){2}([^,]*)/

Here I am trying to match a W after the second comma.

Appreciate any suggestions!

Upvotes: 3

Answers (1)

Wiktor Stribiżew

Reputation: 627334

Your regex (?:[^\,]*\,){2}([^,]*) matches 0+ chars other than , followed by a , exactly 2 times, and then captures 0+ chars other than ,. Just adding W won't work, because the third field starts with whitespace and a first name: you need to account for the words before the family name. You might add \s+\S+\s+W before the last [^,]*, or use a PCRE regex:

^(?:[^,]*,){2}\h*\S+\h+W.*

Details

^ - start of string/line
(?:[^,]*,){2} - 2 occurrences of any 0+ chars other than , followed with ,
\h* - 0+ horizontal whitespace chars
\S+ - 1 or more non-whitespace chars
\h+ - 1+ horizontal whitespace chars
W - a W char
.* - any 0+ chars other than line break chars, as many as possible
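The breakdown above can be tried out with a minimal Python sketch. One assumption to flag: Python's re module does not support \h, so [ \t] (space or tab) stands in for it here. The names SURNAME_W, rows and kept are illustrative only, and the trailing backslashes from the sample rows are left out.

```python
import re

# Assumption: [ \t] replaces PCRE's \h, which Python's re module lacks.
SURNAME_W = re.compile(r"^(?:[^,]*,){2}[ \t]*\S+[ \t]+W.*")

# Sample rows from the question (trailing backslashes omitted).
rows = [
    "MYNAME, 2017-03-01, John Wayne, H",
    "MYNAME, 2017-01-01, Brian Wayne,P",
    "MYNAME, 2017-02-01, Brian Duffe, TR",
    "MYNAME, 2017-03-01, Iggor Miller, R",
]

# re.match anchors at the start of each string, mirroring ^ per line.
kept = [row for row in rows if SURNAME_W.match(row)]

for row in kept:
    print(row)
# Prints the two "Wayne" rows; the "Duffe" and "Miller" rows are skipped.
```

For a large file, the same filter could be applied line by line while reading, rather than loading everything into a list first.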

Another alternative (a JS-compatible one): after matching two chunks of non-comma chars, each followed by a comma, match any chars other than , and then a whitespace followed by W:

^(?:[^,]*,){2}[^,]*\sW.*

Here, [^,]*\sW.* matches any 0+ chars other than , (as many as possible), then a whitespace char, then W, and then any 0+ chars other than line break chars, as many as possible (the rest of the string/line).
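As a rough check of this looser pattern, here is a small Python sketch (the pattern uses no PCRE-only syntax, so it runs unchanged in Python's re module). The rows in edge_rows are hypothetical, made up to show where it behaves differently from the \S+ version: it accepts any word in the third field that follows a whitespace char and starts with W, which includes a first name when the field begins with a space.

```python
import re

# The JS-compatible pattern from the answer, unchanged.
THIRD_FIELD_W = re.compile(r"^(?:[^,]*,){2}[^,]*\sW.*")

# Sample rows from the question (trailing backslashes omitted).
sample_rows = [
    "MYNAME, 2017-03-01, John Wayne, H",
    "MYNAME, 2017-01-01, Brian Wayne,P",
    "MYNAME, 2017-02-01, Brian Duffe, TR",
    "MYNAME, 2017-03-01, Iggor Miller, R",
]

# Hypothetical rows, not from the question, to probe edge cases.
edge_rows = [
    "MYNAME, 2017-04-01, John Paul Wayne, H",  # three-word name
    "MYNAME, 2017-04-02, Wendy Smith, H",      # first name starts with W
]


def keep_matching(lines):
    """Return the lines whose third field has a whitespace char followed by W."""
    return [line for line in lines if THIRD_FIELD_W.match(line)]


print(keep_matching(sample_rows))  # the two "Wayne" rows
print(keep_matching(edge_rows))    # both hypothetical rows match
```

If first names starting with W must be excluded, the stricter \S+ variant is likely the safer choice for two-word names.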

Upvotes: 1
