Grep Pattern Repetition

Question

I have a CSV (comma-separated) file. I would like to know how to search for lines where the 7th and 8th fields are the same, using only grep (without cut). I have tried something like this:

grep -E '[^,]*,{6,6}' input.csv | grep '\(.*\)\(,\)\(\1\)' | less

Unfortunately, this does not print anything. How could I get the output I need?

Jonathan Leffler · Accepted Answer

Assuming there's nothing awkward like fields with commas in them (because if there are such fields in the first 8 fields, you can't process the files without a full CSV-cognizant tool), and that there is a 9th field (so the 7th and 8th fields are both followed by a comma) then:

grep '^\([^,]*,\)\{6\}\([^,]*,\)\2' file.csv

The first bit says 6 sequences of zero-or-more non-commas, each followed by a comma. Then there's the 7th (possibly empty) field with its trailing comma; that's followed by the same-thing-again (the \2).
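To make that breakdown concrete, here is a small sketch that assembles the same BRE from its three parts. The variable names are made up for illustration and are not part of the original answer; the two sample lines are taken from the test file below.

```shell
# Hypothetical variable names, used only to label the pieces of the pattern.
fields_1_to_6='^\([^,]*,\)\{6\}'   # six runs of non-commas, each followed by a comma
field_7='\([^,]*,\)'               # 7th field plus its trailing comma (capture group 2)
same_again='\2'                    # whatever group 2 matched, repeated as the 8th field
pattern="${fields_1_to_6}${field_7}${same_again}"

# One line whose 7th and 8th fields agree, one whose fields differ.
matches=$(printf '%s\n' 'a,b,c,d,e,f,g,g,i' 'a,b,c,d,e,f,g,h,i' | grep "$pattern")
printf '%s\n' "$matches"
```

Only the g,g line should be printed. Because the comma is inside group 2, an 8th field that merely starts with the 7th (g followed by gg, say) is not treated as a match.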

$ cat file.csv
a,b,c,d,e,f,g,g,i
a,b,c,d,e,f,g,h,i
a,b,c,d,e,f,hhh,hhh,i
,b,c,d,e,f,hhh,hhh,i
,,c,d,e,f,hhh,hhh,i
,,,d,e,f,hhh,hhh,i
,,,,e,f,hhh,hhh,i
,,,,,f,hhh,hhh,i
,,,,,,hhh,hhh,i
,,,,,,hhh,hhh,
$ grep '^\([^,]*,\)\{6\}\([^,]*,\)\2' file.csv
a,b,c,d,e,f,g,g,i
a,b,c,d,e,f,hhh,hhh,i
,b,c,d,e,f,hhh,hhh,i
,,c,d,e,f,hhh,hhh,i
,,,d,e,f,hhh,hhh,i
,,,,e,f,hhh,hhh,i
,,,,,f,hhh,hhh,i
,,,,,,hhh,hhh,i
,,,,,,hhh,hhh,
$

Note that the g,h,i line does not appear in the output (and it shouldn't); the rest should and do appear.

All of this is done using POSIX Basic Regular Expressions, or BREs. If you use egrep or grep -E, you have Extended Regular Expressions, or EREs, at your disposal, and you can forgo all the backslashes except the \2. You could also deal with a file that has some lines with 8 fields and other lines with 9 or more, but that isn't a regular CSV file. The BRE version can also be modified to work with a CSV file that has precisely 8 columns:

grep '^\([^,]*,\)\{6\}\([^,]*\),\2$' file.csv
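For comparison, the ERE option mentioned above might look something like the sketch below. It is one possible way to handle lines with either 8 fields or 9 or more, not the only one. It assumes a grep that accepts back-references with -E (GNU grep does; POSIX does not define back-references for EREs), and the sample data and variable names are invented for illustration.

```shell
# Illustrative sample: 9 fields (match), exactly 8 fields (match),
# differing fields (no match), 8th field only starts with the 7th (no match).
sample=$(mktemp)
printf '%s\n' \
  'a,b,c,d,e,f,g,g,i' \
  'a,b,c,d,e,f,g,g' \
  'a,b,c,d,e,f,g,h,i' \
  'a,b,c,d,e,f,g,gg,i' > "$sample"

# ERE: no backslashes on the parentheses or braces, only on the back-reference.
# The trailing (,|$) requires the 8th field to end at a comma or at end of line.
ere_out=$(grep -E '^([^,]*,){6}([^,]*),\2(,|$)' "$sample")
printf '%s\n' "$ere_out"
rm -f "$sample"
```

Here the comma is kept outside group 2, so the final alternation is what stops g from matching against gg.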

Part of the art of using regular expressions is having a flexible mindset about different ways to achieve a given result; there is often more than one way to do it.
