user1552879

Reputation: 99

Grep Pattern Repetition

I have a CSV (comma-separated) file. I would like to know how to search for lines where the 7th and 8th fields are the same, using only grep (without using cut). I have tried something like this:

grep -E '[^,]*,{6,6}' input.csv | grep '\(.*\)\(,\)\(\1$\)' | less

Unfortunately, this does not print anything. How could I get the output I need?

Upvotes: 3

Views: 6630

Answers (2)

Vijay

Reputation: 67221

If you are open to using awk, it would be simpler:

awk -F, '$7==$8' your_file

or in perl:

perl -F, -lane 'print if $F[6] eq $F[7]' your_file

Upvotes: 1

Jonathan Leffler

Reputation: 753775

Assuming there's nothing awkward like fields with commas in them (because if there are such fields in the first 8 fields, you can't process the files without a full CSV-cognizant tool), and that there is a 9th field (so the 7th and 8th fields are both followed by a comma) then:

grep '^\([^,]*,\)\{6\}\([^,]*,\)\2' file.csv

The first bit says 6 sequences of zero-or-more non-commas, each followed by a comma. Then there's the 7th (possibly empty) field with its trailing comma; that's followed by the same-thing-again (the \2).

$ cat file.csv
a,b,c,d,e,f,g,g,i
a,b,c,d,e,f,g,h,i
a,b,c,d,e,f,hhh,hhh,i
,b,c,d,e,f,hhh,hhh,i
,,c,d,e,f,hhh,hhh,i
,,,d,e,f,hhh,hhh,i
,,,,e,f,hhh,hhh,i
,,,,,f,hhh,hhh,i
,,,,,,hhh,hhh,i
,,,,,,hhh,hhh,
$ grep '^\([^,]*,\)\{6\}\([^,]*,\)\2' file.csv
a,b,c,d,e,f,g,g,i
a,b,c,d,e,f,hhh,hhh,i
,b,c,d,e,f,hhh,hhh,i
,,c,d,e,f,hhh,hhh,i
,,,d,e,f,hhh,hhh,i
,,,,e,f,hhh,hhh,i
,,,,,f,hhh,hhh,i
,,,,,,hhh,hhh,i
,,,,,,hhh,hhh,
$

Note that the g,h,i line does not appear in the output (and it shouldn't); the rest should and do appear.
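The repetition count in the pattern is just the field number minus one, so the same idea can be pointed at any pair of adjacent fields. A minimal sketch, assuming a POSIX shell; the function name same_adjacent_fields is made up for illustration and is not part of grep, and the same caveats apply (no embedded commas, and a field following the pair):

```shell
# Hypothetical helper (illustrative name): print lines whose Nth and (N+1)th
# comma-separated fields are identical.
# Assumptions: N >= 2, no quoted fields containing commas, and at least one
# more field after field N+1 (so both compared fields end with a comma).
same_adjacent_fields() {
    n=$1
    file=$2
    skip=$((n - 1))   # number of leading fields to step over
    # Inside double quotes the backslashes before ( ) { } and 2 reach grep
    # unchanged; only $skip is expanded by the shell.
    grep "^\([^,]*,\)\{$skip\}\([^,]*,\)\2" "$file"
}
```

With the sample file above, same_adjacent_fields 7 file.csv should select the same lines as the hand-written pattern.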

All of this uses POSIX Basic Regular Expressions (BREs). If you use egrep or grep -E, you have Extended Regular Expressions (EREs) at your disposal and can forgo all the backslashes except the one in \2; you could also handle a file in which some lines have 8 fields and others have 9 or more, though that isn't a regular CSV file. The BRE version can also be modified to work with a CSV file that has precisely 8 columns:

grep '^\([^,]*,\)\{6\}\([^,]*\),\2$' file.csv
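For the mixed case mentioned above (some lines with 8 fields, others with 9 or more), one possible ERE form lets the 8th field end at either a comma or the end of the line. This is a sketch rather than a definitive pattern: the wrapper name same_7_8 is hypothetical, and back-references in EREs are an extension that GNU grep accepts, not something POSIX guarantees.

```shell
# Sketch: ERE version of the 7th == 8th field test.
# ([^,]*,){6}  skips the first six fields
# ([^,]*)      captures the 7th field (possibly empty) as group 2
# ,\2          requires the 8th field to start with the same text
# (,|$)        requires the 8th field to end right there
# Assumption: back-references under -E are supported by the grep in use.
same_7_8() {
    grep -E '^([^,]*,){6}([^,]*),\2(,|$)' "$1"
}
```

Calling same_7_8 file.csv would then accept both a,b,c,d,e,f,g,g and a,b,c,d,e,f,g,g,i while rejecting a,b,c,d,e,f,g,gg,i.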

Part of the art of using regular expressions is having a flexible mindset about different ways to achieve a given result; there is often more than one way to do it.

Upvotes: 3
