Reputation: 99
I have a CSV (comma-separated values) file. I would like to know how to find the lines where the 7th and 8th fields are the same, using only grep (without cut). I have tried something like this:
grep -E '[^,]*,{6,6}' input.csv | grep '\(.*\)\(,\)\(\1$\)' | less
Unfortunately, this does not print anything. How could I get the output I need?
Upvotes: 3
Views: 6630
Reputation: 67221
If you are open to using awk, it is simpler:
awk -F, '$7==$8' your_file
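Or in Perl (use eq rather than ==, because == compares numerically and would treat any two non-numeric fields such as g and h as equal; -l strips the trailing newline so an 8th field at the end of a line compares correctly):
perl -F, -lane 'print if $F[6] eq $F[7]' your_file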
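One caveat with the awk one-liner: awk compares two fields numerically when both look like numbers, so 1 and 1.0 count as equal, and a line with fewer than 8 fields also matches because $7 and $8 are then both empty. A minimal sketch that forces a textual comparison and skips short lines might look like this (same_7_8_awk is a hypothetical helper name, and the sample lines are made up for illustration):

```shell
#!/bin/sh
# Hypothetical helper: print lines whose 7th and 8th comma-separated fields
# are textually identical. Assumes no quoted fields containing commas.
same_7_8_awk() {
    # NF >= 8 skips short lines, where $7 and $8 would both be empty.
    # Concatenating "" turns each field into a plain string, so "1" and
    # "1.0" are compared as text rather than as numbers.
    awk -F, 'NF >= 8 && ($7 "") == ($8 "")' "$@"
}

# Made-up sample: only the g,g and hhh,hhh lines should be printed.
printf '%s\n' \
    'a,b,c,d,e,f,g,g,i' \
    'a,b,c,d,e,f,1,1.0,i' \
    'a,b,c,d,e,f,hhh,hhh,i' \
    'a,b,c,d,e,f' |
    same_7_8_awk
```

Drop the NF >= 8 guard if short lines should be treated the same way the plain '$7==$8' version treats them.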
Upvotes: 1
Reputation: 753775
Assuming there's nothing awkward like fields with commas in them (if any of the first 8 fields contains a comma, you can't process the file without a full CSV-aware tool), and that there is a 9th field (so the 7th and 8th fields are both followed by a comma), then:
grep '^\([^,]*,\)\{6\}\([^,]*,\)\2' file.csv
The first part says: 6 sequences of zero or more non-commas, each followed by a comma. Then comes the 7th (possibly empty) field with its trailing comma, and that is followed by the same text again (the \2).
$ cat file.csv
a,b,c,d,e,f,g,g,i
a,b,c,d,e,f,g,h,i
a,b,c,d,e,f,hhh,hhh,i
,b,c,d,e,f,hhh,hhh,i
,,c,d,e,f,hhh,hhh,i
,,,d,e,f,hhh,hhh,i
,,,,e,f,hhh,hhh,i
,,,,,f,hhh,hhh,i
,,,,,,hhh,hhh,i
,,,,,,hhh,hhh,
$ grep '^\([^,]*,\)\{6\}\([^,]*,\)\2' file.csv
a,b,c,d,e,f,g,g,i
a,b,c,d,e,f,hhh,hhh,i
,b,c,d,e,f,hhh,hhh,i
,,c,d,e,f,hhh,hhh,i
,,,d,e,f,hhh,hhh,i
,,,,e,f,hhh,hhh,i
,,,,,f,hhh,hhh,i
,,,,,,hhh,hhh,i
,,,,,,hhh,hhh,
$
Note that the g,h,i line does not appear in the output (and it shouldn't); the rest should appear, and do.
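The 6 in \{6\} is simply the number of fields to skip before the pair being compared, so the same pattern can be aimed at any two adjacent fields. A small sketch, assuming a POSIX shell and the same no-embedded-commas caveat; adjacent_fields_equal is a hypothetical helper, not a standard command:

```shell
#!/bin/sh
# Hypothetical helper: print lines whose k-th and (k+1)-th fields are equal.
# Like the pattern above, it expects a comma after the (k+1)-th field.
adjacent_fields_equal() {
    k=$1
    shift
    # Inside double quotes the shell leaves \( \) \{ \} and \2 untouched,
    # while $((k - 1)) expands to the number of leading fields to skip.
    grep "^\([^,]*,\)\{$((k - 1))\}\([^,]*,\)\2" "$@"
}

# k=7 builds the same BRE as above: ^\([^,]*,\)\{6\}\([^,]*,\)\2
printf 'a,b,c,d,e,f,g,g,i\na,b,c,d,e,f,g,h,i\n' | adjacent_fields_equal 7
```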
All of this is done using POSIX Basic Regular Expressions (BREs). If you use egrep or grep -E, you have Extended Regular Expressions (EREs) at your disposal and can drop all the backslashes except the one in \2 (strictly, POSIX does not define back-references in EREs, but GNU grep supports them as an extension). With EREs you could also handle a file in which some lines have 8 fields and others have 9 or more, but that isn't a regular CSV file. The BRE version can also be modified to work with a CSV file that has precisely 8 columns:
grep '^\([^,]*,\)\{6\}\([^,]*\),\2$' file.csv
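For comparison, here is a sketch of an ERE form that copes with a mix of 8-field and longer lines by accepting either a comma or end-of-line after the 8th field. It assumes GNU grep, since back-references in EREs are an extension rather than POSIX; same_7_8_ere is a hypothetical name:

```shell
#!/bin/sh
# Assumes GNU grep: POSIX EREs do not define back-references like \2.
# Group 1 skips six fields, group 2 captures the whole 7th field, and \2
# must then match the 8th field exactly, followed by a comma (9+ fields)
# or by end-of-line (exactly 8 fields).
same_7_8_ere() {
    grep -E '^([^,]*,){6}([^,]*),\2(,|$)' "$@"
}

# Expected to print only the 1st and 3rd sample lines.
printf '%s\n' \
    'a,b,c,d,e,f,g,g,i' \
    'a,b,c,d,e,f,g,h,i' \
    'a,b,c,d,e,f,hhh,hhh' \
    'a,b,c,d,e,f,hhh,hhhx' |
    same_7_8_ere
```

Requiring the comma or end-of-line after \2 is what stops hhh from matching a longer 8th field such as hhhx.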
Part of the art of using regular expressions is having a flexible mindset about different ways to achieve a given result; there is often more than one way to do it.
Upvotes: 3