melev

Reputation: 39

Validating specific column in grep

Ok this is driving me crazy. I have a text file with the following content:

"1","2","3","4","text","2020-01-01","2020-12-13","4"
"1","2","3","4","text","2020-12-07","2020-12-03","22"
"1","2","3","4","text","2020-12-12","2020-04-11","21"
"1","2","3","4","text","2020-05-21","2020-03-23","453"

etc.

I want to keep only the lines on which the second date is in December. I tried things like:

grep '.*(\d{4}-\d{2}-\d{2}).*(2020-12-).*' > output.txt
grep '.*\d{4}-\d{2}-\d{2}.*2020-12-.*' > output.txt
grep -P '.*\d{4}-\d{2}-\d{2}.*2020-12-.*' > output.txt

But nothing seems to work. Is there any way to accomplish this with either grep, egrep, sed or awk?

Upvotes: 0

Views: 162

Answers (4)

anubhava

Reputation: 786146

I suggest awk as an alternative, since the input data is structured in rows and columns with a common delimiter:

awk -F, '$7 ~ /-12-/' file

"1","2","3","4","text","2020-01-01","2020-12-13","4"
"1","2","3","4","text","2020-12-07","2020-12-03","22"

Upvotes: 2
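Note that /-12-/ in the awk approach above matches December of any year. If the year matters, a slightly stricter variant can anchor the test at the start of the seventh field. The following is a minimal sketch, assuming no field contains an embedded comma; the function name december_rows is hypothetical and exists only for illustration:

```shell
# Sample rows copied from the question.
sample='"1","2","3","4","text","2020-01-01","2020-12-13","4"
"1","2","3","4","text","2020-12-07","2020-12-03","22"
"1","2","3","4","text","2020-12-12","2020-04-11","21"
"1","2","3","4","text","2020-05-21","2020-03-23","453"'

# Hypothetical helper: print rows whose 7th comma-separated field
# starts with "2020-12- (the opening quote is part of the field).
# Assumes no field contains an embedded comma.
december_rows() {
  awk -F, '$7 ~ /^"2020-12-/' "$@"
}

# With no file argument, awk reads standard input.
printf '%s\n' "$sample" | december_rows
```

Passing a file name instead (december_rows file) should behave the same way, since the arguments are handed straight to awk.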

RavinderSingh13

Reputation: 133770

You need grep's -P option to enable Perl-compatible regular expressions, because \d is a Perl extension. Try the following, written against your shown samples. Note that the pattern has to step over the first date (column 6) before testing the second date (column 7):

grep -P '("\d+",){4}"[a-zA-Z]+","\d{4}-\d{2}-\d{2}","2020-12-\d{2}"' Input_file

Explanation: the command is split across lines here only to annotate it; run it as the single line above.

grep                   ##Start the grep command.
-P                     ##Enable PCRE regex in grep.
'("\d+",){4}           ##Match " digits ", four times (columns 1 to 4).
"[a-zA-Z]+",           ##Match " letters ", (column 5).
"\d{4}-\d{2}-\d{2}",   ##Match the first date (column 6), whatever its month.
"2020-12-\d{2}"        ##Match the second date (column 7) only if it is in December 2020.
' Input_file           ##Name of the input file.
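
If your grep lacks -P (the option depends on how grep was built), the same field-by-field idea can be written as an extended regular expression. This is a minimal sketch, assuming the column layout of the question's sample; the function name second_date_in_december is hypothetical. It spells digits as [0-9] and steps over the first date so that the December test applies to the seventh field:

```shell
# Sample rows copied from the question.
sample='"1","2","3","4","text","2020-01-01","2020-12-13","4"
"1","2","3","4","text","2020-12-07","2020-12-03","22"
"1","2","3","4","text","2020-12-12","2020-04-11","21"
"1","2","3","4","text","2020-05-21","2020-03-23","453"'

# Hypothetical helper. -E selects extended regular expressions;
# [0-9] stands in for the Perl-style \d.
# Layout assumed: 4 numeric fields, 1 alphabetic field, then two dates.
second_date_in_december() {
  grep -E '^("[0-9]+",){4}"[a-zA-Z]+","[0-9]{4}-[0-9]{2}-[0-9]{2}","2020-12-[0-9]{2}"' "$@"
}

printf '%s\n' "$sample" | second_date_in_december
```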

Upvotes: 3

Ken Y-N

Reputation: 15018

The problem is in:

egrep '.*\d{4}-\d{2}-\d{2}.2020-12-.' > output.txt
                          ^ HERE

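The . matches only a single character, but you need to skip the three characters ",", so change it to:

egrep '.*[0-9]{4}-[0-9]{2}-[0-9]{2}.+2020-12-.' > output.txt
                                   ^^ HERE

The . becomes .+, and each \d becomes [0-9], because \d is a Perl extension that POSIX extended regular expressions do not define.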

Upvotes: -1

Peter Thoeny

Reputation: 7616

Use either grep -P or egrep for short:

$ cat test.txt
"1","2","3","4","text","2020-01-01","2020-12-13","4"
"1","2","3","4","text","2020-12-07","2020-12-03","22"
"1","2","3","4","text","2020-12-12","2020-04-11","21"
"1","2","3","4","text","2020-05-21","2020-03-23","453"
$
$ grep -P '^"([^"]*","){6}2020-12-' test.txt
"1","2","3","4","text","2020-01-01","2020-12-13","4"
"1","2","3","4","text","2020-12-07","2020-12-03","22"
$
$ egrep '^"([^"]*","){6}2020-12-' test.txt
"1","2","3","4","text","2020-01-01","2020-12-13","4"
"1","2","3","4","text","2020-12-07","2020-12-03","22"

Explanation:

  • ^" - expect a " at the start of the line
  • ([^"]*","){6} - scan over all chars other than ", followed by ","; repeat that 6 times to skip the first six fields
  • 2020-12- - expect 2020-12- at the start of the seventh field
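
The skip-N-fields idea above can be generalized to any column. The following is a minimal sketch, not a drop-in tool: the function name field_starts_with and its argument order are hypothetical, and it assumes every field is double-quoted, no field contains an embedded quote, and the prefix contains no regex metacharacters.

```shell
# Sample rows copied from the question.
sample='"1","2","3","4","text","2020-01-01","2020-12-13","4"
"1","2","3","4","text","2020-12-07","2020-12-03","22"
"1","2","3","4","text","2020-12-12","2020-04-11","21"
"1","2","3","4","text","2020-05-21","2020-03-23","453"'

# Hypothetical helper: keep rows whose Nth quoted field starts with PREFIX.
# Usage: field_starts_with N PREFIX [FILE...]
field_starts_with() {
  n=$1
  prefix=$2
  shift 2
  skip=$((n - 1))   # number of complete fields to step over
  # Builds e.g. ^"([^"]*","){6}2020-12- for N=7 and PREFIX=2020-12-
  grep -E "^\"([^\"]*\",\"){$skip}$prefix" "$@"
}

# Second date (7th field) in December 2020:
printf '%s\n' "$sample" | field_starts_with 7 2020-12-
```

Calling it with 6 instead of 7 would test the first date, which on the sample data selects a different pair of rows.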

Upvotes: 1
