Reputation: 39
Ok this is driving me crazy. I have a text file with the following content:
"1","2","3","4","text","2020-01-01","2020-12-13","4"
"1","2","3","4","text","2020-12-07","2020-12-03","22"
"1","2","3","4","text","2020-12-12","2020-04-11","21"
"1","2","3","4","text","2020-05-21","2020-03-23","453"
etc.
I want to keep only the lines on which the second date is in December. I tried things like:
grep '.*(\d{4}-\d{2}-\d{2}).*(2020-12-).*' > output.txt
grep '.*\d{4}-\d{2}-\d{2}.*2020-12-.*' > output.txt
grep -P '.*\d{4}-\d{2}-\d{2}.*2020-12-.*' > output.txt
But nothing seems to work. Is there any way to accomplish this with either grep, egrep, sed or awk?
Upvotes: 0
Views: 162
Reputation: 786146
I suggest an alternative solution using awk, since the input data is structured in rows and columns with a common delimiter:
awk -F, '$7 ~ /-12-/' file
"1","2","3","4","text","2020-01-01","2020-12-13","4"
"1","2","3","4","text","2020-12-07","2020-12-03","22"
Upvotes: 2
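A slightly stricter variant of the awk approach above, as a minimal sketch: it splits on the three-character sequence "," so that field 7 is the bare second date, then tests the month position explicitly instead of searching for -12- anywhere in the field. The file name sample.csv and the variable december_rows are placeholders for illustration, and the sketch assumes every row has the same column layout as the sample.

```shell
# Sample data copied from the question; sample.csv is a hypothetical file name.
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.csv" <<'EOF'
"1","2","3","4","text","2020-01-01","2020-12-13","4"
"1","2","3","4","text","2020-12-07","2020-12-03","22"
"1","2","3","4","text","2020-12-12","2020-04-11","21"
"1","2","3","4","text","2020-05-21","2020-03-23","453"
EOF

# With "," as the separator, $7 is the second date without surrounding quotes.
# Characters 6-7 of a YYYY-MM-DD date are the month, so compare them with "12".
december_rows=$(awk -F'","' 'substr($7, 6, 2) == "12"' "$tmpdir/sample.csv")
printf '%s\n' "$december_rows"

rm -rf "$tmpdir"
```

This matches December of any year; add a check such as substr($7, 1, 4) == "2020" if only one year is wanted.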
Reputation: 133770
You need to use the -P option of grep to enable Perl-compatible regular expressions. Could you please try the following? Written and tested with your shown samples.
grep -P '("\d+",){4}"[a-zA-Z]+","\d{4}-\d{2}-\d{2}","2020-12-\d{2}"' Input_file
Explanation: The following breakdown is for explanation purposes only.
grep                     ##Start of the grep command.
-P                       ##Enable PCRE regex in grep.
'("\d+",){4}             ##Match a quoted run of digits followed by a comma, 4 times.
"[a-zA-Z]+",             ##Then match a quoted run of letters followed by a comma.
"\d{4}-\d{2}-\d{2}",     ##Then match the first date, whatever its month.
"2020-12-\d{2}"          ##Then match the second date, which must be in December 2020.
' Input_file             ##Name of the input file.
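Where grep -P is not available, the same field-by-field idea can probably be written with grep -E and POSIX bracket expressions. The sketch below is an assumption-laden variant, not the tested command above: it anchors at the start of the line, spells out both date fields so that the December test applies to the seventh column, and uses a hypothetical file name (sample.csv) and variable name (matched).

```shell
# Sample data copied from the question; sample.csv is a hypothetical file name.
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.csv" <<'EOF'
"1","2","3","4","text","2020-01-01","2020-12-13","4"
"1","2","3","4","text","2020-12-07","2020-12-03","22"
"1","2","3","4","text","2020-12-12","2020-04-11","21"
"1","2","3","4","text","2020-05-21","2020-03-23","453"
EOF

# Four numeric fields, one alphabetic field, any first date,
# then a second date that must start with 2020-12-.
pattern='^("[0-9]+",){4}"[a-zA-Z]+","[0-9]{4}-[0-9]{2}-[0-9]{2}","2020-12-[0-9]{2}"'
matched=$(grep -E "$pattern" "$tmpdir/sample.csv")
printf '%s\n' "$matched"

rm -rf "$tmpdir"
```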
Upvotes: 3
Reputation: 15018
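The problem is in:
grep -P '.*\d{4}-\d{2}-\d{2}.2020-12-.' > output.txt
                            ^ HERE
The . just matches a single character, but you want to skip ",", so change to:
grep -P '.*\d{4}-\d{2}-\d{2}.+2020-12-.' > output.txt
                            ^^ HERE
The . becomes a .+. Note that \d is not part of POSIX extended regular expressions and GNU egrep does not treat it as a digit class, so this pattern needs grep -P.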
Upvotes: -1
Reputation: 7616
Use either grep -P or, for short, egrep:
$ cat test.txt
"1","2","3","4","text","2020-01-01","2020-12-13","4"
"1","2","3","4","text","2020-12-07","2020-12-03","22"
"1","2","3","4","text","2020-12-12","2020-04-11","21"
"1","2","3","4","text","2020-05-21","2020-03-23","453"
$
$ grep -P '^"([^"]*","){6}2020-12-' test.txt
"1","2","3","4","text","2020-01-01","2020-12-13","4"
"1","2","3","4","text","2020-12-07","2020-12-03","22"
$
$ egrep '^"([^"]*","){6}2020-12-' test.txt
"1","2","3","4","text","2020-01-01","2020-12-13","4"
"1","2","3","4","text","2020-12-07","2020-12-03","22"
Explanation:
^" - expect a " at the start of the line
([^"]*","){6} - scan over all chars other than ", followed by ","; repeat that 6 times
2020-12- - expect 2020-12-
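Since the question also mentions sed, the same anchored pattern can be reused there. This is only a sketch under the assumption that your sed accepts -E for extended regular expressions (GNU and BSD sed do); the year is generalized to any four digits, and sample.csv and december_lines are placeholder names.

```shell
# Sample data copied from the question; sample.csv is a hypothetical file name.
tmpdir=$(mktemp -d)
cat > "$tmpdir/sample.csv" <<'EOF'
"1","2","3","4","text","2020-01-01","2020-12-13","4"
"1","2","3","4","text","2020-12-07","2020-12-03","22"
"1","2","3","4","text","2020-12-12","2020-04-11","21"
"1","2","3","4","text","2020-05-21","2020-03-23","453"
EOF

# -n suppresses default printing; the trailing p prints only matching lines.
# Skip six quoted fields, then require a seventh field of the form YYYY-12-.
december_lines=$(sed -n -E '/^"([^"]*","){6}[0-9]{4}-12-/p' "$tmpdir/sample.csv")
printf '%s\n' "$december_lines"

rm -rf "$tmpdir"
```

Redirect the sed command to output.txt instead of capturing it in a variable if a file is what you need.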
Upvotes: 1