Reputation: 11
I'm currently learning grep as well as regex and other shell tools. The input text file describes a library: it lists authors, their books, and editions.
ISBN Title Edition Author Year
8298 foo 3 Charles 1999
I need to use grep to find the total number of books that were published within a given time period (for example, 1993-2008).
I have tried:
grep -E ', (19(7[5-9]|[89][0-9])|200[0-5])$'
This didn't produce any result.
I also tried a regex that I came up with, which does produce the right number, but the regex isn't correct.
\s(197[5-9]|198[0-9]|199[0-9]|200[0-5])
Sorry for the vagueness in the question. Also, I'm running on WSL if that makes a difference.
Upvotes: -1
Views: 95
Reputation: 189789
In the general case, to grep a specific field in comma-separated input, specify how many comma-separated fields to skip before the match.
grep -E '^([^,]*,){4}[[:space:]]*(19(7[5-9]|[89][0-9])|200[0-5])$' file.csv
The expression [^,]*, matches one field and the comma after it, i.e. zero or more characters which are not a comma, followed by one which is. By anchoring to the beginning of the line with ^ and specifying four repetitions of this expression, each of which skips one field, we target the beginning of the fifth.
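Since the question asks for a total rather than the matching lines, here is a minimal sketch of the same pattern combined with grep's -c option, which prints the number of matching lines. The sample rows and the temporary file are made up for illustration, and the comma-plus-space layout is an assumption about the real input.

```shell
# Hypothetical sample data; the real file's layout may differ.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
ISBN, Title, Edition, Author, Year
8298, foo, 3, Charles, 1999
1111, bar, 1, Dana, 1974
2222, baz, 2, Eve, 2005
3333, qux, 1, Finn, 2006
EOF

# Skip four fields, allow optional whitespace, then require a year
# in 1975-2005 as the whole fifth field at end of line.
# -c prints the count of matching lines instead of the lines themselves.
count=$(grep -cE '^([^,]*,){4}[[:space:]]*(19(7[5-9]|[89][0-9])|200[0-5])$' "$tmp")
echo "$count"

rm -f "$tmp"
```

With this sample, only the 1999 and 2005 rows fall inside the range; the header, 1974, and 2006 rows are rejected.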
Some, but not all, grep implementations allow you to generalize the final anchor to (,|$), i.e. to look for either another comma (for lines with more than five fields) or end of line (for lines with exactly five).
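As a rough illustration of that generalized anchor, the sketch below runs the pattern against invented rows, some of which carry a sixth field. It assumes a grep whose -E mode accepts the (,|$) alternation, such as GNU grep; the data and the sixth column are hypothetical.

```shell
# Hypothetical rows: one with five fields, two with a made-up sixth field.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
8298, foo, 3, Charles, 1999
4444, bar, 1, Dana, 2001, hardcover
5555, baz, 1, Eve, 2010, paperback
EOF

# The year must be followed by either another comma or the end of the line.
count=$(grep -cE '^([^,]*,){4}[[:space:]]*(19(7[5-9]|[89][0-9])|200[0-5])(,|$)' "$tmp")
echo "$count"

rm -f "$tmp"
```

Here the 1999 row matches through the end-of-line branch and the 2001 row through the comma branch, while 2010 is out of range.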
In real life, CSV files can contain quoted fields which embed a literal comma, so then you need a more complex regular expression. Real-life CSV files can also contain quoted fields which span multiple lines, so then grep (or nontrivial Awk) alone will not cut it.
(Also, real CSV files don't have spaces after the commas.)
Upvotes: 0
Reputation: 16819
Since year appears at end of line, and assuming each record is a single line:
grep -E ', (19(7[5-9]|[89][0-9])|200[0-5])$'
Upvotes: 0
Reputation: 41962
You can use this
awk -F',' '1975 <= $5 && $5 <= 2005' books.txt
-F is used for setting the field separator, and $5 is the 5th field.
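The command above prints the matching lines. If only the total is wanted, a possible variation is to count inside awk and print the counter at the end. This is a sketch on invented sample data; adding 0 to $5 is my own addition to force a numeric comparison when the field has a leading space or is a header word.

```shell
# Hypothetical sample data in a comma-plus-space layout.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
ISBN, Title, Edition, Author, Year
8298, foo, 3, Charles, 1999
1111, bar, 1, Dana, 1974
2222, baz, 2, Eve, 2005
3333, qux, 1, Finn, 2006
EOF

# $5 + 0 converts the field to a number (non-numeric text such as "Year" becomes 0).
# n counts the rows in range; n + 0 prints 0 rather than an empty line if none match.
count=$(awk -F',' '($5 + 0) >= 1975 && ($5 + 0) <= 2005 { n++ } END { print n + 0 }' "$tmp")
echo "$count"

rm -f "$tmp"
```

Compared with the regex, the range bounds are plain numbers, so changing the period only means editing 1975 and 2005.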
Upvotes: 1
Reputation: 11
I think I've figured it out though.
\s(197[5-9]|198[0-9]|199[0-9]|200[0-5])
If anyone has a better solution, do let me know. Thanks.
Upvotes: -2