Reputation: 31
The question is: find the names whose number is greater than or equal to m but less than n. A ".csv" file is given. It is preferable to solve this using grep (regex).
My attempt so far:
cat abc.csv|cut -f 3,7 -d ","|grep "4[4-9][0-9]*"|head
But it does not give the desired output.
NOTE: column 3 is the person's name and column 7 is the corresponding number for that person.
Any suggestion to solve this will be very helpful.
Upvotes: 2
Views: 1267
Reputation: 203324
Some people, when confronted with a problem, think "I know,
I'll use regular expressions." Now they have two problems.
(see https://blog.codinghorror.com/regular-expressions-now-you-have-two-problems/ for reference).
This isn't a good example of how to use grep, as it's well documented that using a regexp to do a numeric comparison is a far more difficult and fragile approach than just comparing numbers, e.g. with awk, and that using grep on a line when your data is in a specific field is also more difficult and fragile than using a tool that understands fields, e.g. awk again.
The right way to test for the contents of a field being in a numeric range is to do a numeric comparison on just that field:
awk -F, '(440<=$7) && ($7<500){print $3}' abc.csv
I'm guessing at the values you want the range to have based on the regexp you tried in your question; if I guessed wrong, just change them.
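As a quick check of the numeric comparison, here is a minimal sketch run against made-up data. The temporary file and its rows are assumptions for illustration only; all that matters is that column 3 holds the name and column 7 the number.

```shell
# Hypothetical sample data: column 3 is the name, column 7 is the number.
sample=$(mktemp)
cat > "$sample" <<'EOF'
1,a,Alice,x,y,z,439
2,b,Bob,x,y,z,440
3,c,Carol,x,y,z,499.5
4,d,Dave,x,y,z,500
5,e,Eve,x,y,z,4400
EOF

# Numeric comparison on field 7 only; keeps rows with 440 <= $7 < 500.
in_range=$(awk -F, '(440<=$7) && ($7<500){print $3}' "$sample")
printf '%s\n' "$in_range"
rm -f "$sample"
```

Only Bob and Carol are printed: 439 and 500 fall outside the range, and 4400 is rejected because it is compared as a number rather than matched as text.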
I see from some other answers that you do not want to print lines where $7 contains a . or maybe it's that you only want lines where $7 is an integer. If so then that's a trivial and appropriate thing to use a regexp to test for:
awk -F, '($7 !~ /\./) && (440<=$7) && ($7<500){print $3}' abc.csv
or:
awk -F, '($7 ~ /^[0-9]+$/) && (440<=$7) && ($7<500){print $3}' abc.csv
Hopefully you can see how clear, simple, robust and easy to modify in future that is vs trying to do the same with a regexp across a line using grep.
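Since the question talks about generic bounds m and n, one way to make them easy to change is to pass them in with awk's -v option. The sketch below assumes a hypothetical helper named names_in_range and made-up sample rows; neither comes from the question.

```shell
# names_in_range LO HI FILE (hypothetical helper, not part of the answer):
# prints column 3 for rows whose column 7 is an integer with LO <= value < HI.
names_in_range() {
    awk -F, -v lo="$1" -v hi="$2" '
        # Adding 0 forces a numeric comparison on both sides.
        ($7 ~ /^[0-9]+$/) && (lo + 0 <= $7 + 0) && ($7 + 0 < hi + 0) { print $3 }
    ' "$3"
}

# Made-up sample data: column 3 is the name, column 7 is the number.
sample=$(mktemp)
cat > "$sample" <<'EOF'
1,a,Alice,x,y,z,100
2,b,Bob,x,y,z,250
3,c,Carol,x,y,z,250.5
4,d,Dave,x,y,z,300
EOF

picked=$(names_in_range 100 300 "$sample")
printf '%s\n' "$picked"
rm -f "$sample"
```

With bounds 100 and 300 this prints Alice and Bob: Carol is dropped by the integer test and Dave by the strict upper bound.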
Upvotes: 2
Reputation: 163277
You can print the values from column 3 where column 7 is in the range 400-499 using only awk and a pattern, instead of piping through multiple programs.
The pattern ^4[0-9][0-9]$ uses the anchors ^ and $ to prevent partial matches, and two [0-9] ranges to match 400 to 499.
awk -F, '
$7 ~ /^4[0-9][0-9]$/ {
print $3
}
' abc.csv
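To see what the anchors buy you, the following sketch runs the same pattern with and without ^ and $ over a few made-up values (one per line, so the whole line plays the role of the field).

```shell
# Made-up test values, one per line.
values='399
400
499
1400
4000
450.5'

# Without anchors the pattern matches anywhere inside the value.
unanchored=$(printf '%s\n' "$values" | awk '/4[0-9][0-9]/')
# With anchors the value must be exactly three digits starting with 4.
anchored=$(printf '%s\n' "$values" | awk '/^4[0-9][0-9]$/')

printf 'unanchored:\n%s\nanchored:\n%s\n' "$unanchored" "$anchored"
```

The unanchored pattern also accepts 1400, 4000 and 450.5, while the anchored one keeps only 400 and 499. Note that the anchored pattern rejects decimals such as 450.5, which may or may not be what you want.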
If you can use GNU grep, you can match the value of the 3rd field if the 7th field is in the range 400-499, but it is a long pattern and I would recommend using awk.
^(?:[^,]*,){2}\K[^,\n]+(?=(?:,[^,\n]*){3},\s*4[0-9][0-9](?=\s*,|$))
^ Start of string
(?:[^,]*,){2} Match the first 2 comma separated fields
\K Forget what is matched so far
[^,\n]+ Match the 3rd field
(?= Positive lookahead assertion
(?:,[^,\n]*){3},\s*4[0-9][0-9](?=\s*,|$) Match the 7th field as a value 400-499, followed by either a comma or the end of the string to prevent a partial match
) Close lookahead
For example
grep -oP "^(?:[^,]*,){2}\K[^,]+(?=(?:,[^,]*){3},\s*4[0-9][0-9](?=\s*,|$))" abc.csv
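A small sketch of that command on made-up rows, assuming a GNU grep built with PCRE support (the -P option is not available in every grep). The sample data is hypothetical.

```shell
# Made-up sample data: column 3 is the name, column 7 is the number.
sample=$(mktemp)
cat > "$sample" <<'EOF'
1,a,Alice,x,y,z,450
2,b,Bob,x,y,z,1450
3,c,Carol,x,y,z,4500
4,d,Dave,x,y,z,499,extra
EOF

# -o prints only the matched text; -P enables PCRE, needed for \K and lookaheads.
matched=$(grep -oP '^(?:[^,]*,){2}\K[^,]+(?=(?:,[^,]*){3},\s*4[0-9][0-9](?=\s*,|$))' "$sample")
printf '%s\n' "$matched"
rm -f "$sample"
```

Alice and Dave are printed. Bob (1450) and Carol (4500) are rejected because the 7th field must be exactly three digits starting with 4, followed by a comma or the end of the line.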
Upvotes: 1
Reputation: 337
If you need only the name then you have to add:
cut -f 1 -d ","
If you need only real numbers between 400.00 and 499.99 (as I see from your result) then the grep should be:
grep "4[0-9][0-9]\.[0-9][0-9]"
If you need to admit any number of decimals and also integers and take care of optional trailing spaces and end of line($) you can use:
grep -E "4[0-9][0-9](\.[0-9][0-9]*)* *$"
If you need to be sure it does not match 1400 or names that contain 400 then you should use:
grep -E " *, *4[0-9][0-9](\.[0-9][0-9]*)* *$"
We can go on, but I will stop here. My proposal is to use this:
cat Bulk.csv | cut -f 3,7 -d "," | grep -E " *, *4[0-9][0-9](\.[0-9][0-9]*)* *$" | cut -f 1 -d ","
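For illustration, here is the same pipeline run on a few made-up rows (the data is an assumption, and the file is passed to cut directly instead of through cat):

```shell
# Made-up sample data: column 3 is the name, column 7 is the number.
sample=$(mktemp)
cat > "$sample" <<'EOF'
1,a,Alice,x,y,z,450.25
2,b,Bob,x,y,z,1450.00
3,c,Carol,x,y,z,400
4,d,Dave,x,y,z,4000
5,e,Eve400,x,y,z,399.99
EOF

# Keep name and number, filter on the number, then keep only the name.
names=$(cut -f 3,7 -d "," "$sample" \
    | grep -E ' *, *4[0-9][0-9](\.[0-9][0-9]*)* *$' \
    | cut -f 1 -d ",")
printf '%s\n' "$names"
rm -f "$sample"
```

This prints Alice and Carol. The leading comma in the pattern keeps 1450.00 and the name Eve400 from matching, and the trailing $ rejects 4000.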
Upvotes: 0
Reputation: 6061
Try:
cut -d, -f 3,7 Bulk.csv | grep ',4[0-9][0-9][^0-9]' | cut -d, -f 1
Explanation: cat is not necessary. The expression [^0-9] means everything except a digit; using only ,4[0-9][0-9] as regex would select also lines containing numbers with more digits before the decimal point, like 4247.14, which is not what you want.
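One caveat worth checking on your own data: [^0-9] has to match an actual character, so an integer at the very end of a line is skipped. The sketch below uses made-up rows that are already reduced to name,number, and compares the regex above with a variant (my assumption, not part of the original command) that also accepts the end of the line.

```shell
# Made-up rows as they would look after "cut -d, -f 3,7".
rows='Alice,450.25
Bob,4247.14
Carol,400'

# Original regex: needs a non-digit character after the three digits.
original=$(printf '%s\n' "$rows" | grep ',4[0-9][0-9][^0-9]' | cut -d, -f 1)
# Variant: a non-digit OR the end of the line may follow the three digits.
variant=$(printf '%s\n' "$rows" | grep -E ',4[0-9][0-9]([^0-9]|$)' | cut -d, -f 1)

printf 'original: %s\n' "$original"
printf 'variant:\n%s\n' "$variant"
```

Both versions reject 4247.14, but only the variant keeps Carol, whose number is the integer 400 with nothing after it.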
A sample of your input file Bulk.csv is missing, so we cannot reproduce your problem.
Upvotes: 1