Reputation: 179
I have a .txt file that contains data about 100 colleges in the format
{COLLEGE NAME} {CITY, STATE} {RANK} {TUITION} {IN STATE TUITION} {ENROLLMENT}
For example, here are two lines:
YeshivaUniversity "New York, NY" 66 "$40,670 " "2,744"
FordhamUniversity "New York, NY" 60 "$47,317 " "8,855"
There are 98 more lines, and the output should list all the colleges with tuition less than $30,000.
Assuming that the field separator is a space, how could I print the {COLLEGE NAME} {CITY, STATE} {TUITION} of colleges with {TUITION} less than $30,000? Is it possible to do this with awk or sort?
I have tried some combinations of awk and the operator <=, but I get an error every time. For example
$ awk -F" " '{print $1, $2, $4<=30000}' data1a.txt
gives me a syntax error.
Upvotes: 1
Views: 56
Reputation: 37404
Using GNU awk, since it's got FPAT:
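$ gawk '
BEGIN {
    FPAT="([^ ]*)|(\"[^\"]+\")"    # a field is a run of non-spaces or a double-quoted string
}
{
    tuition=$4                     # separate 4th column for cleaning
    gsub(/[^0-9]/,"",tuition)      # clean non-digits off
    if(tuition+0<30000)            # compare numerically (+0 avoids a string comparison)
        print                      # and output
}' data1a.txt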
Output for the sample data is empty, since both sample colleges have tuition above $30,000.
(Next time, please post a sample that has both positive and negative cases.)
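The question asked for only the name, city and tuition rather than the whole line, so here is a minimal sketch of that variant. It assumes GNU awk is installed as gawk and that the file has the same five-field layout as the two sample lines; the function name cheap_colleges and the optional limit argument are made up for illustration.

```shell
# Hypothetical helper: print name, city/state and tuition for every
# college whose tuition is below a limit (default 30000).
# Usage: cheap_colleges FILE [LIMIT]
cheap_colleges() {
    gawk -v limit="${2:-30000}" '
    BEGIN {
        # A field is either a run of non-spaces or a double-quoted string.
        FPAT = "([^ ]*)|(\"[^\"]+\")"
    }
    {
        tuition = $4                    # e.g. "$40,670 " including the quotes
        gsub(/[^0-9]/, "", tuition)     # keep digits only
        if (tuition + 0 < limit + 0)    # +0 forces a numeric comparison
            print $1, $2, $4
    }' "$1"
}
```

Called as cheap_colleges data1a.txt, it prints one line per matching college, with the three fields separated by single spaces.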
Also, it was mentioned in the comments that the data is delimited by a single space while the university name itself contains a space. That was no longer the case when I saw your question, but it could be tackled by counting the fields from the end, i.e. $4 would be $(NF-1).
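A rough sketch of that count-from-the-end idea, again assuming gawk and assuming tuition is always the second-to-last field as in the sample lines; the function name cheap_colleges_from_end is hypothetical.

```shell
# Hypothetical helper: same filter, but the tuition field is addressed
# from the end of the record, so unquoted spaces in the college name
# (which add extra leading fields) do not shift it.
cheap_colleges_from_end() {
    gawk '
    BEGIN {
        FPAT = "([^ ]*)|(\"[^\"]+\")"
    }
    {
        tuition = $(NF-1)               # second-to-last field
        gsub(/[^0-9]/, "", tuition)     # keep digits only
        if (tuition + 0 < 30000)        # numeric comparison
            print                       # whole record, unchanged
    }' "$1"
}
```

This only helps while the fields after the name stay fixed in number; if a trailing column were sometimes missing, $(NF-1) would point at the wrong field.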
Upvotes: 2