Reputation: 12747
I'm working with a zipcode dataset and csvkit, but getting nowhere. If I run csvcut -n zipcode.csv, I see a clean list of columns:
1: zip
2: city
3: state
4: latitude
5: longitude
6: timezone
7: dst
But any search I run with csvgrep just gives me an error. Here's a chunk of the data:
"99919","Thorne Bay","AK","55.677232","-132.55624","-9","1"
"99921","Craig","AK","55.456449","-133.02648","-9","1"
"99922","Hydaburg","AK","55.209339","-132.82545","-9","1"
"99923","Hyder","AK","55.941442","-130.0545","-9","1"
"99925","Klawock","AK","55.555164","-133.07316","-9","1"
"99926","Metlakatla","AK","55.123897","-131.56883","-9","1"
"99927","Point Baker","AK","56.337957","-133.60689","-9","1"
"99928","Ward Cove","AK","55.395359","-131.67537","-9","1"
"99929","Wrangell","AK","56.409507","-132.33822","-9","1"
"99950","Ketchikan","AK","55.875767","-131.46633","-9","1"
Per the docs, I expect that csvgrep -c 2 -m "Hyder" zipcode.csv will turn up a match, but instead I get:
zip,city,state,latitude,longitude,timezone,dst
list index out of range
I'm able to use csvgrep fine on other CSV files. Why is it choking on this one?
Upvotes: 0
Views: 991
Reputation: 1239
To prevent most errors like the one described, I use csvclean (also from csvkit) to find and correct corrupted data in the source CSV. Also check this blog post for a complete how-to.
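To illustrate the kind of check involved, here is a minimal Python sketch that flags rows whose field count differs from the header. This is not csvclean's actual implementation, and the name find_bad_rows is hypothetical; it only relies on the standard csv module, which returns an empty list for a blank line.

```python
import csv
from typing import Iterable, List, Tuple


def find_bad_rows(lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Return (line_number, field_count) for rows that do not match the header width.

    Hypothetical helper for illustration only; not part of csvkit.
    """
    reader = csv.reader(lines)
    header = next(reader, None)
    if header is None:
        return []  # empty input, nothing to check
    expected = len(header)
    bad = []
    for row in reader:
        # A blank line is parsed as an empty row, so its field count is 0.
        if len(row) != expected:
            bad.append((reader.line_num, len(row)))
    return bad


sample = [
    "zip,city\n",
    '"00607","Aguas Buenas"\n',
    "\n",
    '"00609","Aibonito"\n',
]
print(find_bad_rows(sample))  # [(3, 0)]
```

To check a real file, pass an open file object instead of the sample list, for example open("zipcode.csv", newline="").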
Upvotes: 1
Reputation: 10003
Your issue is that "zipcode.csv" is malformed: it includes blank lines. For example, line #17 is blank:
"00607","Aguas Buenas","PR","18.256995","-66.104657","-4","0"

"00609","Aibonito","PR","18.142002","-66.273278","-4","0"
The author of the document may have done this to indicate the postal code 00608 does not exist, which may be helpful in some instances, but is preventing you from using the csvkit utility.
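If you want to confirm where the blank lines are before changing anything, a short Python sketch like the following lists their line numbers. The function name find_blank_lines is hypothetical, and the sketch treats whitespace-only lines as blank too, which is an assumption about what you would want flagged.

```python
from typing import Iterable, List


def find_blank_lines(lines: Iterable[str]) -> List[int]:
    """Return 1-based numbers of lines that are empty or whitespace-only."""
    return [number for number, line in enumerate(lines, start=1) if not line.strip()]


sample = [
    '"00607","Aguas Buenas","PR","18.256995","-66.104657","-4","0"\n',
    "\n",
    '"00609","Aibonito","PR","18.142002","-66.273278","-4","0"\n',
]
print(find_blank_lines(sample))  # [2]
```

For the real file, pass an open file object, for example find_blank_lines(open("zipcode.csv", newline="")).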
You can use sed, which you already have installed if you're on a *nix-based OS, to remove the blank lines automatically:
$ sed '/^$/d' zipcode.csv > zipcode2.csv
This will store the result as "zipcode2.csv". We can now use our new "fixed" postal code file:
$ csvgrep -c 2 -m "Hyder" zipcode2.csv
zip,city,state,latitude,longitude,timezone,dst
99923,Hyder,AK,55.941442,-130.0545,-9,1
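If sed isn't available, or the file has Windows line endings or whitespace-only lines that '/^$/d' won't match, the same cleanup can be sketched in Python. The names drop_blank_lines and clean_file are hypothetical, and the sketch assumes no quoted field in the file contains an embedded blank line.

```python
from typing import Iterable, Iterator


def drop_blank_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the lines that contain something other than whitespace."""
    for line in lines:
        # strip() also removes a lone "\r", so CRLF-only lines count as blank.
        if line.strip():
            yield line


def clean_file(source: str, target: str) -> None:
    """Copy source to target without blank lines, keeping line endings as-is."""
    # newline="" disables newline translation on both read and write.
    with open(source, newline="") as src, open(target, "w", newline="") as dst:
        dst.writelines(drop_blank_lines(src))


print(list(drop_blank_lines(["a,b\n", "\n", "\r\n", "c,d\n"])))  # ['a,b\n', 'c,d\n']
```

Calling clean_file("zipcode.csv", "zipcode2.csv") would then produce the same kind of "fixed" file as the sed command above.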
Upvotes: 1