Amanda

Reputation: 12747

Why is csvkit giving me "List Index out of Range" errors?

I'm working with a zipcode dataset and csvkit, but getting nowhere. If I do csvcut -n zipcode.csv I see a clean list of columns:

  1: zip
  2: city
  3: state
  4: latitude
  5: longitude
  6: timezone
  7: dst

But any searches I do with csvgrep just give me an error. Here's a chunk of data:

"99919","Thorne Bay","AK","55.677232","-132.55624","-9","1"
"99921","Craig","AK","55.456449","-133.02648","-9","1"
"99922","Hydaburg","AK","55.209339","-132.82545","-9","1"
"99923","Hyder","AK","55.941442","-130.0545","-9","1"
"99925","Klawock","AK","55.555164","-133.07316","-9","1"
"99926","Metlakatla","AK","55.123897","-131.56883","-9","1"
"99927","Point Baker","AK","56.337957","-133.60689","-9","1"
"99928","Ward Cove","AK","55.395359","-131.67537","-9","1"
"99929","Wrangell","AK","56.409507","-132.33822","-9","1"
"99950","Ketchikan","AK","55.875767","-131.46633","-9","1"

Per the docs, I expect that csvgrep -c 2 -m "Hyder" zipcode.csv will turn up a match, but instead I get:

zip,city,state,latitude,longitude,timezone,dst
list index out of range

I'm able to use csvgrep fine on other csv files -- why is it choking on this one?

Upvotes: 0

Views: 991

Answers (2)

sal

Reputation: 1239

To prevent most errors like the one described, I use csvclean (also part of csvkit) to find and correct corrupted data in the source CSV. Also check this blog post for a complete how-to.

Upvotes: 1

Jacob Budin

Reputation: 10003

The problem is that "zipcode.csv" is malformed: it contains blank lines. For example, line 17 is blank:

"00607","Aguas Buenas","PR","18.256995","-66.104657","-4","0"

"00609","Aibonito","PR","18.142002","-66.273278","-4","0"

The author of the file may have done this to indicate that postal code 00608 does not exist, which may be helpful in some instances, but it is what prevents you from using csvgrep on the file.
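To see why a blank line can surface as "list index out of range", here is a minimal Python sketch using only the standard csv module. It is not csvkit's actual code, just an illustration of the likely mechanism: a blank line parses to an empty row, so looking up column 2 (index 1) in that row fails. The SAMPLE data and the find_short_rows helper are made up for this example.

```python
import csv
import io

# Hypothetical three-line sample mirroring the excerpt above:
# the middle line is blank.
SAMPLE = (
    '"00607","Aguas Buenas","PR","18.256995","-66.104657","-4","0"\n'
    '\n'
    '"00609","Aibonito","PR","18.142002","-66.273278","-4","0"\n'
)


def find_short_rows(lines, column_index):
    """Return (row_number, row) pairs for rows too short to hold column_index."""
    short = []
    for row_number, row in enumerate(csv.reader(lines), start=1):
        # A blank line is parsed as an empty list, so row[column_index]
        # would raise IndexError ("list index out of range").
        if len(row) <= column_index:
            short.append((row_number, row))
    return short


bad_rows = find_short_rows(io.StringIO(SAMPLE), column_index=1)
print(bad_rows)  # [(2, [])]
```

Running the same helper over the real file (passing an open file object instead of the StringIO) should list every row that would trip up a search on that column.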

If you're on a *nix-based OS, you already have sed installed, and you can use it to remove the blank lines automatically:

$ sed '/^$/d' zipcode.csv > zipcode2.csv

This will store the result as "zipcode2.csv". We can now use our new "fixed" postal code file:

$ csvgrep -c 2 -m "Hyder" zipcode2.csv 
zip,city,state,latitude,longitude,timezone,dst
99923,Hyder,AK,55.941442,-130.0545,-9,1
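If sed is not available (for example on a stock Windows machine), roughly the same cleanup can be sketched in Python. This is an assumption-laden equivalent rather than a drop-in replacement: drop_blank_lines and clean_file are hypothetical helper names, and unlike the sed command this version also drops lines that contain only a carriage return. Like sed, it would damage a quoted field that legitimately spans a blank line.

```python
def drop_blank_lines(lines):
    """Yield only the lines that contain something besides a line ending."""
    for line in lines:
        if line.strip("\r\n"):
            yield line


def clean_file(src, dst):
    """Copy src to dst, skipping blank lines (hypothetical helper)."""
    # newline="" keeps the original line endings untouched.
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        fout.writelines(drop_blank_lines(fin))


# Small in-memory demonstration with made-up input lines.
SAMPLE_LINES = [
    '"00607","Aguas Buenas","PR","18.256995","-66.104657","-4","0"\n',
    '\n',
    '"00609","Aibonito","PR","18.142002","-66.273278","-4","0"\n',
]
cleaned = list(drop_blank_lines(SAMPLE_LINES))
print(len(cleaned))  # 2
```

Calling clean_file("zipcode.csv", "zipcode2.csv") would then produce the same kind of "fixed" file used in the csvgrep command above.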

Upvotes: 1
