Reputation: 21
I really want to read the following csv file: ID; First Name; Last Name; Phone; 123; Max; Smith; 0193849843 124; John; Doe; 0012943843
..and extract it into the following Format:
[OrderedDict([('ID', '123'), ('Last Name', 'Smith')]), OrderedDict([('ID', '124'), ("Last Name", "Doe")])]
However, with my Code displayed below, im only able to get the OrderedDict with all keys inside. How is it possible to only Access certain columns within the csv file? I Need the exact Output in order to later transform the Code into JSON.
import csv
csvfilepath = r"csvpath"
jsonfilepath = r"jsonpath"
data = []
with open(csvfilepath) as csvfile:
csvReader = csv.DictReader(csvfile,delimiter=";")
for csvRow in csvReader:
ID = csvRow["ID"]
data.append(csvRow)
Thanks a lot! Jonas
Upvotes: 1
Views: 4055
Reputation: 81654
The short answer is yes, you can read specific columns (but with a caveat). However it's going to be much simpler if you just read all the columns and then build a dictionary from the columns that you need. It's much simpler and might even perform better.
You can use fieldnames
argument to explictly define the columns you are interested in. The caveat is that the other columns will still be present in the dictionary under the None
key (unless you provide another key with the restkey
argument).
From the docs:
The
fieldnames
parameter is a sequence. Iffieldnames
is omitted, the values in the first row of file f will be used as thefieldnames
. Regardless of how thefieldnames
are determined, the ordered dictionary preserves their original ordering.If a row has more fields than
fieldnames
, the remaining data is put in a list and stored with the fieldname specified byrestkey
(which defaults to None). If a non-blank row has fewer fields thanfieldnames
, the missing values are filled-in withNone
.
You can use fieldnames
to specify the columns you want and then use .pop
to remove the None
key (and its values).
Consider the following file:
header1,header2
a,b
c,d
e,f
Then:
with open('test.csv') as csvfile:
csvReader = csv.DictReader(csvfile, fieldnames=['header1'])
print([row for row in csvReader])
# [OrderedDict([('header1', 'header1'), (None, ['header2'])]),
# OrderedDict([('header1', 'a'), (None, ['b'])]),
# OrderedDict([('header1', 'c'), (None, ['d'])]),
# OrderedDict([('header1', 'e'), (None, ['f'])])]
If we pop the None
key:
csvReader = list(csvReader)
[row.pop(None) for row in csvReader]
# yes, abusing list comprehension for a side effect for sake of a simple example.
# Don't do that in production code
print([row for row in csvReader])
# [OrderedDict([('header1', 'header1')]), OrderedDict([('header1', 'a')]),
# OrderedDict([('header1', 'c')]), OrderedDict([('header1', 'e')])]
Upvotes: 1