J.Weiser
J.Weiser

Reputation: 21

How to exclude columns within csv DictReader using Python?

I really want to read the following csv file: ID; First Name; Last Name; Phone; 123; Max; Smith; 0193849843 124; John; Doe; 0012943843

..and extract it into the following Format:

[OrderedDict([('ID', '123'), ('Last Name', 'Smith')]), OrderedDict([('ID', '124'), ("Last Name", "Doe")])]

However, with my Code displayed below, im only able to get the OrderedDict with all keys inside. How is it possible to only Access certain columns within the csv file? I Need the exact Output in order to later transform the Code into JSON.

import csv

csvfilepath = r"csvpath"
jsonfilepath = r"jsonpath"

data = []

with open(csvfilepath) as csvfile:
    csvReader = csv.DictReader(csvfile,delimiter=";")

    for csvRow in csvReader:
        ID = csvRow["ID"]
        data.append(csvRow) 

Thanks a lot! Jonas

Upvotes: 1

Views: 4055

Answers (1)

DeepSpace
DeepSpace

Reputation: 81654

The short answer is yes, you can read specific columns (but with a caveat). However it's going to be much simpler if you just read all the columns and then build a dictionary from the columns that you need. It's much simpler and might even perform better.


You can use fieldnames argument to explictly define the columns you are interested in. The caveat is that the other columns will still be present in the dictionary under the None key (unless you provide another key with the restkey argument).

From the docs:

The fieldnames parameter is a sequence. If fieldnames is omitted, the values in the first row of file f will be used as the fieldnames. Regardless of how the fieldnames are determined, the ordered dictionary preserves their original ordering.

If a row has more fields than fieldnames, the remaining data is put in a list and stored with the fieldname specified by restkey (which defaults to None). If a non-blank row has fewer fields than fieldnames, the missing values are filled-in with None.

You can use fieldnames to specify the columns you want and then use .pop to remove the None key (and its values).

Consider the following file:

header1,header2
a,b
c,d
e,f

Then:

with open('test.csv') as csvfile:
    csvReader = csv.DictReader(csvfile, fieldnames=['header1'])
    print([row for row in csvReader])
    # [OrderedDict([('header1', 'header1'), (None, ['header2'])]),
    #  OrderedDict([('header1', 'a'), (None, ['b'])]),
    #  OrderedDict([('header1', 'c'), (None, ['d'])]), 
    #  OrderedDict([('header1', 'e'), (None, ['f'])])]

If we pop the None key:

csvReader = list(csvReader)
[row.pop(None) for row in csvReader]
# yes, abusing list comprehension for a side effect for sake of a simple example.
# Don't do that in production code
print([row for row in csvReader])
# [OrderedDict([('header1', 'header1')]), OrderedDict([('header1', 'a')]),
#  OrderedDict([('header1', 'c')]), OrderedDict([('header1', 'e')])]

Upvotes: 1

Related Questions