strongbad
strongbad

Reputation: 143

Filtering CSV rows by specific column data

I'd like to filter a CSV file (without headers) containing hundreds of rows based on the value in column 12. Values that filter these rows contain data like "00GG", "05FT", "66DM" and 10 more.

With the code below I'm able to print rows based on one criteria:

def load_source(filename):
    with open(filename, "r") as f:
        reader = csv.reader(f, delimiter=";")
        return list(reader)

sourcecsv = load_source("data1.csv")

for row in sourcecsv:
    if row[12] == "00GG":
        print(row)

Since the filtering of data(1.csv) is important for any queries later on, I assume it would be wise to include it already in the function load_source. I tried to do a similar "for row.. if row[12]" loop like above with a list of criteria instead of one string and append it to a new list but I got an empty list whenever I tried to print(sourcecsv) afterwards. Thanks for any help.

Upvotes: 4

Views: 23525

Answers (2)

chfw
chfw

Reputation: 4592

Alternatively, you could use my library pyexcel under 100kb footprint:

>>> import pyexcel as pe
>>> sheet=pe.get_sheet(file_name="test.csv", delimiter=';')
>>> sheet
Sheet Name: test.csv
+---+---+---+---+------+----+
| 1 | 2 | 3 | 4 | 00GG | 11 |
+---+---+---+---+------+----+
| 2 | 1 | 1 | 3 | 00GG | 12 |
+---+---+---+---+------+----+
| 3 | 2 | 4 | 5 | 11   | 11 |
+---+---+---+---+------+----+
>>> # filter out the rows that does not contain '00GG'
>>> filter = pe.filters.RowValueFilter(lambda row: row[4] != '00GG')
>>> sheet.filter(filter)
>>> sheet
Sheet Name: test.csv
+---+---+---+---+------+----+
| 1 | 2 | 3 | 4 | 00GG | 11 |
+---+---+---+---+------+----+
| 2 | 1 | 1 | 3 | 00GG | 12 |
+---+---+---+---+------+----+

Upvotes: 2

markiz
markiz

Reputation: 265

You could do:

def load_source(filename):
    with open(filename, "r") as f:
        reader = csv.reader(f, delimiter=";")
        return filter(lambda x: x[12] in ("00GG", "05FT", "66DM")), list(reader))

But using pandas would probably be a better idea, it can load csv files, filter them and much more with ease.

http://pandas.pydata.org/

Upvotes: 3

Related Questions