Reputation: 143
I'd like to filter a CSV file (without headers) containing hundreds of rows based on the value in column 12. Values that filter these rows contain data like "00GG", "05FT", "66DM" and 10 more.
With the code below I'm able to print rows based on one criteria:
def load_source(filename):
with open(filename, "r") as f:
reader = csv.reader(f, delimiter=";")
return list(reader)
sourcecsv = load_source("data1.csv")
for row in sourcecsv:
if row[12] == "00GG":
print(row)
Since the filtering of data(1.csv) is important for any queries later on, I assume it would be wise to include it already in the function load_source. I tried to do a similar "for row.. if row[12]" loop like above with a list of criteria instead of one string and append it to a new list but I got an empty list whenever I tried to print(sourcecsv) afterwards. Thanks for any help.
Upvotes: 4
Views: 23525
Reputation: 4592
Alternatively, you could use my library pyexcel under 100kb footprint:
>>> import pyexcel as pe
>>> sheet=pe.get_sheet(file_name="test.csv", delimiter=';')
>>> sheet
Sheet Name: test.csv
+---+---+---+---+------+----+
| 1 | 2 | 3 | 4 | 00GG | 11 |
+---+---+---+---+------+----+
| 2 | 1 | 1 | 3 | 00GG | 12 |
+---+---+---+---+------+----+
| 3 | 2 | 4 | 5 | 11 | 11 |
+---+---+---+---+------+----+
>>> # filter out the rows that does not contain '00GG'
>>> filter = pe.filters.RowValueFilter(lambda row: row[4] != '00GG')
>>> sheet.filter(filter)
>>> sheet
Sheet Name: test.csv
+---+---+---+---+------+----+
| 1 | 2 | 3 | 4 | 00GG | 11 |
+---+---+---+---+------+----+
| 2 | 1 | 1 | 3 | 00GG | 12 |
+---+---+---+---+------+----+
Upvotes: 2
Reputation: 265
You could do:
def load_source(filename):
with open(filename, "r") as f:
reader = csv.reader(f, delimiter=";")
return filter(lambda x: x[12] in ("00GG", "05FT", "66DM")), list(reader))
But using pandas would probably be a better idea, it can load csv files, filter them and much more with ease.
Upvotes: 3