Reputation: 73
Filtering a CSV file by a column value seems pretty trivial. Generally, I'd do something like the following:
import csv

results = []
with open('file.csv', newline='') as f:
    reader = csv.reader(f)
    for line in reader:  # iterate over the lines in the csv
        if line[1] in ['XXX', 'YYY', 'ZZZ']:  # check if the 2nd element is one you're looking for
            results.append(line)  # if so, add this line to the results list
However, my data set isn't so simply formatted. It looks like the following:
Symbol,Values Date
XXX,8/2/2010
XXX,8/3/2010
XXX,8/4/2010
YYY,8/2/2010
YYY,8/3/2010
YYY,8/4/2010
ZZZ,8/2/2010
ZZZ,8/3/2010
ZZZ,8/4/2010
Essentially, what I am trying to do is extract the first date for each unique Symbol in the list, so that I end up with the following:
XXX,8/2/2010
YYY,8/2/2010
ZZZ,8/2/2010
Upvotes: 1
Views: 64
Reputation: 14811
Pandas may help. ;-)
import pandas

# first() returns the first non-null value of each column within each Symbol group
print(pandas.read_csv('file.csv').groupby('Symbol').first())
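If pandas isn't available, a rough standard-library analogue of `groupby('Symbol').first()` is `itertools.groupby`. This is a sketch, not part of the original answer: it assumes the rows are already grouped by symbol, as in the sample, because `itertools.groupby` only merges *consecutive* rows that share a key. The helper name `first_per_symbol` is hypothetical, and the sample data is inlined with `io.StringIO` instead of reading `file.csv`.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter


def first_per_symbol(lines):
    """Return the first data row of each run of identical symbols.

    Assumes rows are grouped by the first column (Symbol); itertools.groupby
    only groups consecutive rows with equal keys.
    """
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row
    # next(group) takes the first row of each group before groupby advances
    return [next(group) for _, group in groupby(reader, key=itemgetter(0))]


sample = """Symbol,Values Date
XXX,8/2/2010
XXX,8/3/2010
XXX,8/4/2010
YYY,8/2/2010
YYY,8/3/2010
YYY,8/4/2010
ZZZ,8/2/2010
ZZZ,8/3/2010
ZZZ,8/4/2010
"""

for row in first_per_symbol(io.StringIO(sample)):
    print(",".join(row))
# prints:
# XXX,8/2/2010
# YYY,8/2/2010
# ZZZ,8/2/2010
```

Note that, like the pandas one-liner, this keeps the first row in file order, which is the earliest date only because the sample is sorted by date within each symbol.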
Upvotes: 1
Reputation: 733
Here is a simple solution that uses a set to remember which symbols (first-column values) have already been found:
import csv

results = []
already_done = set()  # symbols whose first row has already been kept
with open('file.csv', newline='') as f:
    for line in csv.reader(f):  # iterate over the lines in the csv
        # keep the row only if its symbol (1st column) is wanted and not yet seen
        if line[0] in ['XXX', 'YYY', 'ZZZ'] and line[0] not in already_done:
            results.append(line)  # add this line to the results list
            already_done.add(line[0])
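A self-contained version of the set-based approach might look like the sketch below. The helper name `first_rows` and its optional `wanted` parameter are hypothetical, and the sample is a made-up reordering of the question's data, inlined with `io.StringIO`. Because the set remembers every symbol already seen, this works even when rows for the same symbol are not adjacent; it keeps the first row in file order, not necessarily the earliest date.

```python
import csv
import io


def first_rows(lines, wanted=None):
    """Keep the first row seen for each symbol (column 0).

    If `wanted` is given, rows whose symbol is not in it are skipped.
    """
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row
    seen = set()
    results = []
    for row in reader:
        symbol = row[0]
        if wanted is not None and symbol not in wanted:
            continue  # not a symbol we care about
        if symbol not in seen:
            seen.add(symbol)
            results.append(row)
    return results


# Hypothetical input: same symbols as the question, but not grouped together.
sample = """Symbol,Values Date
YYY,8/2/2010
XXX,8/2/2010
YYY,8/3/2010
ZZZ,8/2/2010
XXX,8/3/2010
"""

print(first_rows(io.StringIO(sample), wanted={"XXX", "ZZZ"}))
# prints: [['XXX', '8/2/2010'], ['ZZZ', '8/2/2010']]
```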
Upvotes: 0