Parsing unique values in a CSV where the primary key is not unique

Question

This seems pretty trivial. Generally, I'd do something like the following:

import csv

results = []
with open('file.csv', newline='') as f:  # the with block closes the file when done
    reader = csv.reader(f)
    for line in reader:  # iterate over the lines in the CSV
        if line[1] in ['XXX', 'YYY', 'ZZZ']:  # check if the 2nd element is one you're looking for
            results.append(line)  # if so, add this line to the results list

However, my data set isn't so simply formatted. It looks like the following:

Symbol,Values Date
XXX,8/2/2010
XXX,8/3/2010
XXX,8/4/2010
YYY,8/2/2010
YYY,8/3/2010
YYY,8/4/2010
ZZZ,8/2/2010
ZZZ,8/3/2010
ZZZ,8/4/2010

Essentially, I am trying to extract the first date for each unique Symbol in the list, so that I end up with the following:

XXX,8/2/2010
YYY,8/2/2010
ZZZ,8/2/2010
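For comparison, the same result can be sketched with only the standard csv module by remembering the first row seen for each Symbol. This is a minimal sketch, not the accepted answer's method. It assumes that the Symbol is in the first column, that the header row is skipped, and that "first" means first in file order. The helper name first_row_per_symbol is hypothetical, and the sample string stands in for 'file.csv'.

```python
import csv
import io


def first_row_per_symbol(rows):
    """Return the first row seen for each Symbol, preserving input order.

    Hypothetical helper: assumes each row is a list whose first element
    is the Symbol and that the header row has already been skipped.
    """
    first = {}
    for row in rows:
        # setdefault only stores the row the first time a Symbol appears
        first.setdefault(row[0], row)
    # dicts keep insertion order (Python 3.7+), so Symbols stay in file order
    return list(first.values())


# Sample data copied from the question; a real script would open 'file.csv'
sample = """Symbol,Values Date
XXX,8/2/2010
XXX,8/3/2010
XXX,8/4/2010
YYY,8/2/2010
YYY,8/3/2010
YYY,8/4/2010
ZZZ,8/2/2010
ZZZ,8/3/2010
ZZZ,8/4/2010
"""

reader = csv.reader(io.StringIO(sample))
header = next(reader)  # skip the "Symbol,Values Date" header row
result = first_row_per_symbol(reader)
for row in result:
    print(','.join(row))
```

Run against the sample, this prints the three rows listed above. Because only one row per Symbol is kept in memory, the approach also works on files too large to load at once.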

Peque · Accepted Answer

Pandas may help. ;-)

import pandas
# first() keeps the first non-null value of each column per Symbol, in file order
pandas.read_csv('file.csv').groupby('Symbol').first()
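`groupby('Symbol').first()` returns the first non-null value of each column within each group, following the row order of the file. That matches the question's sample because each Symbol's rows are already sorted by date. If the file is not sorted, and "first date" should mean the earliest date, one option is to parse the dates and keep the minimum. The following is a standard-library sketch under these assumptions: exactly two columns, and dates written month/day/year as in the question. The helper name earliest_per_symbol and the unsorted sample are hypothetical.

```python
import csv
import io
from datetime import datetime


def earliest_per_symbol(rows):
    """Map each Symbol to its earliest date, whatever the row order.

    Hypothetical helper: assumes exactly two columns (Symbol, date)
    and month/day/year dates such as 8/2/2010.
    """
    earliest = {}
    for symbol, date_text in rows:
        date = datetime.strptime(date_text, '%m/%d/%Y').date()
        # keep the date only if it is the first or an earlier one for this Symbol
        if symbol not in earliest or date < earliest[symbol]:
            earliest[symbol] = date
    return earliest


# Deliberately unsorted sample, so file order and date order differ
sample = """Symbol,Values Date
XXX,8/3/2010
XXX,8/2/2010
YYY,8/4/2010
YYY,8/2/2010
"""

reader = csv.reader(io.StringIO(sample))
next(reader)  # skip the header row
result = earliest_per_symbol(reader)
for symbol, date in sorted(result.items()):
    # print in the same month/day/year style as the input
    print(f'{symbol},{date.month}/{date.day}/{date.year}')
```

Comparing parsed `date` objects matters here: comparing the raw strings would rank "8/10/2010" before "8/2/2010".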
