user2578013

Reputation: 73

Parsing unique values in a CSV where the primary key is not unique

This seems pretty trivial. Generally, I'd do something like the following:

import csv

results = []
with open('file.csv', newline='') as f:
    reader = csv.reader(f)
    for line in reader:  # iterate over the lines in the csv
        if line[1] in ['XXX', 'YYY', 'ZZZ']:  # check if the 2nd element is one you're looking for
            results.append(line)  # if so, add this line to the results list

However, my data set isn't so simply formatted. It looks like the following:

Symbol,Values Date
XXX,8/2/2010
XXX,8/3/2010
XXX,8/4/2010
YYY,8/2/2010
YYY,8/3/2010
YYY,8/4/2010
ZZZ,8/2/2010
ZZZ,8/3/2010
ZZZ,8/4/2010

Essentially, what I am trying to do is extract the row with the first date for each unique Symbol in the list, such that I end up with the following:

XXX,8/2/2010
YYY,8/2/2010
ZZZ,8/2/2010

Upvotes: 1

Views: 64

Answers (2)

Peque

Reputation: 14811

Pandas may help. ;-)

import pandas
pandas.read_csv('file.csv').groupby('Symbol').first()
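One caveat, offered as a sketch rather than a correction: groupby(...).first() sorts the groups by key by default and takes the first non-null value per column, which matches the sample data but is not quite "the first row in file order". If that distinction matters, drop_duplicates is one alternative. The in-memory SAMPLE string below is an assumption standing in for file.csv, and the names SAMPLE and first_rows are only illustrative.

```python
import io

import pandas as pd

# Sample data copied from the question; io.StringIO stands in for 'file.csv'.
SAMPLE = """Symbol,Values Date
XXX,8/2/2010
XXX,8/3/2010
XXX,8/4/2010
YYY,8/2/2010
YYY,8/3/2010
YYY,8/4/2010
ZZZ,8/2/2010
ZZZ,8/3/2010
ZZZ,8/4/2010
"""

df = pd.read_csv(io.StringIO(SAMPLE))

# Keep the first row seen for each Symbol, preserving the order of the file.
first_rows = df.drop_duplicates(subset='Symbol', keep='first')

# Write the rows back out in the same CSV shape as the desired output.
print(first_rows.to_csv(index=False, header=False), end='')
```

With a real file, replacing io.StringIO(SAMPLE) with 'file.csv' should be the only change needed.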

Upvotes: 1

user890739

Reputation: 733

Here is a simple solution that keeps a set of the first-column values (symbols) already seen:

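import csv

results = []
already_done = set()  # symbols whose first row has already been kept
with open('file.csv', newline='') as f:
    reader = csv.reader(f)
    for line in reader:  # iterate over the lines in the csv
        # the symbol is the 1st element (line[0]); keep only its first occurrence
        if line[0] in ['XXX', 'YYY', 'ZZZ'] and line[0] not in already_done:
            results.append(line)  # add this line to the results list
            already_done.add(line[0])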
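If the symbols are not known in advance, the same first-occurrence idea can be sketched with a dict instead of a hard-coded list. This is only an illustration under a few assumptions: the first row is a header, the symbol is in the first column, and Python 3.7+ is used so that dicts keep insertion order. The helper name first_row_per_symbol and the in-memory SAMPLE data are hypothetical, standing in for file.csv.

```python
import csv
import io

# Sample data copied from the question; io.StringIO stands in for open('file.csv').
SAMPLE = """Symbol,Values Date
XXX,8/2/2010
XXX,8/3/2010
XXX,8/4/2010
YYY,8/2/2010
YYY,8/3/2010
YYY,8/4/2010
ZZZ,8/2/2010
ZZZ,8/3/2010
ZZZ,8/4/2010
"""


def first_row_per_symbol(lines):
    """Return the first data row seen for each value in the first column."""
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row (assumed to be present)
    first_rows = {}
    for row in reader:
        if not row:  # ignore blank lines
            continue
        # setdefault stores the row only if this symbol has not been seen yet
        first_rows.setdefault(row[0], row)
    return list(first_rows.values())


results = first_row_per_symbol(io.StringIO(SAMPLE))
for row in results:
    print(','.join(row))
```

Note that this keeps the first row in file order, so it gives the earliest date only if each symbol's rows are already sorted by date, as they are in the sample.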

Upvotes: 0
