Manolete
Manolete

Reputation: 3517

Counting first appearance of a field in a CSV file

Using the format of following CSV file:

Pos   ID    Name
1   0001L01 50293
2   0002L01 128864
3   0003L01 172937
4   0004L01 12878
5   0005L01 demo
6   0004L01 12878
7   0004L01 12878
8   0005L01 demo

I want to include in a dictionary: [ID], {Pos, Name, FirstTime} where FirstTime corresponds with the position that an ID first appears in the CSV file. For instance ID = 0005L01 would have: [0005L01],{5,demo,5},{8,demo,5}

I have managed to store [ID], {Pos,Name} but I am struggling with FirstTime. So far I've got:

 # From the csv reader, save it to a list
 dlist=[]
 for row in reader:
      # store only the non empty lines
      if any(row):
         dlist.append(row)
d={}
for row in dlist:
    d.setdefault(row[1],[]).append([row[0],row[2]])

Upvotes: 0

Views: 58

Answers (3)

fferri
fferri

Reputation: 18940

it's easier if you compute firstTime first, and then you fill your dictionary:

# From the csv reader, save it to a list
dlist=[]
for row in reader:
    # store only the non empty lines
    if any(row):
        dlist.append(row)
firstTime={}
for row in dlist:
    if row[1] not in firstTime: firstTime[row[1]] = row[0]
d={}
for row in dlist:
    d.setdefault(row[1],[]).append([row[0],row[2],firstTime[row[1]]])

Upvotes: 1

enrico.bacis
enrico.bacis

Reputation: 31504

from collections import defaultdict

d = defaultdict(list)
first = {}

for row in reader:
    if any(row):
        pos, ID, name = row
        if ID not in first:
            first[ID] = pos
        d[ID].append(pos, name, first[ID])

Upvotes: 1

fixxxer
fixxxer

Reputation: 16144

If you are okay to use Pandas, try this:

In [269]: temp
Out[269]: 
   Pos       ID    Name
0    1  0001L01   50293
1    2  0002L01  128864
2    3  0003L01  172937
3    4  0004L01   12878
4    5  0005L01    demo
5    6  0004L01   12878
6    7  0004L01   12878
7    8  0005L01    demo

Next, group by ID and apply min:

In [271]: temp.groupby('ID').min().rename(columns={'Pos':'Firsttime'})
Out[271]: 
         Firsttime    Name
ID                        
0001L01          1   50293
0002L01          2  128864
0003L01          3  172937
0004L01          4   12878
0005L01          5    demo

In [272]: y = temp.groupby('ID').min().rename(columns={'Pos':'Firsttime'})

Now, merge with the original dataframe:

In [276]: temp.merge(y)
Out[276]: 
   Pos       ID    Name  Firsttime
0    1  0001L01   50293          1
1    2  0002L01  128864          2
2    3  0003L01  172937          3
3    4  0004L01   12878          4
4    6  0004L01   12878          4
5    7  0004L01   12878          4
6    5  0005L01    demo          5
7    8  0005L01    demo          5

Now, iterate and save it into the dictionary:

In [280]: temp.merge(y).iterrows().next()
Out[280]: 
(0, Pos                1
 ID           0001L01
 Name           50293
 Firsttime          1
 Name: 0, dtype: object)

Upvotes: 1

Related Questions