pylearner
pylearner

Reputation: 1460

Count the no of times an id has occurred and assign it to a column

I have a dataset and want to extract the count of the ID repeated in that column based on other columns session time.

Dataset :

person ID   Session type ID
1001    morning      1
1001    evening      1
1002    morning      2
1002    night        1
1001    morning      2
1002    evening      1
1002    night        2

Based on my type ID == 1 I want to get count of them for every session as mentioned below

Output:

person ID   morning_count   evening_count   night_count
1001             1                1              0
1002             0                1              1

Based on my type ID ==1 it should give me count of that fro every session column.

Upvotes: 0

Views: 326

Answers (2)

Uditha
Uditha

Reputation: 11

Hope this python dictionary based alternative approach would work:

counts = {list(set(dataset['person ID']))[i]:{'morning_count': 0, 'evening_count':0, 'night_count':0} for i in range(list(dataset['person ID']))}


df_len = len(dataset)

for i in range(df_len):
   if(dataset[i]['type ID'] == 1):
      if(dataset[i]['Session'] == 'morning'):
         counts[dataset[i]['person ID']]['morning_count'] += 1
      elif(dataset[i]['Session'] == 'evening'):
         counts[dataset[i]['person ID']]['evening_count'] += 1
         evening_counts[dataset[i]['person ID']] += 1
      elif(dataset[i]['Session'] == 'night'):
         counts[dataset[i]['person ID']]['night_count'] += 1

print(counts)

Here you could print by using another loop as you require.

This would give the basic idea to get the same result by using only python dictionaries

Upvotes: 1

jezrael
jezrael

Reputation: 862851

First create new column withDataFrame.assign filled integers for matched rows, then use DataFrame.pivot_table with aggregate sum and last some data cleaning with DataFrame.add_suffix, DataFrame.reset_index and DataFrame.rename_axis:

df1 = (df.assign(count=df['type ID'].eq(1).astype(int))
         .pivot_table(index='person ID',
                      columns='Session', 
                      values='count', 
                      aggfunc='sum', 
                      fill_value=0)
         .add_suffix('_count')
         .reset_index()
         .rename_axis(None, axis=1))
print (df1)
   person ID  evening_count  morning_count  night_count
0       1001              1              1            0
1       1002              1              0            1

Upvotes: 1

Related Questions