mmkb

Reputation: 47

Cumulative count of different strings in a column based on value of another column

I've got a df that looks like this, with duplicate IDs:

     ID    Usage_type
0     5    Note
1     6    Note
2     7    Service
3     5    Note
4     7    Note
5     10   Service

I want two extra columns that give the cumulative count of each Usage_type per ID, like so:

     ID    Usage_type   type_Note    type_Service
0     5    Note         1            0
1     6    Note         1            0
2     7    Service      0            1
3     5    Note         2            0
4     7    Note         1            1
5     10   Service      0            1

I've used cumcount to get the total running count of rows for each ID, but I want to break it down further into separate counts for each string.

A screenshot (omitted here) showed the current counts for an example ID.
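For reference, the plain per-ID cumulative count described above (a sketch of what the question says was already tried) might look like this, with `usage_count` as a hypothetical column name:

```python
import pandas as pd

df = pd.DataFrame({'ID': [5, 6, 7, 5, 7, 10],
                   'Usage_type': ['Note', 'Note', 'Service', 'Note', 'Note', 'Service']})

# Total running count of rows per ID, regardless of Usage_type
df['usage_count'] = df.groupby('ID').cumcount() + 1
```

This counts every row for an ID, which is why it needs to be broken down further by Usage_type.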

Upvotes: 1

Views: 146

Answers (2)

Anthony1223

Reputation: 359

I found an answer to your problem and solved it by creating a custom function. Here's the bit of code that worked for me:

import pandas as pd

def truefalse(mask):
    # Running count of True values in a boolean Series
    count = 0
    df_2 = pd.DataFrame({'col': [0] * len(mask)})
    for i in range(len(mask)):
        if mask.iloc[i]:
            count += 1
            df_2.loc[i, 'col'] = count
    return df_2

ID = [5, 6, 7, 5, 7, 10]
usg = ['Note', 'Note', 'Service', 'Note', 'Note', 'Service']
df = pd.DataFrame({'ID': ID, 'Usage_type': usg})
df['type_Note'] = 0
df['type_Service'] = 0
for unid in df['ID'].unique():
    df['type_Note'] += truefalse((df['Usage_type'] == 'Note') & (df['ID'] == unid))['col']
    df['type_Service'] += truefalse((df['Usage_type'] == 'Service') & (df['ID'] == unid))['col']

Hope this helps!

Upvotes: 0

sophocles

Reputation: 13821

You could filter your dataframe on the different Usage_type with loc and perform a grouped cumulative count. Lastly, a grouped ffill() will account for the cases you explained to me in the comments:

grp_cc = df.groupby([*df]).cumcount() + 1   # [*df] unpacks to all column names: ['ID', 'Usage_type']
df.loc[df.Usage_type=='Note','type_Note'] = grp_cc
df.loc[df.Usage_type=='Service','type_Service'] = grp_cc

This will get you:

   ID Usage_type  type_Note  type_Service
0   5       Note        1.0           NaN
1   6       Note        1.0           NaN
2   7    Service        NaN           1.0
3   5       Note        2.0           NaN
4   7       Note        2.0           NaN
5  10    Service        NaN           1.0

And then performing a grouped forward fill and filling in the null values gets you what you need:

df = df.groupby('ID').ffill().fillna(0)    

>>> df    
 
   Usage_type  type_Note  type_Service
0       Note        1.0           0.0
1       Note        1.0           0.0
2    Service        0.0           1.0
3       Note        2.0           0.0
4       Note        1.0           1.0
5    Service        0.0           1.0
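As an aside (not part of the answer above), the same result can likely be reached in one step by one-hot encoding Usage_type and taking a grouped cumulative sum, since the cumsum within each ID carries earlier counts forward the same way the ffill does:

```python
import pandas as pd

df = pd.DataFrame({'ID': [5, 6, 7, 5, 7, 10],
                   'Usage_type': ['Note', 'Note', 'Service', 'Note', 'Note', 'Service']})

# One-hot encode Usage_type, then cumulatively sum the indicator
# columns within each ID group
counts = (pd.get_dummies(df['Usage_type'], dtype=int)
            .groupby(df['ID'])
            .cumsum()
            .add_prefix('type_'))
out = df.join(counts)
```

This keeps the ID column and produces integer counts directly, at the cost of being a little less explicit than the loc-based version.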

Upvotes: 2
