mmkb

Reputation: 47

Cumulative count of different strings in a column based on value of another column

I've got a df that looks like this, with duplicate IDs:

     ID    Usage_type
0     5    Note
1     6    Note
2     7    Service
3     5    Note
4     7    Note
5     10   Service

I want two extra columns that give the cumulative count of each Usage_type per ID, like so:

     ID    Usage_type   type_Note    type_Service
0     5    Note         1            0
1     6    Note         1            0
2     7    Service      0            1
3     5    Note         2            0
4     7    Note         1            1
5     10   Service      0            1

I've used cumcount to get the total running count of rows for each ID, but I want to break it down further into separate counts for each string.

A screenshot (omitted here) showed the current counts for an example ID.
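For reference, the plain per-ID cumulative count described above (a sketch of what the question says was already tried) might look like this, with `usage_count` as a hypothetical column name:

```python
import pandas as pd

df = pd.DataFrame({'ID': [5, 6, 7, 5, 7, 10],
                   'Usage_type': ['Note', 'Note', 'Service', 'Note', 'Note', 'Service']})

# Total running count of rows per ID, regardless of Usage_type
df['usage_count'] = df.groupby('ID').cumcount() + 1
```

This counts every row for an ID, which is why it needs to be broken down further by Usage_type.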

Upvotes: 1

Views: 146

Answers (2)

Anthony1223

Reputation: 359

I found an answer to your problem and solved it by creating a custom function. Here's the bit of code that worked for me:

import pandas as pd

def truefalse(mask):
    # Running count of True values in a boolean Series
    count = 0
    df_2 = pd.DataFrame({'col': [0] * len(mask)})
    for i in range(len(mask)):
        if mask.iloc[i]:
            count += 1
            df_2.loc[i, 'col'] = count
    return df_2

ID = [5, 6, 7, 5, 7, 10]
usg = ['Note', 'Note', 'Service', 'Note', 'Note', 'Service']
df = pd.DataFrame({'ID': ID, 'Usage_type': usg})
df['type_Note'] = 0
df['type_Service'] = 0
for unid in df['ID'].unique():
    df['type_Note'] += truefalse((df['Usage_type'] == 'Note') & (df['ID'] == unid))['col']
    df['type_Service'] += truefalse((df['Usage_type'] == 'Service') & (df['ID'] == unid))['col']

Hope this helps!

Upvotes: 0

sophocles

Reputation: 13821

You could filter your dataframe on the different Usage_type with loc and perform a grouped cumulative count. Lastly, a grouped ffill() will account for the cases you explained to me in the comments:

grp_cc = df.groupby([*df]).cumcount() + 1   # [*df] unpacks to all column names: ['ID', 'Usage_type']
df.loc[df.Usage_type=='Note','type_Note'] = grp_cc
df.loc[df.Usage_type=='Service','type_Service'] = grp_cc

This will get you:

   ID Usage_type  type_Note  type_Service
0   5       Note        1.0           NaN
1   6       Note        1.0           NaN
2   7    Service        NaN           1.0
3   5       Note        2.0           NaN
4   7       Note        2.0           NaN
5  10    Service        NaN           1.0

And then performing a grouped forward fill and filling in the null values gets you what you need:

df = df.groupby('ID').ffill().fillna(0)    

>>> df    
 
   Usage_type  type_Note  type_Service
0       Note        1.0           0.0
1       Note        1.0           0.0
2    Service        0.0           1.0
3       Note        2.0           0.0
4       Note        1.0           1.0
5    Service        0.0           1.0
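As an aside (not part of the answer above), the same result can likely be reached in one step by one-hot encoding Usage_type and taking a grouped cumulative sum, since the cumsum within each ID carries earlier counts forward the same way the ffill does:

```python
import pandas as pd

df = pd.DataFrame({'ID': [5, 6, 7, 5, 7, 10],
                   'Usage_type': ['Note', 'Note', 'Service', 'Note', 'Note', 'Service']})

# One-hot encode Usage_type, then cumulatively sum the indicator
# columns within each ID group
counts = (pd.get_dummies(df['Usage_type'], dtype=int)
            .groupby(df['ID'])
            .cumsum()
            .add_prefix('type_'))
out = df.join(counts)
```

This keeps the ID column and produces integer counts directly, at the cost of being a little less explicit than the loc-based version.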

Upvotes: 2
