Dinosaurius
Dinosaurius

Reputation: 8628

How to count the number of times a sequence of columns appears in a data frame?

I have this dataframe df:

AA_0    AA_1     AA_2     AA_3
store   cake     mass     visit    
store   mass     visit
mass    store
store   cake     mass     visit

I want to calculate the number of times each that each sequence AA_0-AA_3 appears in df and represent the result as follows:

result = 

    count   data
    2       store/cake/mass/visit
    1       store/mass/visit
    1       mass/store

How can I do it?

Upvotes: 1

Views: 81

Answers (1)

jezrael
jezrael

Reputation: 862831

You can use:

df['data'] = df.apply(lambda x: '/'.join(x.dropna()), axis=1)
print (df)
    AA_0   AA_1   AA_2   AA_3                   data
0  store   cake   mass  visit  store/cake/mass/visit
1  store   mass  visit    NaN       store/mass/visit
2   mass  store    NaN    NaN             mass/store
3  store   cake   mass  visit  store/cake/mass/visit

result = df.data.value_counts().rename_axis('count').reset_index()
print (result)
                   count  data
0  store/cake/mass/visit     2
1       store/mass/visit     1
2             mass/store     1

If missing data are spaces:

df['data'] = df.apply(lambda x: '/'.join(x), axis=1).str.strip('/ ')
print (df)
    AA_0   AA_1   AA_2   AA_3                   data
0  store   cake   mass  visit  store/cake/mass/visit
1  store   mass  visit              store/mass/visit
2   mass  store                           mass/store
3  store   cake   mass  visit  store/cake/mass/visit

result = df.data.value_counts().rename_axis('count').reset_index()
print (result)
                   count  data
0  store/cake/mass/visit     2
1       store/mass/visit     1
2             mass/store     1

Upvotes: 2

Related Questions