Reputation: 8628
I have this dataframe df
:
AA_0 AA_1 AA_2 AA_3
store cake mass visit
store mass visit
mass store
store cake mass visit
I want to calculate the number of times each that each sequence AA_0
-AA_3
appears in df
and represent the result as follows:
result =
count data
2 store/cake/mass/visit
1 store/mass/visit
1 mass/store
How can I do it?
Upvotes: 1
Views: 81
Reputation: 862831
You can use:
df['data'] = df.apply(lambda x: '/'.join(x.dropna()), axis=1)
print (df)
AA_0 AA_1 AA_2 AA_3 data
0 store cake mass visit store/cake/mass/visit
1 store mass visit NaN store/mass/visit
2 mass store NaN NaN mass/store
3 store cake mass visit store/cake/mass/visit
result = df.data.value_counts().rename_axis('count').reset_index()
print (result)
count data
0 store/cake/mass/visit 2
1 store/mass/visit 1
2 mass/store 1
If missing data are spaces:
df['data'] = df.apply(lambda x: '/'.join(x), axis=1).str.strip('/ ')
print (df)
AA_0 AA_1 AA_2 AA_3 data
0 store cake mass visit store/cake/mass/visit
1 store mass visit store/mass/visit
2 mass store mass/store
3 store cake mass visit store/cake/mass/visit
result = df.data.value_counts().rename_axis('count').reset_index()
print (result)
count data
0 store/cake/mass/visit 2
1 store/mass/visit 1
2 mass/store 1
Upvotes: 2