maxutil
maxutil

Reputation: 195

Pandas - groupby and count series string over column

I have a df like this:

import pandas as pd

df = pd.DataFrame(columns=['Concat','SearchTerm'])
df = df.append({'Concat':'abc','SearchTerm':'aa'}, ignore_index=True)
df = df.append({'Concat':'abc','SearchTerm':'aab'}, ignore_index=True)
df = df.append({'Concat':'abc','SearchTerm':'aac'}, ignore_index=True)
df = df.append({'Concat':'abc','SearchTerm':'ddd'}, ignore_index=True)
df = df.append({'Concat':'def','SearchTerm':'cef'}, ignore_index=True)
df = df.append({'Concat':'def','SearchTerm':'plo'}, ignore_index=True)
df = df.append({'Concat':'def','SearchTerm':'cefa'}, ignore_index=True)

print(df)
  Concat SearchTerm
0    abc         aa
1    abc        aab
2    abc        aac
3    abc        ddd
4    def        cef
5    def        plo
6    def       cefa

I want to group up the df by Concat, and count how many times each SearchTerm appears within the strings of that subset. So the final result should look like this:

  Concat SearchTerm Count
0    abc         aa     3
1    abc        aab     1
2    abc        aac     1
3    abc        ddd     1
4    def        cef     2
5    def        plo     1
6    def       cefa     1

For Concat abc, aa is found 3 times among the 4 SearchTerms. I can get the solution using a loop, but for my larger dataset, it is too slow.

I have tried two solutions from this thread and this thread.

df['Count'] = df['SearchTerm'].str.contains(df['SearchTerm']).groupby(df['Concat']).sum()
df['Count'] = df.groupby(['Concat'])['SearchTerm'].transform(lambda x: x[x.str.contains(x)].count())

In either case, there is a TypeError:

'Series' objects are mutable, thus they cannot be hashed

Any help would be appreciated.

Upvotes: 1

Views: 328

Answers (1)

Andy L.
Andy L.

Reputation: 25259

Use transform and listcomp

s = df.groupby('Concat').SearchTerm.transform('|'.join)
df['Count'] = [s[i].count(term) for i, term in enumerate(df.SearchTerm)]

Out[77]:
  Concat SearchTerm  Count
0    abc         aa      3
1    abc        aab      1
2    abc        aac      1
3    abc        ddd      1
4    def        cef      2
5    def        plo      1
6    def       cefa      1

Upvotes: 3

Related Questions