Reputation: 1387
I have data which looks like this:
Group string
A Hello
A SearchListing
A GoSearch
A pen
A Hello
B Real-Estate
B Access
B Denied
B Group
B Group
C Glance
C NoSearch
C Home
and so on
I want to find all groups that have the phrase "search" in their strings and flag each group as 0/1. At the same time, I want to aggregate, per group, the total number of strings, the number of unique strings, and how many times "search" occurs. The end result I want looks like this:
Group  containsSearch  TotalStrings  UniqueStrings  NoOfTimesSearch
A      1               5             4              2
B      0               5             4              0
C      1               3             3              1
I can aggregate with a simple groupby, but I am having trouble marking each group as 0/1 based on the presence of "search" and counting how many times it occurs.
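For reference, here is a minimal sketch that rebuilds the sample rows above as a DataFrame. The variable name `df` is an assumption, chosen to match what the answers use; the "and so on" rows are not reproduced.

```python
import pandas as pd

# Sample rows copied from the question; the real data continues beyond these.
df = pd.DataFrame({
    'Group': ['A'] * 5 + ['B'] * 5 + ['C'] * 3,
    'string': [
        'Hello', 'SearchListing', 'GoSearch', 'pen', 'Hello',
        'Real-Estate', 'Access', 'Denied', 'Group', 'Group',
        'Glance', 'NoSearch', 'Home',
    ],
})
```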
Upvotes: 0
Views: 92
Reputation: 153500
Let's try:
# Flag (0/1): does any string in the group contain "search", ignoring case?
l1 = lambda x: int(x.str.lower().str.contains('search').any())
l1.__name__ = 'containsSearch'
# Number of strings in the group that contain "search"
l2 = lambda x: int(x.str.lower().str.contains('search').sum())
l2.__name__ = 'NoOfTimesSearch'
df.groupby('Group')['string'].agg(['count', 'nunique', l1, l2]).reset_index()
Output:
  Group  count  nunique  containsSearch  NoOfTimesSearch
0     A      5        4               1                2
1     B      5        4               0                0
2     C      3        3               1                1
Or, using named functions (thanks, @W-B):
def containsSearch(x):
    # 1 if any string in the group contains "search" (case-insensitive), else 0
    return int(x.str.lower().str.contains('search').any())

def NoOfTimesSearch(x):
    # Number of strings in the group that contain "search"
    return int(x.str.lower().str.contains('search').sum())

df.groupby('Group')['string'].agg(['count', 'nunique',
                                   containsSearch, NoOfTimesSearch]).reset_index()
Output:
  Group  count  nunique  containsSearch  NoOfTimesSearch
0     A      5        4               1                2
1     B      5        4               0                0
2     C      3        3               1                1
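If the column names from the question (TotalStrings, UniqueStrings) are wanted directly, one possible variation is to precompute the match as a 0/1 column and use named aggregation (available since pandas 0.25). This is only a sketch under assumptions: the helper column `hit` and the result name `out` are hypothetical, and the sample rows are retyped from the question.

```python
import pandas as pd

# Sample rows retyped from the question.
df = pd.DataFrame({
    'Group': ['A'] * 5 + ['B'] * 5 + ['C'] * 3,
    'string': [
        'Hello', 'SearchListing', 'GoSearch', 'pen', 'Hello',
        'Real-Estate', 'Access', 'Denied', 'Group', 'Group',
        'Glance', 'NoSearch', 'Home',
    ],
})

# 'hit' (hypothetical helper column): 1 where the string contains
# "search" regardless of case, else 0.
hit = df['string'].str.contains('search', case=False).astype(int)

out = (
    df.assign(hit=hit)
      .groupby('Group')
      .agg(
          containsSearch=('hit', 'max'),      # 1 if any row in the group matched
          TotalStrings=('string', 'count'),   # non-null strings per group
          UniqueStrings=('string', 'nunique'),
          NoOfTimesSearch=('hit', 'sum'),     # number of matching rows
      )
      .reset_index()
)
print(out)
```

Computing the match once on the whole column avoids calling a Python function per group, which may matter on larger frames.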
Upvotes: 4
Reputation: 14113
If you want to create a function:
def my_agg(x):
    # Boolean mask: strings in this group that contain "search" (case-insensitive)
    has_search = x['string'].str.lower().str.contains('search')
    names = {
        'containsSearch': int(has_search.any()),
        'TotalStrings': x['string'].count(),
        'UniqueStrings': x['string'].nunique(),
        # Sum the mask; calling int() on a DataFrame's .count() fails
        # when the group has more than one column
        'NoOfTimesSearch': int(has_search.sum())
    }
    return pd.Series(names)

df.groupby('Group').apply(my_agg)
       containsSearch  TotalStrings  UniqueStrings  NoOfTimesSearch
Group
A                   1             5              4                2
B                   0             5              4                0
C                   1             3              3                1
Upvotes: 1