Reputation: 4638
My Dataframe:
pd.DataFrame({'company':['Chipotle','Branchburg Chipotle','Chipotle NJ','Chipotle 8853','The Home Depot','Home Depot','28211 Home Depot','Wendys','BJs','Buffalo wings'],
'address':['123 Main Street Branchburg NJ 08853'
,'123 Main Street Branchburg NJ 08853'
,'123 Main Street Branchburg NJ 08853'
,'123 Main Street Branchburg NJ 08853'
,'1220 N Wendover Rd Charlotte NC 28211'
,'1220 N Wendover Rd Charlotte NC 28211'
,'1220 N Wendover Rd Charlotte NC 28211'
,'2805 Whitson St Selma CA 93662'
,'2805 Whitson St Selma CA 93662'
,'2805 Whitson St Selma CA 93662']})
company address
0 Chipotle 123 Main Street Branchburg NJ 08853
1 Branchburg Chipotle 123 Main Street Branchburg NJ 08853
2 Chipotle NJ 123 Main Street Branchburg NJ 08853
3 Chipotle 8853 123 Main Street Branchburg NJ 08853
4 The Home Depot 1220 N Wendover Rd Charlotte NC 28211
5 Home Depot 1220 N Wendover Rd Charlotte NC 28211
6 28211 Home Depot 1220 N Wendover Rd Charlotte NC 28211
7 Wendy's 2805 Whitson St Selma CA 93662
8 BJ's 2805 Whitson St Selma CA 93662
9 Buffalo wings 2805 Whitson St Selma CA 93662
I have to groupby Address and find the common word in the company column and write that to a new column 'count'. so for first address the common word is chipotle so count is 1 and for second address common word is home depot so count 2 and for third address no common words so count 0
Expected Ouput
company address count
0 Chipotle 123 Main Street Branchburg NJ 08853 1
1 The Home Depot 1220 N Wendover Rd Charlotte NC 28211 2
2 Wendy's 2805 Whitson St Selma CA 93662 0
I can think of iterating through the dataframe and using set intersection but that's too slow process. Is there any pandas way to achieve this?
Upvotes: 2
Views: 919
Reputation: 59274
from functools import reduce
import operator
def log(x):
inters = reduce(operator.and_, [set(r) for r in x.str.split()])
if inters: return (' '.join(inters), len(inters))
return (x.iloc[0], 0)
df.groupby('address').agg(log).company.apply(pd.Series).rename({0: 'company', 1: 'count'}, axis=1)
company count
address
1220 N Wendover Rd Charlotte NC 28211 Home Depot 2
123 Main Street Branchburg NJ 08853 Chipotle 1
2805 Whitson St Selma CA 93662 Wendys 0
If pandas 0.20
.rename(columns={0: 'company', 1: 'count'})
Upvotes: 4