Reputation: 306
I have a dataframe (df) of messages that appears similar the following:
From To
[email protected] [email protected]
[email protected] [email protected], [email protected]
[email protected] [email protected], [email protected]
I want to count the amount of times each email appears from a specific list. My list being:
lst = ['[email protected]', '[email protected]', '[email protected]']
I'm hoping to receive a dataframe/series/dictionary with a result like this:
list_item Total_Count
[email protected] 2
[email protected] 2
[email protected] 1
I'm tried several different things, but haven't succeeded. I thought I could try something like the for loop below (it returns a Syntax Error), but I cannot figure out the right way to write it.
for To,From in zip(df.To, df.From):
for item in lst:
if To,From contains item in emails:
Count(item)
Should this type of task be accomplished with a for loop or are there out of the box pandas methods that could solve this easier?
Upvotes: 1
Views: 436
Reputation: 403130
stack
-basedSplit your To
column, stack
everything and then do a value_counts
:
v = pd.concat([df.From, df.To.str.split(', ', expand=True)], axis=1).stack()
v[v.isin(lst)].value_counts()
[email protected] 2
[email protected] 2
[email protected] 1
dtype: int64
melt
Another option is to use melt
:
v = (df.set_index('From')
.To.str.split(', ', expand=True)
.reset_index()
.melt()['value']
)
v[v.isin(lst)].value_counts()
[email protected] 2
[email protected] 2
[email protected] 1
Name: value, dtype: int64
Note that set_index
+ str.split
+ reset_index
is synonymous to pd.concat([...])
...
Upvotes: 2