OfSorts
OfSorts

Reputation: 306

Value counts for specific items in a DataFrame

I have a dataframe (df) of messages that appears similar the following:

From                To
[email protected]   [email protected]
[email protected]   [email protected], [email protected]
[email protected]   [email protected], [email protected]

I want to count the amount of times each email appears from a specific list. My list being:

lst = ['[email protected]', '[email protected]', '[email protected]']

I'm hoping to receive a dataframe/series/dictionary with a result like this:

list_item              Total_Count
[email protected]      2
[email protected]    2
[email protected]      1

I'm tried several different things, but haven't succeeded. I thought I could try something like the for loop below (it returns a Syntax Error), but I cannot figure out the right way to write it.

for To,From in zip(df.To, df.From): 
    for item in lst:
        if To,From contains item in emails:
            Count(item)

Should this type of task be accomplished with a for loop or are there out of the box pandas methods that could solve this easier?

Upvotes: 1

Views: 436

Answers (1)

cs95
cs95

Reputation: 403130

stack-based

Split your To column, stack everything and then do a value_counts:

v = pd.concat([df.From, df.To.str.split(', ', expand=True)], axis=1).stack()
v[v.isin(lst)].value_counts()

[email protected]    2
[email protected]      2
[email protected]      1
dtype: int64

melt

Another option is to use melt:

v = (df.set_index('From')
      .To.str.split(', ', expand=True)
      .reset_index()
      .melt()['value']
)
v[v.isin(lst)].value_counts()

[email protected]    2
[email protected]      2
[email protected]      1
Name: value, dtype: int64

Note that set_index + str.split + reset_index is synonymous to pd.concat([...])...

Upvotes: 2

Related Questions