Reputation: 3146
I have data like this in a csv file
Symbol Action Year
AAPL Buy 2001
AAPL Buy 2001
BAC Sell 2002
BAC Sell 2002
I am able to read it and groupby like this
df.groupby(['Symbol','Year']).count()
I get
Action
Symbol Year
AAPL 2001 2
BAC 2002 2
I desire this (order does not matter)
Action
Symbol Year
AAPL 2001 2
AAPL 2002 0
BAC 2001 0
BAC 2002 2
I want to know if its possible to count for zero occurances
Upvotes: 48
Views: 51570
Reputation: 2508
All the answers above are focusing on groupby or pivot table. However, as is well described in this article and in this question, this is a beautiful case for pandas' crosstab
function:
import pandas as pd
df = pd.DataFrame({
"Symbol": 2*['AAPL', 'BAC'],
"Action": 2*['Buy', 'Sell'],
"Year": 2*[2001,2002]
})
pd.crosstab(df["Symbol"], df["Year"]).stack()
yielding:
Symbol Year
AAPL 2001 2
2002 0
BAC 2001 0
2002 2
Upvotes: 0
Reputation: 12417
You can use this:
df = df.groupby(['Symbol','Year']).count().unstack(fill_value=0).stack()
print (df)
Output:
Action
Symbol Year
AAPL 2001 2
2002 0
BAC 2001 0
2002 2
Upvotes: 54
Reputation: 341
Datatype category
Maybe this feature didn't exist back when this thread was opened, however the datatype "category" can help here:
# create a dataframe with one combination of a,b missing
df = pd.DataFrame({"a":[0,1,1], "b": [0,1,0]})
df = df.astype({"a":"category", "b":"category"})
print(df)
Dataframe looks like this:
a b
0 0 0
1 1 1
2 1 0
And now, grouping by a and b
print(df.groupby(["a","b"]).size())
yields:
a b
0 0 1
1 0
1 0 1
1 1
Note the 0 in the rightmost column. This behavior is also documented in the pandas userguide (search on page for "groupby").
Upvotes: 6
Reputation: 478
Step 1: Create a dataframe that stores the count of each non-zero class in the column counts
count_df = df.groupby(['Symbol','Year']).size().reset_index(name='counts')
Step 2: Now use pivot_table to get the desired dataframe with counts for both existing and non-existing classes.
df_final = pd.pivot_table(count_df,
index=['Symbol','Year'],
values='counts',
fill_value = 0,
dropna=False,
aggfunc=np.sum)
Now the values of the counts can be extracted as a list with the command
list(df_final['counts'])
Upvotes: 0
Reputation: 3247
If you want to do this without using pivot_table, you can try the below approach:
midx = pd.MultiIndex.from_product([ df['Symbol'].unique(), df['Year'].unique()], names=['Symbol', 'Year'])
df_grouped_by = df_grouped_by.reindex(midx, fill_value=0)
What we are essentially doing above is creating a multi-index of all the possible values multiplying the two columns and then using that multi-index to fill zeroes into our group-by dataframe.
Upvotes: 0
Reputation: 862691
You can use pivot_table
with unstack
:
print df.pivot_table(index='Symbol',
columns='Year',
values='Action',
fill_value=0,
aggfunc='count').unstack()
Year Symbol
2001 AAPL 2
BAC 0
2002 AAPL 0
BAC 2
dtype: int64
If you need output as DataFrame
use to_frame
:
print df.pivot_table(index='Symbol',
columns='Year',
values='Action',
fill_value=0,
aggfunc='count').unstack()
.to_frame()
.rename(columns={0:'Action'})
Action
Year Symbol
2001 AAPL 2
BAC 0
2002 AAPL 0
BAC 2
Upvotes: 27