Reputation: 3577
I have a DataFrame similar to this:
Key Departure Species1 Species2 Status
1 R Carlan Carlan D
1 R Scival Carex C
2 R Carlan Scival D
2 R Scival Bougra C
3 D Carlan Carlan D
3 D Scival Scival C
I want to count the occurrences of each unique Species1
for a given Departure
and Status
of D
of C
My desired output is:
Species1 RD RC DD DC
Carlan 2 NaN 1 NaN
Scival NaN 2 NaN 1
Upvotes: 1
Views: 555
Reputation: 2559
Use the pandas.crosstab() method. A single line of code:
pd.crosstab(df.Species1, [df.Departure, df.Status])
The resulting table:
If you combine with @dermen's 'comb' column,
df['comb'] = df.Departure + df.Status
pd.crosstab(df.Species1, df.comb)
you'll get:
If you really want those 'NaN', just tack on a .replace('0', np.nan)
, like so (assuming an import numpy as np
has already been done):
pd.crosstab(df.Species1, df.comb).replace('0', np.nan)
Upvotes: 2
Reputation: 134
you can use the a groupby query on multiple colums and and use the.agg function to count the number of occurences:
df.groupby(['Species1', 'Departure', 'Status']).agg(['count'])
Upvotes: 0
Reputation: 5362
Make a new column that is the combination of Departure and Status
df['comb'] = df.Departure + df.Status
df
# Key Departure Species1 Species2 Status comb
#0 1 R Carlan Carlan D RD
#1 1 R Scival Carex C RC
#2 2 R Carlan Scival D RD
#3 2 R Scival Bougra C RC
#4 3 D Carlan Carlan D DD
#5 3 D Scival Scival C DC
Then you can groupby:
gb = df.groupby(['Species1', 'comb'])
gb.groups
#{('Carlan', 'DD'): [4],
#('Carlan', 'RD'): [0, 2],
#('Scival', 'DC'): [5],
#('Scival', 'RC'): [1, 3]}
Now organize the results into a list, where each element is a tuple (column, Series(data, index))
representing a single data point in a new dataframe
items = [ (key[1], pandas.Series( [len(val)], index=[key[0]] ) )for key,val in gb.groups.items() ]
And make a new dataframe from the items:
result = pandas.from_items( items)
result
# RC DC DD RD
#Carlan NaN NaN 1 2
#Scival 2 1 NaN NaN
See this link for ideas on crating new dataframes from various objects. When you want to create a dataframe from individual data points (e.g. (Species1,comb) ), then from_items
is your best option.
Upvotes: 3