Konstantin

Reputation: 103

Pandas, Pivot table from 2 columns with values being a count of one of those columns

I have a pandas dataframe:

+---------------+-------------+
| Test_Category | Test_Result |
+---------------+-------------+
| Cat_1         | Pass        |
| Cat_1         | N/A         |
| Cat_2         | Fail        |
| Cat_2         | Fail        |
| Cat_3         | Pass        |
| Cat_3         | Pass        |
| Cat_3         | Fail        |
| Cat_3         | N/A         |
+---------------+-------------+

I need a table like this:

+------+------+------+-----+
|      | Pass | Fail | N/A |
+------+------+------+-----+
| Cat1 |    1 |      |   1 |
| Cat2 |      |    2 |     |
| Cat3 |    2 |    1 |   1 |
+------+------+------+-----+

I tried using a pivot, but I can't figure out how to make it count the occurrences in the Test_Result column and use those counts as the values of the pivot result.

Thank you!

Upvotes: 3

Views: 11231

Answers (2)

jezrael

Reputation: 862671

The problem is that NaN values are excluded by default, so use fillna together with crosstab:

df1 = pd.crosstab(df['Test_Category'], df['Test_Result'].fillna('n/a'))
print (df1)
Test_Result    Fail  Pass  n/a
Test_Category                 
Cat_1             0     1    1
Cat_2             2     0    0
Cat_3             1     2    1
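
For a copy-paste check, here is a minimal self-contained sketch of the crosstab approach. It assumes the N/A entries in the question are stored as NaN (the question does not say), and the reindex step and the name `counts` are additions for illustration, used only to get the Pass, Fail, N/A column order the question asks for:

```python
import numpy as np
import pandas as pd

# Sample data from the question; N/A is assumed to be a real NaN.
df = pd.DataFrame({
    "Test_Category": ["Cat_1", "Cat_1", "Cat_2", "Cat_2",
                      "Cat_3", "Cat_3", "Cat_3", "Cat_3"],
    "Test_Result": ["Pass", np.nan, "Fail", "Fail",
                    "Pass", "Pass", "Fail", np.nan],
})

# Replace NaN with a label first so that it is counted like any other value.
counts = pd.crosstab(df["Test_Category"], df["Test_Result"].fillna("N/A"))

# Optional: put the columns in the order requested in the question.
counts = counts.reindex(columns=["Pass", "Fail", "N/A"], fill_value=0)
print(counts)
```

Combinations that never occur show up as 0 rather than as blank cells.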

Or use GroupBy.size with unstack to reshape:

df['Test_Result'] = df['Test_Result'].fillna('n/a')

df1 = df.groupby(['Test_Category','Test_Result']).size().unstack()
print (df1)
Test_Result    Fail  Pass  n/a
Test_Category                 
Cat_1           NaN   1.0  1.0
Cat_2           2.0   NaN  NaN
Cat_3           1.0   2.0  1.0

To fill the missing combinations with 0 instead of NaN, pass fill_value=0 to unstack:

df1 = df.groupby(['Test_Category','Test_Result']).size().unstack(fill_value=0)
print (df1)
Test_Result    Fail  Pass  n/a
Test_Category                 
Cat_1             0     1    1
Cat_2             2     0    0
Cat_3             1     2    1

Another solution with pivot_table:

df = df.pivot_table(index='Test_Category',columns='Test_Result', aggfunc='size')
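
As a rough sketch of how the pivot_table variant could be run end to end: the sample frame below is rebuilt from the question, the N/A entries are assumed to be NaN, and `table` is just an illustrative name. Passing fill_value=0 is an addition here, so that category/result pairs that never occur show as 0 instead of NaN:

```python
import numpy as np
import pandas as pd

# Sample data from the question; N/A is assumed to be a real NaN.
df = pd.DataFrame({
    "Test_Category": ["Cat_1", "Cat_1", "Cat_2", "Cat_2",
                      "Cat_3", "Cat_3", "Cat_3", "Cat_3"],
    "Test_Result": ["Pass", np.nan, "Fail", "Fail",
                    "Pass", "Pass", "Fail", np.nan],
})

# NaN keys are dropped when grouping, so label them before pivoting.
df["Test_Result"] = df["Test_Result"].fillna("n/a")

# aggfunc="size" counts the rows in each (category, result) pair.
table = df.pivot_table(index="Test_Category", columns="Test_Result",
                       aggfunc="size", fill_value=0)
print(table)
```

Assigning to a new name instead of overwriting df keeps the original frame available for further checks.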

Upvotes: 7

Michele Tonutti

Reputation: 4348

You could construct a new dataframe using the unique values of the two columns as its index and columns, then fill it in cell by cell while looping with pandas' iterrows():

df_out = pd.DataFrame(index=df['Test_Category'].unique().tolist(), columns=df['Test_Result'].unique().tolist())

for index, row in df_out.iterrows():
    for col in df_out.columns:
        df_out.loc[index, col] = len(df[(df['Test_Category'] == index) & (df['Test_Result'] == col)])

Output:

       Pass  nan  Fail
Cat1     1    1     0
Cat2     0    0     2
Cat3     2    1     1

That said, using groupby() should definitely be faster.

Upvotes: 1
