journo
journo

Reputation: 129

Transpose columns to rows, show value_counts of former columns as column value in Pandas

Let's suppose I have this dataset. Each variable can assume values between 1 and 5. The variables represent similar objects, so I would like to compare them across all data sets.

import pandas as pd
df = pd.DataFrame({'dataset': ["ds1", "ds2", "ds3", "ds4", "ds5"], 
                   'var1': [1, 4, 5, 3, 1], 
                   'var2': [2, 1, 1, 2, 5],
                   'var3': [2, 1, 1, 2, 5]})
df

I want to transpose the df using .T and sum up the data as follows:

rows: each representing one variable (var1 to var3) columns 1 - 5 : representing values of these variables, each cell showing the total number of appearances of this value across all datasets (var1.value_counts)

Example: Row1 represents var1. Column1 shows "1" appears two times across all datasets, column2 shows 2 appears 0 times

Expected outcome

         1 2 3 4 5
var1     2 0 1 1 1
var2     2 2 0 0 1
var3     2 2 0 0 1 

Maybe I need to use crosstab or pivot_table? Thanks!

Upvotes: 2

Views: 992

Answers (2)

sammywemmy
sammywemmy

Reputation: 28689

Yes, crosstab works fine (or pivot table in @jezrael's solution), after melt :

tab = df.melt("dataset")

pd.crosstab(tab.variable, tab.value).rename_axis(index=None, columns=None)


        1   2   3   4   5
var1    2   0   1   1   1
var2    2   2   0   0   1
var3    2   2   0   0   1

Yes, you can use the normalize argument, and choose if you want it via rows(index) or columns :

pd.crosstab(tab.variable, tab.value, normalize='all').rename_axis(index=None, columns=None)

Upvotes: 2

jezrael
jezrael

Reputation: 862761

Use DataFrame.melt with DataFrame.pivot_table and DataFrame.rename_axis:

df1 = (df.melt('dataset')
         .pivot_table(index='variable',
                      columns='value',
                      aggfunc='size',
                      fill_value=0)
         .rename_axis(index=None, columns=None))
print (df1)
      1  2  3  4  5
var1  2  0  1  1  1
var2  2  2  0  0  1
var3  2  2  0  0  1

Or for all columns without first value_counts with transpose, replaced missing values to 0 and last converting all values to integers:

df = df.set_index('dataset').apply(pd.value_counts).T.fillna(0).astype(int)
print (df)
      1  2  3  4  5
var1  2  0  1  1  1
var2  2  2  0  0  1
var3  2  2  0  0  1
    

Upvotes: 2

Related Questions