Transpose columns to rows, show value_counts of former columns as column value in Pandas

Question

Let's suppose I have this dataset. Each variable can assume values between 1 and 5. The variables represent similar objects, so I would like to compare them across all data sets.

import pandas as pd
df = pd.DataFrame({'dataset': ["ds1", "ds2", "ds3", "ds4", "ds5"], 
                   'var1': [1, 4, 5, 3, 1], 
                   'var2': [2, 1, 1, 2, 5],
                   'var3': [2, 1, 1, 2, 5]})
df

I want to transpose the df using .T and sum up the data as follows:

rows: each representing one variable (var1 to var3) columns 1 - 5 : representing values of these variables, each cell showing the total number of appearances of this value across all datasets (var1.value_counts)

Example: Row1 represents var1. Column1 shows "1" appears two times across all datasets, column2 shows 2 appears 0 times

Expected outcome

         1 2 3 4 5
var1     2 0 1 1 1
var2     2 2 0 0 1
var3     2 2 0 0 1

Maybe I need to use crosstab or pivot_table? Thanks!

jezrael · Accepted Answer

Use DataFrame.melt with DataFrame.pivot_table and DataFrame.rename_axis:

df1 = (df.melt('dataset')
         .pivot_table(index='variable',
                      columns='value',
                      aggfunc='size',
                      fill_value=0)
         .rename_axis(index=None, columns=None))
print (df1)
      1  2  3  4  5
var1  2  0  1  1  1
var2  2  2  0  0  1
var3  2  2  0  0  1

Or for all columns without first value_counts with transpose, replaced missing values to 0 and last converting all values to integers:

df = df.set_index('dataset').apply(pd.value_counts).T.fillna(0).astype(int)
print (df)
      1  2  3  4  5
var1  2  0  1  1  1
var2  2  2  0  0  1
var3  2  2  0  0  1

Transpose columns to rows, show value_counts of former columns as column value in Pandas

Answers (2)

Related Questions