striatum
striatum

Reputation: 1598

How to crosstab a pandas dataframe when one variable (column) is a list of varying length

How can I generate a crossed table from the following dataframe:

import pandas as pd
dat = pd.read_csv('data.txt', sep=',')
dat.head(6)

  Factor1 Factor2
0       A       X
1       B       X
2       A     X|Y
3       B     X|Y
4       A   X|Y|Z
5       B   X|Y|Z

dat[['Factor2']] = dat[['Factor2']].applymap(lambda x : x.split('|'))
dat.head(6)

  Factor1    Factor2
0       A        [X]
1       B        [X]
2       A     [X, Y]
3       B     [X, Y]
4       A  [X, Y, Z]
5       B  [X, Y, Z]

The resulting pd.crosstab() should look like this:

  X Y Z
A 3 2 1
B 3 2 1

Upvotes: 3

Views: 482

Answers (2)

Ch3steR
Ch3steR

Reputation: 20669

You would have to first split on | using Series.str.split then explode using DataFrame.explode.

df['Factor2'] = df['Factor2'].str.split('|')
t = df.explode('Factor2')
pd.crosstab(t['Factor1'], t['Factor2'])

# Factor2  X  Y  Z
# Factor1         
# A        3  2  1
# B        3  2  1

# to remove the axis names.
# pd.crosstab(t['Factor1'], t['Factor2']).rename_axis(index=None, columns=None) 

Upvotes: 5

Shubham Sharma
Shubham Sharma

Reputation: 71689

We can use get_dummies to convert the Feature2 column to indicator variables, then group the indicator variables by Feature1 and aggregate with sum

df['Factor2'].str.get_dummies('|').groupby(df['Factor1']).sum()

         X  Y  Z
Factor1         
A        3  2  1
B        3  2  1

Upvotes: 5

Related Questions