Reputation: 1598
How can I generate a crossed table from the following dataframe:
import pandas as pd
dat = pd.read_csv('data.txt', sep=',')
dat.head(6)
Factor1 Factor2
0 A X
1 B X
2 A X|Y
3 B X|Y
4 A X|Y|Z
5 B X|Y|Z
dat[['Factor2']] = dat[['Factor2']].applymap(lambda x : x.split('|'))
dat.head(6)
Factor1 Factor2
0 A [X]
1 B [X]
2 A [X, Y]
3 B [X, Y]
4 A [X, Y, Z]
5 B [X, Y, Z]
The resulting pd.crosstab()
should look like this:
X Y Z
A 3 2 1
B 3 2 1
Upvotes: 3
Views: 482
Reputation: 20669
You would have to first split on |
using Series.str.split
then explode using DataFrame.explode
.
df['Factor2'] = df['Factor2'].str.split('|')
t = df.explode('Factor2')
pd.crosstab(t['Factor1'], t['Factor2'])
# Factor2 X Y Z
# Factor1
# A 3 2 1
# B 3 2 1
# to remove the axis names.
# pd.crosstab(t['Factor1'], t['Factor2']).rename_axis(index=None, columns=None)
Upvotes: 5
Reputation: 71689
We can use get_dummies
to convert the Feature2
column to indicator variables, then group the indicator variables by Feature1
and aggregate with sum
df['Factor2'].str.get_dummies('|').groupby(df['Factor1']).sum()
X Y Z
Factor1
A 3 2 1
B 3 2 1
Upvotes: 5