Reputation: 13
I have a pandas data frame as follows
A | B | C | D | ... | Z |
---|
and another data frame in which every column has zero or more letters as follows:
Letters |
---|
A,C,D |
A,B,F |
A,H,G |
A |
B,F |
None |
I want to match the two dataframes to have something like this
A | B | C | D | ... | Z |
---|---|---|---|---|---|
1 | 0 | 1 | 1 | 0 | 0 |
Upvotes: 0
Views: 33
Reputation: 13212
make example and desired output for answer
Example:
data = ['A,C,D', 'A,B,F', 'A,E,G', None]
df = pd.DataFrame(data, columns=['letter'])
df :
letter
0 A,C,D
1 A,B,F
2 A,E,G
3 None
get_dummies
and groupby
pd.get_dummies(df['letter'].str.split(',').explode()).groupby(level=0).sum()
output:
A B C D E F G
0 1 0 1 1 0 0 0
1 1 1 0 0 0 1 0
2 1 0 0 0 1 0 1
3 0 0 0 0 0 0 0
Upvotes: 1