Reputation: 31
I got a Data Frame like this. In the values column there is a list of numbers per row. In the categories column there is a list of categories per row. Values are of Type int and Categories of Type string. Each value in the values column always fits the category value in the same position oft the list in the categories column. You can think of it as recipes. For example: For the recipe in the first row you need 2 of a, 4 of c, 3 of d and 5 of e.
| values | categories |
| ------ | ---------- |
| [2,4,3,5] | ['a','c','d','e'] |
| [1,6,7] | ['b','c','e'] |
| [3,5] | ['c','f'] |
I need to create a new Data Frame with pandas/ python so that it takes the distinct categories as columns and fills the rows with the corresponding values. So that it looks like this:
| a | b | c | d | e | f |
| - | - | - | - | - | - |
| 2 | 0 | 4 | 3 | 5 | 0 |
| 0 | 1 | 6 | 0 | 7 | 0 |
| 0 | 0 | 3 | 0 | 0 | 5 |
Thank you for your help.
Upvotes: 2
Views: 55
Reputation: 9941
Another option with explode
and pivot
:
df.apply(pd.Series.explode).pivot(columns='categories').fillna(0)
Output:
values
categories a b c d e f
0 2 0 4 3 5 0
1 0 1 6 0 7 0
2 0 0 3 0 0 5
Upvotes: 3
Reputation: 862581
Use list comprehension for list of dictionaries, pass to DataFrame
constructor and then replace missing values to 0
with sorting columns names:
L = [dict(zip(a, b)) for a, b in zip(df['categories'], df['values'])]
df = pd.DataFrame(L, index=df.index).fillna(0).astype(int).sort_index(axis=1)
print (df)
a b c d e f
0 2 0 4 3 5 0
1 0 1 6 0 7 0
2 0 0 3 0 0 5
Another idea is create dictionary by all uniwue sorted columns names and use {**dict1, **dict2}
merge trick:
d = dict.fromkeys(sorted(set([y for x in df['categories'] for y in x])), 0)
L = [{ **d, **dict(zip(a, b))} for a, b in zip(df['categories'], df['values'])]
df = pd.DataFrame(L, index=df.index)
print (df)
a b c d e f
0 2 0 4 3 5 0
1 0 1 6 0 7 0
2 0 0 3 0 0 5
Upvotes: 0