Reputation: 59
I have a pandas-related question. My dataframe looks something like this:
    id  val1  val2
0    1     0     1
1    1     1     0
2    1     0     0
3    2     1     1
4    2     1     1
5    2     1     0
6    3     0     0
7    3     0     1
8    3     1     1
9    4     1     0
10   4     0     1
11   4     0     0
I want to transform it into something like:
    a     b     c
id a0 a1 b0 b1 c0 c1
1   0  1  1  0  0  0
2   1  1  1  1  1  0
3   0  0  0  1  1  1
4   1  0  0  1  0  0
I thought of something like adding a sub_id column that is enumerated cyclically by a, b and c and then do an unstack of the frame. Is there an easier/smarter solution?
Thanks a lot!
Tim
Upvotes: 2
Views: 64
Reputation: 30971
One possible solution:
Start by flattening the values for each id into a single row:
res = df.set_index('id').groupby('id').apply(
    lambda grp: pd.Series(grp.values.flatten()))
At this point the result is:
    0  1  2  3  4  5
id
1   0  1  1  0  0  0
2   1  1  1  1  1  0
3   0  0  0  1  1  1
4   1  0  0  1  0  0
Then set proper column names:
res.columns = pd.MultiIndex.from_tuples(
    [(x, x + y) for x in list('abc') for y in list('01')])
The final result is:
    a     b     c
   a0 a1 b0 b1 c0 c1
id
1   0  1  1  0  0  0
2   1  1  1  1  1  0
3   0  0  0  1  1  1
4   1  0  0  1  0  0
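A possible alternative, which is not part of the answer above: if every id has the same number of rows and the rows of each id are contiguous, the same layout can probably be obtained with a plain NumPy reshape instead of groupby/apply. This is only a sketch under those two assumptions; the sample data is rebuilt from the question so the snippet runs on its own, and the names value_cols, wide and rows_per_id are just illustrative.

```python
import string

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'id':   [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    'val1': [0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0],
    'val2': [1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0],
})

value_cols = ['val1', 'val2']
ids = df['id'].unique()  # ids in order of first appearance

# Assumption: rows are contiguous per id and every id has the same row count,
# so a row-major reshape puts all values of one id on a single line.
wide = df[value_cols].to_numpy().reshape(len(ids), -1)

# Build the (letter, letter + position) column labels, e.g. ('a', 'a0').
rows_per_id = wide.shape[1] // len(value_cols)
columns = pd.MultiIndex.from_tuples(
    [(letter, f'{letter}{j}')
     for letter in string.ascii_lowercase[:rows_per_id]
     for j in range(len(value_cols))])

res = pd.DataFrame(wide, index=pd.Index(ids, name='id'), columns=columns)
print(res)
```

If the group sizes differ, the reshape either fails or silently misaligns rows, so the groupby-based version is the safer choice in that case.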
Upvotes: 0
Reputation: 862511
If numbers instead of a, b, c are acceptable, use GroupBy.cumcount as a counter, create a MultiIndex with DataFrame.set_index, reshape with DataFrame.unstack, then sort by the second column level with DataFrame.sort_index and finally swap the two levels with DataFrame.swaplevel:
g = df.groupby('id').cumcount()
df = df.set_index(['id', g]).unstack().sort_index(axis=1, level=1).swaplevel(0,1,axis=1)
print (df)
      0         1         2
   val1 val2 val1 val2 val1 val2
id
1     0    1    1    0    0    0
2     1    1    1    1    1    0
3     0    0    0    1    1    1
4     1    0    0    1    0    0
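To make the chained call easier to follow, here is a sketch that splits it into its intermediate steps and prints the column labels after each one. The sample data is rebuilt from the question, and the names step1 to step3 are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'id':   [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    'val1': [0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0],
    'val2': [1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0],
})

# Counter 0, 1, 2 for the rows within each id.
g = df.groupby('id').cumcount()

# Move the counter from the row index into the columns:
# labels are (original column, counter), grouped by original column.
step1 = df.set_index(['id', g]).unstack()

# Sort by the counter level so val1/val2 of the same counter sit together.
step2 = step1.sort_index(axis=1, level=1)

# Put the counter on top: labels become (counter, original column).
step3 = step2.swaplevel(0, 1, axis=1)

for name, frame in [('unstack', step1), ('sort_index', step2), ('swaplevel', step3)]:
    print(name, frame.columns.tolist())
```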
If you want a, b, c labels instead, you can generate a dictionary from string.ascii_lowercase and rename the columns:
import string
d = dict(enumerate(string.ascii_lowercase))
df = df.rename(columns=d)
print (df)
      a         b         c
   val1 val2 val1 val2 val1 val2
id
1     0    1    1    0    0    0
2     1    1    1    1    1    0
3     0    0    0    1    1    1
4     1    0    0    1    0    0
To rename both levels, start again from the original DataFrame and, after set_index, first replace the column names with default integer names from range:
g = df.groupby('id').cumcount()
df = df.set_index(['id', g])
df.columns = range(len(df.columns))
df = df.unstack().sort_index(axis=1, level=1).swaplevel(0,1,axis=1)
print (df)
    0     1     2
    0  1  0  1  0  1
id
1   0  1  1  0  0  0
2   1  1  1  1  1  0
3   0  0  0  1  1  1
4   1  0  0  1  0  0
Finally, set the new names for both levels with a list comprehension:
import string
d = dict(enumerate(string.ascii_lowercase))
df.columns = pd.MultiIndex.from_tuples([(d[a], f'{d[a]}{b}') for a, b in df.columns])
print (df)
    a     b     c
   a0 a1 b0 b1 c0 c1
id
1   0  1  1  0  0  0
2   1  1  1  1  1  0
3   0  0  0  1  1  1
4   1  0  0  1  0  0
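For reuse, the steps of this answer could be wrapped in a small helper that leaves the original DataFrame untouched. The sketch below assumes at most 26 rows per id (one letter each); the function name widen and its key parameter are made up for illustration and are not pandas API.

```python
import string

import pandas as pd


def widen(df, key='id'):
    """Hypothetical helper: one row per key, columns labelled (a, a0), (a, a1), ..."""
    g = df.groupby(key).cumcount()            # position of each row within its key
    out = df.set_index([key, g])
    out.columns = range(out.shape[1])         # value columns become 0, 1, ...
    out = (out.unstack()
              .sort_index(axis=1, level=1)    # group by position within the key
              .swaplevel(0, 1, axis=1))       # position on top, value column below
    letters = dict(enumerate(string.ascii_lowercase))
    out.columns = pd.MultiIndex.from_tuples(
        [(letters[a], f'{letters[a]}{b}') for a, b in out.columns])
    return out


# Sample data copied from the question.
df = pd.DataFrame({
    'id':   [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4],
    'val1': [0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 0],
    'val2': [1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0],
})

result = widen(df)
print(result)
```

Because it is built on unstack, ids with fewer rows than the others should end up with NaN in the missing positions rather than raising an error.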
Upvotes: 2