Reputation: 8903
Given the following DataFrame:
>>> pd.DataFrame(data=[['a',1],['a',2],['b',3],['b',4],['c',5],['c',6],['d',7],['d',8],['d',9],['e',10]],columns=['key','value'])
key value
0 a 1
1 a 2
2 b 3
3 b 4
4 c 5
5 c 6
6 d 7
7 d 8
8 d 9
9 e 10
I'm looking for a method that will change the structure based on the key value, like so:
a b c d e
0 1 3 5 7 10
1 2 4 6 8 10 <- 10 is duplicated
2 2 4 6 9 10 <- 10 is duplicated
The result row number is as the longest group count (d in the above example) and the missing values are duplicates of the last available value.
Upvotes: 1
Views: 766
Reputation: 323226
Using pivot
, with groupby
+ cumcount
df.assign(key2=df.groupby('key').cumcount()).pivot('key2','key','value').ffill().astype(int)
Out[214]:
key a b c d e
key2
0 1 3 5 7 10
1 2 4 6 8 10
2 2 4 6 9 10
Upvotes: 2
Reputation: 862601
Create MultiIndex
by set_index
with counter column by cumcount
, reshape by unstack
, repalce missing values by last non missing ones with ffill
and last converting all data to integer
s if necessary:
df = df.set_index([df.groupby('key').cumcount(),'key'])['value'].unstack().ffill().astype(int)
Another solution with custom lambda function:
df = (df.groupby('key')['value']
.apply(lambda x: pd.Series(x.values))
.unstack(0)
.ffill()
.astype(int))
print (df)
key a b c d e
0 1 3 5 7 10
1 2 4 6 8 10
2 2 4 6 9 10
Upvotes: 5