Reputation: 450
I have a large dataset which has two columns Name, Value
and it looks like this:
import pandas as pd
data = [['code',10],['classe',12],['series','B'], ['code',12],['classe',1],
['series','C'],['code',16],['classe',18],['series','A']]
df1 = pd.DataFrame(data,columns=['Name','Value'])
df1
Output
Name Value
0 code 10
1 classe 12
2 series B
3 code 12
4 classe 1
5 series C
6 code 16
7 classe 18
8 series A
And I want some thing like that:
code classe series
0 10 10 B
1 12 1 C
2 16 18 A
In my dataset it reapts N time and i want to transform it to three columns code, classe, series
.
Thanks for your help in advance!
Upvotes: 5
Views: 402
Reputation: 3103
You can accomplish this using .pivot
df2 = df1.pivot(columns='Name', values='Value')
pd.concat([df2[series].dropna().reset_index(drop=True) for series in df2], axis=1)
Output
classe code series
0 12 10 B
1 1 12 C
2 18 16 A
More so, if you changed the ordered data, you still get the desired output:
import pandas as pd
data = [['code',10],['classe',12],['classe', 14], ['series','B'], ['series', 'C'], ['code',12],['classe',1],
['series','C'],['code',16],['classe',18],['series','A']]
df1 = pd.DataFrame(data,columns=['Name','Value'])
df1
Name Value
0 code 10
1 classe 12
2 classe 14 #Added classe
3 series B
4 series C #Added Series
5 code 12
6 classe 1
7 series C
8 code 16
9 classe 18
10 series A
The output will be:
classe code series
0 12 10 B
1 14 12 C
2 1 16 C
3 18 NaN A
Upvotes: 4
Reputation: 402513
Option 1
pd.concat
with a groupby
should do it.
pd.concat([
pd.Series(v.values, name=k) for k, v in df1.groupby('Name')['Value']
],
axis=1
)
classe code series
0 12 10 B
1 1 12 C
2 18 16 A
Option 2
pivot
Flaky pivot hack, don't count on it! This solution assumes values inside Name
alternate regularly - code, classe, series, code, classe, series, ... and so on. Won't work otherwise.
df1.assign(Index=df1.index // 3).pivot('Index', 'Name', 'Value')
Name classe code series
Index
0 12 10 B
1 1 12 C
2 18 16 A
Upvotes: 3
Reputation: 323236
create a new key by using cumsum
, then unstack
df1['new']=(df1.Name=='code').cumsum()
df1.set_index(['new','Name']).Value.unstack()
Out[80]:
Name classe code series
new
1 12 10 B
2 1 12 C
3 18 16 A
Upvotes: 2