Reputation: 87
I am a pandas newbie, and I am trying to solve the following problem.
I have a large DataFrame (10000 x 28) as follows.
Col1 Col2 Col3 Col4 Col5
A B C D E
How can I reshape it to look like the following?
Col1 Col2 NewColName
A B C
A B D
A B E
The first two columns remain the same, and the last three columns get "distributed" over them.
My goal is to take a large DataFrame and turn it into "tidy data".
I have tried methods like concat and stack, but I feel I'm missing a more elegant way to solve this.
Upvotes: 2
Views: 924
Reputation: 42905
Set ['Col1', 'Col2'] as the index and use .stack().
df.set_index(['Col1', 'Col2']).stack()
Col1 Col2
A B 0 C
0 D
0 E
Then call .reset_index() to format the result as in your example (you can also pass name='Col' for the same result as suggested by @jezrael):
df.set_index(['Col1', 'Col2']).stack().reset_index(-1, drop=True).reset_index(name='Col')
Col1 Col2 Col
0 A B C
1 A B D
2 A B E
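For anyone who wants to run the steps above end to end, here is a minimal sketch. It assumes the one-row sample frame from the question, and the names `stacked` and `tidy` are only illustrative.

```python
import pandas as pd

# Sample frame from the question: two identifier columns, three value columns.
df = pd.DataFrame({'Col1': ['A'], 'Col2': ['B'],
                   'Col3': ['C'], 'Col4': ['D'], 'Col5': ['E']})

# Move the identifier columns into the index, then stack the remaining
# columns into one long Series; their names become the innermost index level.
stacked = df.set_index(['Col1', 'Col2']).stack()

# Drop the innermost level (the old column names), then turn the remaining
# index levels back into ordinary columns and name the values column 'Col'.
tidy = stacked.reset_index(-1, drop=True).reset_index(name='Col')
print(tidy)
```

Depending on the pandas version, .stack() may silently drop missing values, so it is worth checking the row count if the real data contains NaN.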
Upvotes: 0
Reputation: 863166
print(pd.melt(df, id_vars=['Col1', 'Col2'], value_name='Col').drop('variable', axis=1))
Col1 Col2 Col
0 A B C
1 A B D
2 A B E
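The same call split into steps may make it clearer what melt is doing. This is a sketch on the question's one-row sample frame, and `long_df` and `tidy` are illustrative names.

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({'Col1': ['A'], 'Col2': ['B'],
                   'Col3': ['C'], 'Col4': ['D'], 'Col5': ['E']})

# id_vars stay as columns; every other column is unpivoted into a pair of
# columns: 'variable' (the old column name) and 'Col' (the value).
long_df = pd.melt(df, id_vars=['Col1', 'Col2'], value_name='Col')

# Drop 'variable' if the originating column does not matter.
tidy = long_df.drop('variable', axis=1)
print(tidy)
```

If you need to know which column each value came from, keep the 'variable' column instead of dropping it.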
Timings:
df = pd.concat([df]*1000).reset_index(drop=True)
In [58]: %timeit pd.melt(df, id_vars=['Col1','Col2'],value_name='Col').drop('variable', axis=1)
100 loops, best of 3: 2.48 ms per loop
In [59]: %timeit df.set_index(['Col1', 'Col2']).stack().reset_index(-1, drop=True).reset_index(name='Col')
100 loops, best of 3: 3.83 ms per loop
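One thing the timings do not show is that the two approaches return rows in a different order once there is more than one input row. The sketch below uses a made-up two-row frame (the second row is not from the question), and `normalise` is a hypothetical helper.

```python
import pandas as pd

# Two-row frame; the second row is invented for illustration.
df = pd.DataFrame({'Col1': ['A', 'F'], 'Col2': ['B', 'G'],
                   'Col3': ['C', 'H'], 'Col4': ['D', 'I'], 'Col5': ['E', 'J']})

melted = pd.melt(df, id_vars=['Col1', 'Col2'], value_name='Col').drop('variable', axis=1)
stacked = (df.set_index(['Col1', 'Col2']).stack()
             .reset_index(-1, drop=True).reset_index(name='Col'))

# melt walks the frame column by column: C, H, D, I, E, J
# stack walks it row by row:             C, D, E, H, I, J
print(melted['Col'].tolist())
print(stacked['Col'].tolist())

def normalise(frame):
    # Sort on all columns so that row order no longer matters.
    return frame.sort_values(['Col1', 'Col2', 'Col']).reset_index(drop=True)

# After sorting, both approaches contain the same rows.
same_rows = normalise(melted).values.tolist() == normalise(stacked).values.tolist()
print(same_rows)
```

If row order matters downstream, sort explicitly rather than relying on either method's default.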
Upvotes: 0