I want to reshape a pandas DataFrame based on the values in a specific column, so that I get a new column for each value/column pair in the starting DataFrame. I want to get from this:
import pandas as pd
d = {'city': ['Berlin', 'Berlin', 'Berlin', 'London', 'London', 'London'],
'weather': ['sunny', 'sunny', 'cloudy','sunny', 'cloudy', 'cloudy'], 'temp': [20,22,19, 21, 18, 17]}
df = pd.DataFrame(data=d)
df
city weather temp
0 Berlin sunny 20
1 Berlin sunny 22
2 Berlin cloudy 19
3 London sunny 21
4 London cloudy 18
5 London cloudy 17
to this:
d_2 = {'Berlin_weather': ['sunny', 'sunny', 'cloudy'], 'Berlin_temp': [20,22,19],
'London_weather': ['sunny', 'cloudy', 'cloudy'], 'London_temp': [21, 18, 17]}
df_2 = pd.DataFrame(data=d_2)
df_2
Berlin_weather Berlin_temp London_weather London_temp
0 sunny 20 sunny 21
1 sunny 22 cloudy 18
2 cloudy 19 cloudy 17
I have tried using .unstack(), but I cannot get it to work properly. A loop would work, but the size of my actual dataset makes that impractical.
Let's create a new index, then use unstack:
df1 = df.set_index([df['city'], df.groupby('city').cumcount()]).drop(columns='city').unstack(0)
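Why this index works: cumcount numbers the rows inside each city group (0, 1, 2, ...), so every (city, row number) pair occurs exactly once and unstack has a unique index to move 'city' out of. A small sketch of just that intermediate step, rebuilding the question's data so it runs on its own:

```python
import pandas as pd

d = {'city': ['Berlin', 'Berlin', 'Berlin', 'London', 'London', 'London'],
     'weather': ['sunny', 'sunny', 'cloudy', 'sunny', 'cloudy', 'cloudy'],
     'temp': [20, 22, 19, 21, 18, 17]}
df = pd.DataFrame(data=d)

# cumcount restarts at 0 for every city: the row's position within its group
pos = df.groupby('city').cumcount()
print(pos.tolist())  # [0, 1, 2, 0, 1, 2]

# Pairing city with that position gives a unique two-level index,
# which is what lets unstack(0) move 'city' into the columns
idx = pd.MultiIndex.from_arrays([df['city'], pos])
print(idx.is_unique)  # True
```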
Then flatten the MultiIndex columns:
df1.columns = [f'{y}_{x}' for x,y in df1.columns]
print(df1)
Berlin_weather London_weather Berlin_temp London_temp
0 sunny sunny 20 21
1 sunny cloudy 22 18
2 cloudy cloudy 19 17
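For comparison, the same reshape can also be sketched with DataFrame.pivot on a helper row-number column. This is a swapped-in alternative, not the approach above, and the helper column name row is only an illustrative choice. It assumes a reasonably recent pandas where pivot accepts a list for values:

```python
import pandas as pd

d = {'city': ['Berlin', 'Berlin', 'Berlin', 'London', 'London', 'London'],
     'weather': ['sunny', 'sunny', 'cloudy', 'sunny', 'cloudy', 'cloudy'],
     'temp': [20, 22, 19, 21, 18, 17]}
df = pd.DataFrame(data=d)

df2 = (df.assign(row=df.groupby('city').cumcount())  # hypothetical helper column 'row'
         .pivot(index='row', columns='city', values=['weather', 'temp'])
         .infer_objects())  # pivot may return object dtype for mixed values; restore numeric dtypes

# Columns are (variable, city) tuples such as ('weather', 'Berlin')
df2.columns = [f'{city}_{var}' for var, city in df2.columns]
df2.index.name = None  # drop the helper name so the output looks like df1
print(df2)
```

The resulting column order (Berlin_weather, London_weather, Berlin_temp, London_temp) is the same as with set_index plus unstack, so the ordering step below applies equally.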
If the order matters, we can use pd.CategoricalIndex before flattening the columns:
cati = pd.CategoricalIndex(df1.columns.get_level_values(0).unique(),
['weather','temp'],
ordered=True)
df1.columns = df1.columns.set_levels(cati, level=0)
df1 = df1.sort_index(axis=1, level=1)  # sort the columns (axis=1) by level 1, the city
df1.columns = [f'{y}_{x}' for x,y in df1.columns]
print(df1)
Berlin_weather Berlin_temp London_weather London_temp
0 sunny 20 sunny 21
1 sunny 22 cloudy 18
2 cloudy 19 cloudy 17
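If you only need a fixed column order, another option (an alternative to the CategoricalIndex route, not the method above) is to build the flattened names in the desired order and select them. This sketch assumes cities should appear in order of first appearance and the value columns in their original order. It also passes 'city' to set_index by name, which drops that column automatically, so no separate drop is needed:

```python
import pandas as pd

d = {'city': ['Berlin', 'Berlin', 'Berlin', 'London', 'London', 'London'],
     'weather': ['sunny', 'sunny', 'cloudy', 'sunny', 'cloudy', 'cloudy'],
     'temp': [20, 22, 19, 21, 18, 17]}
df = pd.DataFrame(data=d)

# 'city' given by name is removed from the columns; cumcount supplies the row position
df1 = df.set_index(['city', df.groupby('city').cumcount()]).unstack(0)
df1.columns = [f'{y}_{x}' for x, y in df1.columns]

# Explicit order: each city in order of first appearance, then its value columns
value_cols = [c for c in df.columns if c != 'city']  # ['weather', 'temp']
order = [f'{city}_{col}' for city in df['city'].unique() for col in value_cols]
df1 = df1[order]
print(df1)
```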