Reputation: 1635
My csv file has weather information by city. One row has so many columns (over 1200) by time. For example it looks like,
id city_name dt_0 temp_0 hum_0 dt_1 temp_1 hum_1 dt_2 temp_2 hum_2
1 Boston 2017110306 23.5 54.0 2017110310 21.4 40.0 2017110314 22.2 52.1
2 Seattle 2017110306 20.4 60.0 2017110310 18.4 42.0 2017110314 18.3 50.5
The schema is not useful for me. So I want to convert it by python Pandas DataFrame. What I want is for it to look like,
id city_name dt temp hum
1 Boston 2017110306 23.5 54.0
1 Boston 2017110310 21.4 40.0
1 Boston 2017110314 22.2 52.1
2 Seattle 2017110306 20.4 60.0
2 Seattle 2017110310 18.4 42.0
2 Seattle 2017110314 18.3 50.5
How to do it ?
Upvotes: 0
Views: 66
Reputation: 862481
First set_index
, then create MultiIndex
with split
and last reshape by stack
:
df = df.set_index(['id','city_name'])
df.columns = df.columns.str.split('_', expand=True)
df = df.stack().reset_index(level=2, drop=True).reset_index()
print (df)
id city_name dt hum temp
0 1 Boston 2017110306 54.0 23.5
1 1 Boston 2017110310 40.0 21.4
2 1 Boston 2017110314 52.1 22.2
3 2 Seattle 2017110306 60.0 20.4
4 2 Seattle 2017110310 42.0 18.4
5 2 Seattle 2017110314 50.5 18.3
Upvotes: 3