Reputation: 1098
I want to reshape a table in pandas . I have a table of the form:
date | country |state | population | num_cars
1 | c1 | s1 | 1 | 1
2 | c1 | s1 | 1 | 1
1 | c1 | s2 | 1 | 1
.
2 | c2 | s2 | 1 | 2
2 | c2 | s2 | 1 | 2
I want to turn it to this shape:
date |1_population | c1_s1_population | c1_s2_population...| c2_s1_populationc1_num_cars |c2_11_num_cars...
To explain, the initial data has pop and numbers by country, state for a date range. Now I want to convert into a number of columns of time series for each level (country, country-state)
How do I do this?
Upvotes: 0
Views: 131
Reputation: 30971
As the source data sample, I used a DataFrame with 2 hypothetical countries, 3 states each:
date country state population num_cars
0 1990 Xxx Aaa 100 15
1 2010 Xxx Aaa 120 18
2 1990 Xxx Bbb 80 9
3 2010 Xxx Bbb 88 11
4 1990 Xxx Ccc 75 6
5 2010 Xxx Ccc 82 8
6 1990 Yyy Ggg 40 5
7 2010 Yyy Ggg 50 6
8 1990 Yyy Hhh 30 3
9 2010 Yyy Hhh 38 4
10 1990 Yyy Jjj 29 3
11 2010 Yyy Jjj 35 4
To solve your problem, start with defining a reformatting function:
def reformat(grp, col):
pop = grp[col]
pop.name = grp.date.iloc[0]
return pop
From a group of rows (grp) it takes a column of particular name (col), sets the name as date from the first row (the grouping key) and returns it.
As the initial step, group df by country and state:
gr = df.set_index(['country', 'state']).groupby('date')
Then compute 2 DataFrames, as the result of reformatting (applying the above function to each group, for both columns of interest:
df1 = gr.apply(reformat, col='population')
df2 = gr.apply(reformat, col='num_cars')
And having two partial results, merge them on indices:
pd.merge(df1, df2, left_index=True, right_index=True,
suffixes=('_pop', '_cars'))
The result is:
country Xxx_pop Yyy_pop Xxx_cars Yyy_cars
state Aaa Bbb Ccc Ggg Hhh Jjj Aaa Bbb Ccc Ggg Hhh Jjj
date
1990 100 80 75 40 30 29 15 9 6 5 3 3
2010 120 88 82 50 38 35 18 11 8 6 4 4
As you can see, the top level of MultiIndex on columns is "Country / population" and "Country / car No". The other level contains state names.
To trace how tis solution works, execute each step separately and inspect its result.
Upvotes: 1