I have a CSV file with the following layout: year, race, sex, age and population. Each year has several different groups.
I created the following pandas object from the CSV:
import pandas as pd
CSV_df = pd.read_csv('Data/Demographics/Demo/akwbo19ages.csv')
df = CSV_df[CSV_df["age"] >= 4].groupby(["year","race","sex","age"])['pop'].sum()
which results in
year  race  sex  age
1969  1     1    1      10574
                 2      20245
                 ...
                 n      11715
            2    1       8924
                 2       9919
                 ...
                 n       9960
...
2012  3     1    1       7861
                 2       8242
                 ...
                 n       7268
            2    1       7245
                 2       7821
                 ...
                 n       6912
However, I would like each row to represent a single year, with one column per group (i.e. a column of population figures for each possible combination of race, sex and age group):
year  group1  group2  ...  groupN
1969   10574   20245  ...    9960
...
2012    7861    8242  ...    6912
Upvotes: 1
Views: 1473
Reputation: 863731
IIUC you need unstack with reset_index, then rename the columns with a list comprehension:
print(s)
year  race  sex  age
1969  1     1    1      10574
                 2      20245
            2    1       8924
                 2       9919
2012  3     1    1       7861
                 2       8242
            2    1       7245
                 2       7821
Name: a, dtype: int64
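# unstack() moves the innermost index level (age) into columns;
# reset_index(drop=True, level=[1,2]) then drops race and sex from the index
df = s.unstack().reset_index(drop=True, level=[1,2]).rename_axis(None)
df.columns = ['group' + str(col) for col in df.columns]
print(df)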
      group1  group2
1969   10574   20245
1969    8924    9919
2012    7861    8242
2012    7245    7821
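For anyone who wants to run this, here is a minimal self-contained sketch. It assumes the sample Series can be rebuilt by hand from the printed output (the values and the name `a` are copied from the output shown here; the construction via `pd.MultiIndex.from_tuples` is my own addition), and it uses level names instead of positions for readability:

```python
import pandas as pd

# Rebuild the sample Series by hand; values are copied from the printed output.
index = pd.MultiIndex.from_tuples(
    [
        (1969, 1, 1, 1), (1969, 1, 1, 2),
        (1969, 1, 2, 1), (1969, 1, 2, 2),
        (2012, 3, 1, 1), (2012, 3, 1, 2),
        (2012, 3, 2, 1), (2012, 3, 2, 2),
    ],
    names=["year", "race", "sex", "age"],
)
s = pd.Series(
    [10574, 20245, 8924, 9919, 7861, 8242, 7245, 7821],
    index=index,
    name="a",
)

# Move the innermost level (age) into columns, then drop race and sex
# from the index so that only year remains.
df = s.unstack().reset_index(level=["race", "sex"], drop=True).rename_axis(None)

# Prefix each age value to get group1, group2, ...
df.columns = ["group" + str(col) for col in df.columns]
print(df)
```

Note that each year still appears once per race/sex combination, because only the age level is moved into the columns.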
Or, if you need to keep the index name, remove rename_axis:
df = s.unstack().reset_index(drop=True, level=[1,2])
df.columns = ['group' + str(col) for col in df.columns]
print(df)
      group1  group2
year
1969   10574   20245
1969    8924    9919
2012    7861    8242
2012    7245    7821
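If the goal is strictly one row per year, with one column for every race/sex/age combination, a possible variation is to unstack all three inner levels at once. This is only a sketch on the same hand-built sample; the `r{race}_s{sex}_a{age}` column names are a made-up naming scheme, not something from the question:

```python
import pandas as pd

# Same hand-built sample Series as in the printed output.
index = pd.MultiIndex.from_tuples(
    [
        (1969, 1, 1, 1), (1969, 1, 1, 2),
        (1969, 1, 2, 1), (1969, 1, 2, 2),
        (2012, 3, 1, 1), (2012, 3, 1, 2),
        (2012, 3, 2, 1), (2012, 3, 2, 2),
    ],
    names=["year", "race", "sex", "age"],
)
s = pd.Series(
    [10574, 20245, 8924, 9919, 7861, 8242, 7245, 7821],
    index=index,
    name="a",
)

# Move race, sex and age into the columns so only year stays in the index.
wide = s.unstack(level=["race", "sex", "age"])

# Flatten the three-level column labels into single strings
# (hypothetical naming scheme, chosen only for illustration).
wide.columns = ["r{}_s{}_a{}".format(r, sx, a) for r, sx, a in wide.columns]
print(wide)
```

Combinations that do not occur in a given year come out as NaN, which also turns the affected columns into floats. In this small sample that happens because 1969 only has race 1 and 2012 only has race 3.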
Upvotes: 2