user6111245

Creating Dataframe from csv with pandas

I have a CSV file with the following layout: year, race, sex, age and population. Each year has several different groups.

I created the following DataFrame from the CSV:

import pandas as pd

CSV_df = pd.read_csv('Data/Demographics/Demo/akwbo19ages.csv')

df = CSV_df[CSV_df["age"] >= 4].groupby(["year","race","sex","age"])['pop'].sum()

which results in

year  race  sex  age
1969  1     1    1      10574
                 2      20245
                 ...
                 n      11715
            2    1       8924
                 2       9919
                 ...
                 n       9960
                        ...  
2012  3     1    1       7861
                 2       8242
                 ...
                 n       7268
            2    1       7245
                 2       7821
                 ...
                 n       6912

However, what I would like is for each row to represent a single year, with one column per group (i.e. columns with population figures for each possible combination of race, sex and age group):

year  group1  group2 ... groupN
1969  10574   20245      9960
...
2012  7861    8242       6912

Upvotes: 1

Views: 1473

Answers (1)

jezrael

Reputation: 863731

IIUC, you need unstack followed by reset_index, then rename the columns with a list comprehension:

print(s)
year  race  sex  age
1969  1     1    1      10574
                 2      20245
            2    1       8924
                 2       9919
2012  3     1    1       7861
                 2       8242
            2    1       7245
                 2       7821
Name: a, dtype: int64


# Pivot the innermost level (age) into columns, then drop the race and sex index levels
df = s.unstack().reset_index(drop=True, level=[1,2]).rename_axis(None)
df.columns = ['group' + str(col) for col in df.columns]
print(df)
      group1  group2
1969   10574   20245
1969    8924    9919
2012    7861    8242
2012    7245    7821

Or, if you want to keep the index name, omit rename_axis:

df = s.unstack().reset_index(drop=True, level=[1,2])
df.columns = ['group' + str(col) for col in df.columns]
print(df)
      group1  group2
year                
1969   10574   20245
1969    8924    9919
2012    7861    8242
2012    7245    7821
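
Note that the output above still has one row per (year, race, sex) combination, so each year appears more than once. If exactly one row per year is wanted, as the question describes, one possible approach is to unstack all three grouping levels at once. The following is only a sketch: the sample Series is rebuilt by hand from the values shown above, and the column labels such as race1_sex1_age1 are hypothetical names chosen for illustration, not something from the original data.

```python
import pandas as pd

# Hypothetical sample rebuilt from the values printed in the answer
index = pd.MultiIndex.from_tuples(
    [
        (1969, 1, 1, 1), (1969, 1, 1, 2),
        (1969, 1, 2, 1), (1969, 1, 2, 2),
        (2012, 3, 1, 1), (2012, 3, 1, 2),
        (2012, 3, 2, 1), (2012, 3, 2, 2),
    ],
    names=["year", "race", "sex", "age"],
)
s = pd.Series(
    [10574, 20245, 8924, 9919, 7861, 8242, 7245, 7821],
    index=index,
    name="pop",
)

# Move race, sex and age into the columns; year stays as the only index level
wide = s.unstack(["race", "sex", "age"])

# Flatten the three-level column index into single labels (illustrative naming)
wide.columns = [f"race{r}_sex{x}_age{a}" for r, x, a in wide.columns]
print(wide)
```

Combinations that do not occur in a given year (here race 1 in 2012 and race 3 in 1969) come out as NaN, which also turns the columns into floats; passing fill_value=0 to unstack is one way to avoid that if zero is a sensible default for the data.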

Upvotes: 2
