aabujamra

Reputation: 4636

Concatenating/merging data frames and editing column names in Python/Pandas

I have built a DataFrame out of a Python dictionary, with the following command:

population=pd.DataFrame(population.items(),columns=['Date','population']).set_index('Date').sort_index(ascending=True)

That gave me the following frame population:

                 population
Date                      
2015-08                 69
2015-09                 65
2015-10                 65
2015-11                 66
2015-12                 71

Out of that DataFrame I created another one with its moving average, using the following command:

population_movav = population.rolling(10).mean()

That gave me the following frame population_movav:

                 population
Date                      
2015-08               68.0
2015-09               69.9
2015-10               71.6
2015-11               71.1
2015-12               71.2

I want to combine them so they look like this:

         population  population_movav
Date                                 
2015-08          69              68.0
2015-09          65              69.9
2015-10          65              71.6
2015-11          66              71.1
2015-12          71              71.2

In short, I need to concatenate the two frames and rename the moving-average column to population_movav. I tried pd.concat, but for some reason it does not give the result I expect.

How can I achieve this?

Upvotes: 1

Views: 246

Answers (3)

pneumatics

Reputation: 2885

You can add a new column simply by referencing it by name in an assignment:

population['population_movav'] = population['population'].rolling(2, min_periods=1).mean()

Gives you

         population  population_movav
Date
2015-08          69              69.0
2015-09          65              67.0
2015-10          65              65.0
2015-11          66              65.5
2015-12          71              68.5
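
A self-contained sketch of this approach, written against the .rolling() method of current pandas releases. The dict named data is an assumption: it is rebuilt from the five rows shown in the question, not the asker's real input.

```python
import pandas as pd

# Assumed input: the five rows printed in the question, as a plain dict
data = {"2015-08": 69, "2015-09": 65, "2015-10": 65, "2015-11": 66, "2015-12": 71}

population = (
    pd.DataFrame(list(data.items()), columns=["Date", "population"])
    .set_index("Date")
    .sort_index()
)

# 2-row rolling mean; min_periods=1 lets the first row average over itself only
population["population_movav"] = (
    population["population"].rolling(window=2, min_periods=1).mean()
)

print(population)
```

Assigning to a new column name keeps everything in one frame, so no concat or rename step is needed afterwards.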

Upvotes: 1

Anton Protopopov

Reputation: 31682

You need to use pd.concat with axis=1 and then rename your last column to 'population_movav':

In [27]: df1
Out[27]: 
         population
Date               
2015-08          69
2015-09          65
2015-10          65
2015-11          66
2015-12          71

In [28]: df2
Out[28]: 
         population
Date               
2015-08        68.0
2015-09        69.9
2015-10        71.6
2015-11        71.1
2015-12        71.2

In [30]: df3 = pd.concat([df1, df2], axis=1)

In [31]: df3.columns = ['population', 'population_movav']

In [32]: df3
Out[32]: 
         population  population_movav
Date                                 
2015-08          69              68.0
2015-09          65              69.9
2015-10          65              71.6
2015-11          66              71.1
2015-12          71              71.2

EDIT

If you need to change only the last column, you could do the following:

df3.columns =  df3.columns[:-1].tolist() + ['population_movav']
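
As a possible variation, the second frame can be renamed before concatenating, so the result never has two columns both called population. This is only a sketch: df1 and df2 are rebuilt here from the values printed above, and the shared index idx is a name introduced for the example.

```python
import pandas as pd

# Assumed reconstruction of the two frames shown above
idx = pd.Index(["2015-08", "2015-09", "2015-10", "2015-11", "2015-12"], name="Date")
df1 = pd.DataFrame({"population": [69, 65, 65, 66, 71]}, index=idx)
df2 = pd.DataFrame({"population": [68.0, 69.9, 71.6, 71.1, 71.2]}, index=idx)

# Rename the moving-average column first, then align both frames side by side
df3 = pd.concat(
    [df1, df2.rename(columns={"population": "population_movav"})],
    axis=1,
)

print(df3)
```

Renaming by label rather than by position keeps working if more columns are added to either frame later.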

Upvotes: 2

jezrael

Reputation: 863166

You can use join with rsuffix:

print(population)
            population
Date                  
2015-08-01          69
2015-09-01          65
2015-10-01          65
2015-11-01          66
2015-12-01          71

print(population_movav)
            population
Date                  
2015-08-01        68.0
2015-09-01        69.9
2015-10-01        71.6
2015-11-01        71.1
2015-12-01        71.2

p = population.join(population_movav, rsuffix="_movav")
print(p)
            population  population_movav
Date                                    
2015-08-01          69              68.0
2015-09-01          65              69.9
2015-10-01          65              71.6
2015-11-01          66              71.1
2015-12-01          71              71.2
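
A runnable sketch of the join, assuming both frames share the same DatetimeIndex. The frames are rebuilt here from the printed values, and idx is a name introduced for the example. Note that rsuffix is applied only to column names that appear in both frames.

```python
import pandas as pd

# Assumed reconstruction of the two frames printed above
idx = pd.DatetimeIndex(
    ["2015-08-01", "2015-09-01", "2015-10-01", "2015-11-01", "2015-12-01"],
    name="Date",
)
population = pd.DataFrame({"population": [69, 65, 65, 66, 71]}, index=idx)
population_movav = pd.DataFrame(
    {"population": [68.0, 69.9, 71.6, 71.1, 71.2]}, index=idx
)

# Both frames have a "population" column, so the right-hand one gets the suffix
p = population.join(population_movav, rsuffix="_movav")

print(p)
```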

Upvotes: 1
