Richard
Richard

Reputation: 65560

Pandas: control new column names when merging two dataframes?

I would like to merge two Pandas dataframes together and control the names of the new column values.

I originally created the dataframes from CSV files. The original CSV files looked like this:

   # presents.csv
   org,name,items,spend...
   12A,Clerkenwell,151,435,...
   12B,Liverpool Street,37,212,...
   ...
   # trees.csv
   org,name,items,spend...
   12A,Clerkenwell,0,0,...
   12B,Liverpool Street,2,92,...
   ...

Now I have two data frames:

df_presents = pd.read_csv(StringIO(presents_txt))
df_trees = pd.read_csv(StringIO(trees_txt))

I want to merge them together to get a final data frame, joining on the org and name values, and then prefixing all other columns with an appropriate prefix.

org,name,presents_items,presents_spend,trees_items,trees_spend...
12A,Clerkenwell,151,435,0,0,...
12B,Liverpool Street,37,212,2,92,...

I've been reading the documentation on merging and joining. This seems to merge correctly and result in the right number of columns:

ad = pd.DataFrame.merge(df_presents, df_trees,
                        on=['practice', 'name'],
                        how='outer')

But then doing print list(aggregate_data.columns.values) shows me the following columns:

[org', u'name', u'spend_x', u'spend_y', u'items_x', u'items_y'...]

How can I rename spend_x to be presents_spend, etc?

Upvotes: 44

Views: 138163

Answers (3)

Amirkhm
Amirkhm

Reputation: 1096

Another way is adding suffix to the columns of your dataframe before merging:

ad.columns = 'ad_' + ad.columns.values

Upvotes: 4

Nguyen Ngoc Tuan
Nguyen Ngoc Tuan

Reputation: 351

You can rename all the columns of ad by setting its columns as follows.

ad.columns = ['org', 'name', 'presents_spend', 'trees_spend']

Upvotes: 9

itzy
itzy

Reputation: 11765

The suffixes option in the merge function does this. The defaults are suffixes=('_x', '_y').

In general, renaming columns can be done with the rename method.

Upvotes: 57

Related Questions