JodeCharger100
JodeCharger100

Reputation: 1059

Rename a column based on values in another column

I have a set of four dataframes: df_all = [df1, df2, df3, df4]

As a sample, they look like this:

df1:

Name     Dates        a

Apple   5-5-15        NaN
Apple   6-5-15        42
Apple   6-5-16        36
Apple   6-5-17        36

 df2: 

Name     Dates          a

Banana   5-5-15        85
Banana   6-5-15        NaN
Banana   6-6-15        100
Banana   6-5-16        18

I want to merge on "Dates", which I achieve in the following manner:

for cols in df_all:
    cols = cols.drop(['Name'], axis=1, inplace=True)
a = df1.merge(df2, how='left', on = 'Date').merge(df3, how='left',  on = 'Date').merge(df4, how='left',  on = 'Date')

This gives me exactly what I want. However, the columns are renamed as a_x, a_y, a_x, a_y This sample below shows what happens when I merge only df1 and df2.

Dates         a_x        a_y

5-5-15        NaN         85
6-5-15        42         NaN
6-6-15        NaN        100
6-5-16        36          18
6-5-17        36         NaN

Before the merge, I want to rename column a based on the value in Name (apple, or banana), and I want to automate it as much as possible to rename all dataframe column 'a' to the value in their column 'Name'

Upvotes: 1

Views: 1455

Answers (4)

BENY
BENY

Reputation: 323226

Try do with concat and modify your dataframe

df=pd.concat([x.set_index(['Name','Dates']).a.unstack(level=0) for x in listdf])

Or combine then then pivot_table

df=pd.concat([df1,df2]).pivot_table(index='Dates',columns='Name',values='a',aggfunc='first')
Name    Apple  Banana
Dates                
5-5-15    NaN    85.0
6-5-15   42.0     NaN
6-5-16   36.0    18.0
6-5-17   36.0     NaN
6-6-15    NaN   100.0

Upvotes: 2

Shubham Sharma
Shubham Sharma

Reputation: 71689

You can automate the process of merging the dataframe by renaming the columns before merging and then using functools.reduce to merge all dataframes in df_all:

from functools import reduce 

# rename column a
df_all = [df.rename(columns={'a': df.pop('Name').iloc[0]}) for df in df_all]

# merge all dataframes
merged = reduce(lambda d1, d2: pd.merge(d1, d2, on=['Dates'], how='left') , df_all)

# print(merged)
# sample result after merging df1 & df2 

    Dates  Apple  Banana
0  5-5-15    NaN    85.0
1  6-5-15   42.0     NaN
2  6-5-16   36.0    18.0
3  6-5-17   36.0     NaN

Upvotes: 1

Strange
Strange

Reputation: 1540

Try to change the column name in your first for loop before you drop that column.

for cols in df_all:
    name = cols['Name'][0]
    cols.drop(['Name'],axis=1,inplace=True)
    cols.rename(columns={'a':name},inplace=True)
a = df1.merge(df2, how='left', on = 'Date').merge(df3, how='left',  on = 'Date').merge(df4, how='left',  on = 'Date')

Upvotes: 2

Leonardo Dagnino
Leonardo Dagnino

Reputation: 3215

You can do a rename before you drop the name column. Since all names are the same in a dataframe, you can get it from the first line:

for cols in df_all:
    cols.rename(columns={'a': cols.at[0, 'Name']}, inplace=True)
    cols = cols.drop(['Name'], axis=1, inplace=True)

Upvotes: 1

Related Questions