Reputation: 1059
I have a set of four dataframes: df_all = [df1, df2, df3, df4]
As a sample, they look like this:
df1:
Name Dates a
Apple 5-5-15 NaN
Apple 6-5-15 42
Apple 6-5-16 36
Apple 6-5-17 36
df2:
Name Dates a
Banana 5-5-15 85
Banana 6-5-15 NaN
Banana 6-6-15 100
Banana 6-5-16 18
I want to merge on "Dates", which I achieve in the following manner:
for cols in df_all:
cols = cols.drop(['Name'], axis=1, inplace=True)
a = df1.merge(df2, how='left', on = 'Date').merge(df3, how='left', on = 'Date').merge(df4, how='left', on = 'Date')
This gives me exactly what I want. However, the columns are renamed as a_x, a_y, a_x, a_y
This sample below shows what happens when I merge only df1 and df2.
Dates a_x a_y
5-5-15 NaN 85
6-5-15 42 NaN
6-6-15 NaN 100
6-5-16 36 18
6-5-17 36 NaN
Before the merge, I want to rename column a
based on the value in Name (apple, or banana), and I want to automate it as much as possible to rename all dataframe column 'a'
to the value in their column 'Name'
Upvotes: 1
Views: 1455
Reputation: 323226
Try do with concat
and modify your dataframe
df=pd.concat([x.set_index(['Name','Dates']).a.unstack(level=0) for x in listdf])
Or combine then then pivot_table
df=pd.concat([df1,df2]).pivot_table(index='Dates',columns='Name',values='a',aggfunc='first')
Name Apple Banana
Dates
5-5-15 NaN 85.0
6-5-15 42.0 NaN
6-5-16 36.0 18.0
6-5-17 36.0 NaN
6-6-15 NaN 100.0
Upvotes: 2
Reputation: 71689
You can automate the process of merging the dataframe by renaming the columns before merging and then using functools.reduce
to merge all dataframes in df_all
:
from functools import reduce
# rename column a
df_all = [df.rename(columns={'a': df.pop('Name').iloc[0]}) for df in df_all]
# merge all dataframes
merged = reduce(lambda d1, d2: pd.merge(d1, d2, on=['Dates'], how='left') , df_all)
# print(merged)
# sample result after merging df1 & df2
Dates Apple Banana
0 5-5-15 NaN 85.0
1 6-5-15 42.0 NaN
2 6-5-16 36.0 18.0
3 6-5-17 36.0 NaN
Upvotes: 1
Reputation: 1540
Try to change the column name in your first for loop before you drop that column.
for cols in df_all:
name = cols['Name'][0]
cols.drop(['Name'],axis=1,inplace=True)
cols.rename(columns={'a':name},inplace=True)
a = df1.merge(df2, how='left', on = 'Date').merge(df3, how='left', on = 'Date').merge(df4, how='left', on = 'Date')
Upvotes: 2
Reputation: 3215
You can do a rename before you drop the name column. Since all names are the same in a dataframe, you can get it from the first line:
for cols in df_all:
cols.rename(columns={'a': cols.at[0, 'Name']}, inplace=True)
cols = cols.drop(['Name'], axis=1, inplace=True)
Upvotes: 1