oceanbeach96
oceanbeach96

Reputation: 634

Merging two DataFrames using column names of one of the DataFrames

Task Description

I have a bunch of DataFrames I read in from sheets in an xlsx file. Is it possible to merge df into df2 and create a column in df2 which is the sheet name stored in the loop.

I read the df in as follows:

for idx, sheet_name in enumerate(excel_file.sheet_names):
    df = excel_file.parse(sheet_name)

    # do some stuff

I wish to merge df with df2 within this loop.

If we just do it for an individual sheet, for example, revenue, is it possible to create df3?

df

 Date            AR        AU      GB       US
1983-03-31     0.001   0.053206  0.001  0.160159
1983-04-30     0.001   0.053206  0.001  0.160159
1983-05-31     0.001   0.053206  0.001  0.160159

df2

 Date.        a   ...  z   loc     
1983-03-31                  AR
1983-03-31                  AU
1983-03-31                  GB
1983-03-31                  US
1983-04-30                  AR
1983-04-30                  AU
1983-04-30                  GB
1983-04-30                  US
1983-05-31                  AR
1983-05-31                  AU
1983-05-31                  GB 
1983-05-31                  US

Desired DataFrame df3

 Date.        a   ...  z   loc   revenue 
1983-03-31                  AR   0.001
1983-03-31                  AU   0.053206
1983-03-31                  GB   0.001
1983-03-31                  US   0.160159
1983-04-30                  AR   0.001
1983-04-30                  AU   0.053206
1983-04-30                  GB   0.001
1983-04-30                  US   0.160159
1983-05-31                  AR   0.001
1983-05-31                  AU   0.053206
1983-05-31                  GB   0.001
1983-05-31                  US   0.160159

Upvotes: 2

Views: 48

Answers (2)

wwnde
wwnde

Reputation: 26686

Best way out, melt df and merge with df.

 df3= print(df2.merge(pd.melt(df, id_vars=['Date'], \
                  var_name='loc', value_name='Revenue'),how='left', on=['Date','loc']))

print(df3)

     Date loc   Revenue
0   1983-03-31  AR  0.001000
1   1983-03-31  AU  0.053206
2   1983-03-31  GB  0.001000
3   1983-03-31  US  0.160159
4   1983-04-30  AR  0.001000
5   1983-04-30  AU  0.053206
6   1983-04-30  GB  0.001000
7   1983-04-30  US  0.160159
8   1983-05-31  AR  0.001000
9   1983-05-31  AU  0.053206
10  1983-05-31  GB  0.001000
11  1983-05-31  US  0.160159

Upvotes: 1

ansev
ansev

Reputation: 30940

IIUC, DataFrame.lookup

df3 = df2.copy()
df3['revenue'] = df.set_index('Date').lookup(df2['Date'], df2['loc'])
print(df3)

If there are missing value in df for any 'Date' , 'loc' in df2:

df3 = (df.melt('Date', var_name = 'loc', value_name='revenue')
         .merge(df2, on=['Date' , 'loc'], how='right'))


print(df3)

Output

          Date loc   revenue
0   1983-03-31  AR  0.001000
1   1983-03-31  AU  0.053206
2   1983-03-31  GB  0.001000
3   1983-03-31  US  0.160159
4   1983-04-30  AR  0.001000
5   1983-04-30  AU  0.053206
6   1983-04-30  GB  0.001000
7   1983-04-30  US  0.160159
8   1983-05-31  AR  0.001000
9   1983-05-31  AU  0.053206
10  1983-05-31  GB  0.001000
11  1983-05-31  US  0.160159

Upvotes: 2

Related Questions