pst
pst

Reputation: 92

Merge on specific column with multiple conditions

I have the following two DataFrames:

  df           Date1 Currency Currency_code
    0  2021-01-01      EUR         code1
    1  2021-01-02      GBP         code2
    2  2021-01-04      USD         code2



  df1          Date1  code1_eur  code1_gbp  code2_eur  code2_gbp  code2_usd
    0  2021-01-01       0.10       0.20       0.30       0.40       0.50
    1  2021-01-02       0.10       0.20       0.30       0.40       0.50
    2  2021-01-03       0.60       0.70       0.80       0.90       0.90
    3  2021-01-04       0.11       0.21       0.31       0.41       0.51

And I am wondering how to get something like this :

        Date1 Currency Currency_code  Discount
0  2021-01-01      EUR         code1      0.10
1  2021-01-02      GBP         code2      0.40
2  2021-01-04      USD         code2      0.51

So more or less left merge on Date1 column but then also only getting the value from df1, corresponding to Currency_code and Currency columns

Upvotes: 3

Views: 49

Answers (2)

Andreas
Andreas

Reputation: 9207

You can try it like this:

df1 = df1.melt('Date1', var_name='i')
df.assign(i=df['Currency_code']+'_'+df['Currency'].str.lower()).merge(df1, on=['Date1', 'i']).drop('i', axis=1)

        Date1 Currency Currency_code  value
0  2021-01-01      EUR         code1   0.10
1  2021-01-02      GBP         code2   0.40
2  2021-01-04      USD         code2   0.51

Upvotes: 1

jezrael
jezrael

Reputation: 863291

First is necessary change format of df1 by unpivot with splitting columns names with _ and reshape by DataFrame.stack with convert Currency to uppercase:

#if encessary convert columns to datetimes
df['Date'] = pd.to_datetime(df['Date'])
df1['Date'] = pd.to_datetime(df1['Date'])

df11 = df1.set_index('Date1')
df11.columns = df11.columns.str.split('_', expand=True)

df2 = (df11.stack([0,1])
           .rename_axis(['Date1','Currency_code','Currency'])
           .reset_index(name='Discount'))
df2['Currency'] = df2['Currency'].str.upper()
print (df2)
         Date1 Currency_code Currency  Discount
0   2021-01-01         code1      EUR      0.10
1   2021-01-01         code1      GBP      0.20
2   2021-01-01         code2      EUR      0.30
3   2021-01-01         code2      GBP      0.40
4   2021-01-01         code2      USD      0.50
5   2021-01-02         code1      EUR      0.10
6   2021-01-02         code1      GBP      0.20
7   2021-01-02         code2      EUR      0.30
8   2021-01-02         code2      GBP      0.40
9   2021-01-02         code2      USD      0.50
10  2021-01-03         code1      EUR      0.60
11  2021-01-03         code1      GBP      0.70
12  2021-01-03         code2      EUR      0.80
13  2021-01-03         code2      GBP      0.90
14  2021-01-03         code2      USD      0.90
15  2021-01-04         code1      EUR      0.11
16  2021-01-04         code1      GBP      0.21
17  2021-01-04         code2      EUR      0.31
18  2021-01-04         code2      GBP      0.41
19  2021-01-04         code2      USD      0.51

Or unpivot by DataFrame.melt (this solution is slowier in large df, because processing large column, in previous is processing few values in columns names):

df2 = df1.melt('Date1', value_name='Discount')
df2[['Currency_code','Currency']] = df2.pop('variable').str.split('_', expand=True)
df2['Currency'] = df2['Currency'].str.upper()

And then join to first DataFrame by left join with DataFrame.merge:

df = df.merge(df2, how='left', on=['Date1','Currency_code','Currency'])

Upvotes: 3

Related Questions