obabs
obabs

Reputation: 9171

Create columns values based on columns names from another two seperate dataframe in Pandas Python

I have two Pandas Dataframe and I want to be able to create the Result column (yellow) by matching column1 values in first df with the appropriate values from the second df. Column1 values in df one are all references to columns names in the second df.

enter image description here

enter image description here

Upvotes: 0

Views: 55

Answers (2)

pansen
pansen

Reputation: 6663

As mentioned in the comments, melt and merge are a good approach here:

import pandas as pd

df1 = pd.DataFrame({'Date': ['12/30/2016', '12/30/2016', '1/31/2017', '1/31/2017'], 
                    'Column1': ['APB', 'UPB', 'APB', 'UPB']})

df2 = pd.DataFrame({'Date': ['12/30/2016', '1/31/2017', '2/28/2017', '3/31/2017'], 
                   'APB': [117, 112.8, 112.37, 112.23], 
                   'UPB': [67.925, 67.865, 66.717, 66.939]})

melted = pd.melt(df2, id_vars="Date", var_name="Column1", value_name="Result")
merged = df1.merge(melted, on=["Date", "Column1"])

print(merged)

  Column1        Date   Result
0     APB  12/30/2016  117.000
1     UPB  12/30/2016   67.925
2     APB   1/31/2017  112.800
3     UPB   1/31/2017   67.865

Upvotes: 1

Vaishali
Vaishali

Reputation: 38415

You can use melt and set_index on your df2

df1 = pd.DataFrame({'Date': ['12/30/2016', '12/30/2016', '1/31/2017', '1/31/2017'], 'col1': ['APB', 'UPB', 'APB', 'UPB']})
df2 = pd.DataFrame({'Date': ['12/30/2016', '1/31/2017', '2/28/2017', '3/31/2017'], 'APB': [117, 112.8, 112.37, 112.23], 'UPB': [67.9, 67.8, 66.7, 66.9]})



df2 = pd.melt(df2, id_vars='Date', value_vars=['APB', 'UPB'])
df2['Date'] = pd.to_datetime(df2['Date'])
df2.sort_values(by = 'Date').set_index('Date')

This gives you

    variable    value
Date        
2016-12-30  APB 117.00
2016-12-30  UPB 67.90
2017-01-31  APB 112.80
2017-01-31  UPB 67.80
2017-02-28  APB 112.37

Now you can merge the two dataframes,

df1 = df1.merge(df2, left_on = 'col1', right_on = 'variable').drop_duplicates().drop('variable', axis = 1).sort_values(by = 'Date')

That gives you

    col1    Date    value
0   APB 2016-12-30  117.00
8   UPB 2016-12-30  67.90
1   APB 2017-01-31  112.80
9   UPB 2017-01-31  67.80
2   APB 2017-02-28  112.37
10  UPB 2017-02-28  66.70
3   APB 2017-03-31  112.23
11  UPB 2017-03-31  66.90

Upvotes: 1

Related Questions