How to pull data from python pandas df1 to df2 based on values in each row of df2, sort of like a VLOOKUP nested in an HLOOKUP in Excel

Question

Assume I have two dataframes, df1 and df2, described as follows. See code below that creates each of these dfs.

df1

Has 5,000 rows and 10,000 columns.
The first column contains a list of non-sequential dates. The dates are listed oldest to newest, but not every day is listed (i.e., only some days are listed). Each date is unique.
Each column is labeled with a different person's name. Each column name is unique.
All columns other than the Date column contain a number value.

df2

Has 2,000,000 rows and 4 columns.
The first column contains a list of dates. These are NOT sorted by oldest to newest.
The next column contains a person's name (which is listed as the column name in one of the columns of df1).
The other two columns contain data about that person based on the date listed in the row.

My Objective

I want to populate the two blank columns in df2 using data pulled from df1.
For instance, the first row of df2 lists a date of 2017-05-15 and a Person named Person4. I want to populate df2['Value_Today'] with 4752. I want to populate df2['Value_2_records_later'] with 4866.
For the next row of df2 (with Date of 2019-01-28 and Person named Person1, I want to populate df2['Value_Today'] with 1918. I want to populate df2['Value_2_records_later'] with 1912.
I want to do this for all 2 million rows in df2, so I assume that a for loop is a bad idea.

Any help would be greatly appreciated. Thank you!

Code

# Import dependencies
import pandas as pd
import numpy as np

# Create df1 
df1 = pd.DataFrame(np.array([['2016-05-03', 1651,2653,3655,4658,5655], 
                             ['2017-05-29', 1751,2752,3754,4755, 5759], 
                             ['2018-08-22', 1889, 2882,3887, 4884, 5882], 
                             ['2019-06-28', 1966, 2965, 3966, 4960, 5963],
                             ['2018-11-15', 1811, 2811, 3811, 4811, 5811], 
                             ['2019-12-31', 1912, 2912, 3912, 4912, 5912],
                             ['2016-07-05', 1672, 2678, 3679, 4672, 5674], 
                             ['2017-05-15', 1755, 2750, 3759, 4752, 5755], 
                             ['2018-06-10', 1860, 2864, 3866, 4866, 5867], 
                             ['2019-01-28', 1918, 2910, 3914, 4911, 5918],
                             ['2018-11-30', 1812, 2812, 3812, 4812, 5812], 
                             ['2019-01-03', 1915, 2917, 3916, 4916, 5917],]),
                   columns=['Date', 'Person1', 'Person2', 'Person3', 'Person4', 
                            'Person5',])
# Format df1['Date'] col as datetime
df1['Date'] = pd.to_datetime(df1['Date'])
# Sort df1 by 'Date'
df1 = df1.sort_values(['Date'],ascending=[True]).reset_index(drop=True)

# Create 'df2', which contains measurement data on specific dates.
df2 = pd.DataFrame(np.array([['2017-05-15', 'Person4', '', ''], ['2019-01-28    ', 'Person1', '', ''], 
                              ['2018-11-15', 'Person1', '', ''], ['2018-08-22', 'Person3', '', ''],
                              ['2017-05-15', 'Person5', '', ''], ['2016-05-03', 'Person2', '', ''],]),
                   columns=['Date', 'Person', 'Value_Today', 'Value_2_records_later'])
df2['Date'] = pd.to_datetime(df2['Date'])

# Display dfs
display(df1)
display(df2)

### I DON'T KNOW WHAT CODE I NEED TO SOLVE MY ISSUE ###

# To capture the row that is two rows below, I think I would use the '.shift(-2)' function?

Shubham Sharma · Accepted Answer

Solution with `MultiIndex.map`:

Set the index of df1 to Date
Stack the dataframe to create multiindex mapping series s1. The index of this series will be the combination of date and name of the person. Similarly create another series s2.
Set the index of df2 to Date and Person columns
Substitute the values in the index of df2 using the values from s1 and s2 and assign the corresponding results to Value_Today and Value_2_records_later

s1 = df1.set_index('Date').stack()
s2 = df1.set_index('Date').shift(-2).stack()
ix = df2.set_index(['Date', 'Person']).index

df2['Value_Today'] = ix.map(s1)
df2['Value_2_records_later'] = ix.map(s2)

Result

print(df2)

        Date   Person Value_Today Value_2_records_later
0 2017-05-15  Person4        4752                  4866
1 2019-01-28  Person1        1918                  1912
2 2018-11-15  Person1        1811                  1915
3 2018-08-22  Person3        3887                  3812
4 2017-05-15  Person5        5755                  5867
5 2016-05-03  Person2        2653                  2750

How to pull data from python pandas df1 to df2 based on values in each row of df2, sort of like a VLOOKUP nested in an HLOOKUP in Excel

Answers (2)

Solution with `MultiIndex.map`:

Related Questions

How to pull data from python pandas df1 to df2 based on values in each row of df2, sort of like a VLOOKUP nested in an HLOOKUP in Excel

Answers (2)

Solution with MultiIndex.map:

Related Questions

Solution with `MultiIndex.map`: