BGG16
BGG16

Reputation: 536

How to pull data from python pandas df1 to df2 based on values in each row of df2, sort of like a VLOOKUP nested in an HLOOKUP in Excel

Assume I have two dataframes, df1 and df2, described as follows. See code below that creates each of these dfs.

df1

df2

My Objective

Any help would be greatly appreciated. Thank you!

Code

# Import dependencies
import pandas as pd
import numpy as np

# Create df1 
df1 = pd.DataFrame(np.array([['2016-05-03', 1651,2653,3655,4658,5655], 
                             ['2017-05-29', 1751,2752,3754,4755, 5759], 
                             ['2018-08-22', 1889, 2882,3887, 4884, 5882], 
                             ['2019-06-28', 1966, 2965, 3966, 4960, 5963],
                             ['2018-11-15', 1811, 2811, 3811, 4811, 5811], 
                             ['2019-12-31', 1912, 2912, 3912, 4912, 5912],
                             ['2016-07-05', 1672, 2678, 3679, 4672, 5674], 
                             ['2017-05-15', 1755, 2750, 3759, 4752, 5755], 
                             ['2018-06-10', 1860, 2864, 3866, 4866, 5867], 
                             ['2019-01-28', 1918, 2910, 3914, 4911, 5918],
                             ['2018-11-30', 1812, 2812, 3812, 4812, 5812], 
                             ['2019-01-03', 1915, 2917, 3916, 4916, 5917],]),
                   columns=['Date', 'Person1', 'Person2', 'Person3', 'Person4', 
                            'Person5',])
# Format df1['Date'] col as datetime
df1['Date'] = pd.to_datetime(df1['Date'])
# Sort df1 by 'Date'
df1 = df1.sort_values(['Date'],ascending=[True]).reset_index(drop=True)

# Create 'df2', which contains measurement data on specific dates.
df2 = pd.DataFrame(np.array([['2017-05-15', 'Person4', '', ''], ['2019-01-28    ', 'Person1', '', ''], 
                              ['2018-11-15', 'Person1', '', ''], ['2018-08-22', 'Person3', '', ''],
                              ['2017-05-15', 'Person5', '', ''], ['2016-05-03', 'Person2', '', ''],]),
                   columns=['Date', 'Person', 'Value_Today', 'Value_2_records_later'])
df2['Date'] = pd.to_datetime(df2['Date'])

# Display dfs
display(df1)
display(df2)

### I DON'T KNOW WHAT CODE I NEED TO SOLVE MY ISSUE ###

# To capture the row that is two rows below, I think I would use the '.shift(-2)' function?

Upvotes: 1

Views: 314

Answers (2)

Raymond Kwok
Raymond Kwok

Reputation: 2541

First, copy the values once for Value_2_records_later,

step1 = df1.set_index('Date')
persons = step1.columns.tolist()

c1 = [('Value_Today', p) for p in persons]
c2 = [('Value_2_records_later', p) for p in persons]

step1.columns = pd.MultiIndex.from_tuples(c1, names=('','Person'))
step1[c2] = step1[c1].shift(-2)

Then stack to move columns to rows

step1.stack()

Upvotes: 0

Shubham Sharma
Shubham Sharma

Reputation: 71687

Solution with MultiIndex.map:

  • Set the index of df1 to Date
  • Stack the dataframe to create multiindex mapping series s1. The index of this series will be the combination of date and name of the person. Similarly create another series s2.
  • Set the index of df2 to Date and Person columns
  • Substitute the values in the index of df2 using the values from s1 and s2 and assign the corresponding results to Value_Today and Value_2_records_later
s1 = df1.set_index('Date').stack()
s2 = df1.set_index('Date').shift(-2).stack()
ix = df2.set_index(['Date', 'Person']).index

df2['Value_Today'] = ix.map(s1)
df2['Value_2_records_later'] = ix.map(s2)

Result

print(df2)

        Date   Person Value_Today Value_2_records_later
0 2017-05-15  Person4        4752                  4866
1 2019-01-28  Person1        1918                  1912
2 2018-11-15  Person1        1811                  1915
3 2018-08-22  Person3        3887                  3812
4 2017-05-15  Person5        5755                  5867
5 2016-05-03  Person2        2653                  2750

Upvotes: 1

Related Questions