user026
user026

Reputation: 702

How to combine/merge dataframes by approximate values of a column?

This is an example of a bigger data. Imagine I have two dataframes like these:

import pandas as pd
import numpy as np

np.random.seed(42)
df1 = pd.DataFrame({'Depth':np.arange(0.5, 4.5, 0.5),
                    'Feat1':np.random.randint(20, 70, 8)})

df2 = pd.DataFrame({'Depth':[0.4, 1.1, 1.5, 2.2, 2.8],
                    'Rock':['Sand','Sand','Clay','Clay','Marl']})

They have different size and I would like to put the information of 'Rock' column from df2 on df1 as a new column. This combination should be done based on the 'Depth' columns from these two dataframes, but they have different sampling rates. Df1 follows a constant step of 0.5, but the thickness of df2 is different.

So I would like to merge these information based on approximate values of 'Depth'. For example: if sample of df2 has 'Depth' of 2.2, then look at the most near 'Depth' value of df1, that should be 2.0, and add 'Rock' information ('Clay') on that sample. And it is important to say that 'Rock' values can be repeated on the new column to avoid missing data just inside this segmentation. Anyone could help me?

I already tried some pandas methods as 'merge' and 'combine_first', but I couldn't get the result I wanted. It should be something like this:

enter image description here

Upvotes: 3

Views: 906

Answers (1)

Henry Ecker
Henry Ecker

Reputation: 35686

Use merge_asof:

df3 = pd.merge_asof(df1, df2, on='Depth', tolerance=0.5, direction='nearest')

df3:

   Depth  Feat1  Rock
0    0.5     58  Sand
1    1.0     48  Sand
2    1.5     34  Clay
3    2.0     62  Clay
4    2.5     27  Clay
5    3.0     40  Marl
6    3.5     58   NaN
7    4.0     38   NaN

Complete Working Example:

import numpy as np
import pandas as pd

np.random.seed(42)
df1 = pd.DataFrame({
    'Depth': np.arange(0.5, 4.5, 0.5),
    'Feat1': np.random.randint(20, 70, 8)
})

df2 = pd.DataFrame({
    'Depth': [0.4, 1.1, 1.5, 2.2, 2.8],
    'Rock': ['Sand', 'Sand', 'Clay', 'Clay', 'Marl']
})

df3 = pd.merge_asof(df1, df2, on='Depth', tolerance=0.5, direction='nearest')

print(df3)

Upvotes: 7

Related Questions