cfort
cfort

Reputation: 2776

Find range that point falls into?

Consider the dataframe, tests, of individual tests in some boreholes:

  borehole  depthTest
0      B-1        1.5
1      B-1       20.0
2     B-42        1.0
3     B-42        2.0
4     B-42       15.0
5     B-42       30.0
6     B-09        1.0
7     B-09       10.0
8     B-09       15.0

I have another dataframe, liths, with ranges of lithologies of each borehole:

  borehole  depthTop   lith
0      B-1         0   sand
1      B-1         5   clay
2      B-1        18  shale
3     B-42         0   sand
4     B-42         1   clay
5     B-42        26  shale
6     B-09         0   sand
7     B-09        12  shale

The lithologies are a continuous sequence for each borehole. For example: in B-1 there is sand from a depth of 0 to 5 m, clay from 5 to 18 m, and shale from 18 m onwards. The bottom of each lithology is the top of the next. In other words, the bottom of each lithology would be liths.groupby('borehole').depthTop.shift(-1)

Edit: I want to join the two dfs so I can get the lithology of each test: I want to match on the borehole and then find the lith that has the closest depthTop <= depthTest.

For example: In B-42 there is clay from a depth of 1 m to 26 m. The test in B-42 at 15.0 m should be classified as clay because 15 is between 1 and 26.

Here's the desired result:

  borehole  depthTest   lith
0      B-1        1.5   sand
1      B-1       20.0  shale
2     B-42        1.0   clay
3     B-42        2.0   clay
4     B-42       15.0   clay
5     B-42       30.0  shale
6     B-09        1.0   sand
7     B-09       10.0   sand
8     B-09       15.0  shale

This seems like a groupby and merge_asof problem, but I can't figure out how to get them together.

My solution so far, which works, is to dump this into sqlite3 and then do a between join (like I did here), but that really seems like defeat.

Upvotes: 0

Views: 73

Answers (2)

rafaelc
rafaelc

Reputation: 59274

Find the closest value on df2 and categorize:

def logic(k):
    vals = df2.loc[df2.borehole == k.borehole,:]
    return vals[vals.depthTop == max(vals[vals.depthTop <= k.depthTest].depthTop)].lith.item()

df.transform(logic, 1)


0     sand
1    shale
2     clay
3     clay
4     clay
5    shale
6     sand
7     sand
8    shale

Upvotes: 1

Scott Boston
Scott Boston

Reputation: 153500

Let's try this:

tests['lith'] = tests.groupby('borehole')['depthTest'].transform(lambda x: pd.cut(x,
       bins = liths.loc[liths.borehole == x.name,'depthTop'].values.tolist() + [np.inf],
       labels=liths.loc[liths.borehole == x.name,'lith'], right=False))

Output:

  borehole  depthTest   lith
0      B-1        1.5   sand
1      B-1       20.0  shale
2     B-42        1.0   clay
3     B-42        2.0   clay
4     B-42       15.0   clay
5     B-42       30.0  shale
6     B-09        1.0   sand
7     B-09       10.0   sand
8     B-09       15.0  shale

Let's use pd.cut to label values in a range.
And by using groupby to get the appropriate bins and labels for pd.cut from liths dataframe by borehole.

Upvotes: 1

Related Questions