Reputation: 2776
Consider the dataframe, tests
, of individual tests in some boreholes:
borehole depthTest
0 B-1 1.5
1 B-1 20.0
2 B-42 1.0
3 B-42 2.0
4 B-42 15.0
5 B-42 30.0
6 B-09 1.0
7 B-09 10.0
8 B-09 15.0
I have another dataframe, liths
, with ranges of lithologies of each borehole:
borehole depthTop lith
0 B-1 0 sand
1 B-1 5 clay
2 B-1 18 shale
3 B-42 0 sand
4 B-42 1 clay
5 B-42 26 shale
6 B-09 0 sand
7 B-09 12 shale
The lithologies are a continuous sequence for each borehole. For example: in B-1 there is sand from a depth of 0 to 5 m, clay from 5 to 18 m, and shale from 18 m onwards. The bottom of each lithology is the top of the next. In other words, the bottom of each lithology would be liths.groupby('borehole').depthTop.shift(-1)
Edit: I want to join the two dfs so I can get the lithology of each test: I want to match on the borehole
and then find the lith
that has the closest depthTop
<= depthTest
.
For example: In B-42 there is clay from a depth of 1 m to 26 m. The test in B-42 at 15.0 m should be classified as clay because 15 is between 1 and 26.
Here's the desired result:
borehole depthTest lith
0 B-1 1.5 sand
1 B-1 20.0 shale
2 B-42 1.0 clay
3 B-42 2.0 clay
4 B-42 15.0 clay
5 B-42 30.0 shale
6 B-09 1.0 sand
7 B-09 10.0 sand
8 B-09 15.0 shale
This seems like a groupby
and merge_asof
problem, but I can't figure out how to get them together.
My solution so far, which works, is to dump this into sqlite3
and then do a between
join (like I did here), but that really seems like defeat.
Upvotes: 0
Views: 73
Reputation: 59274
Find the closest value on df2
and categorize:
def logic(k):
vals = df2.loc[df2.borehole == k.borehole,:]
return vals[vals.depthTop == max(vals[vals.depthTop <= k.depthTest].depthTop)].lith.item()
df.transform(logic, 1)
0 sand
1 shale
2 clay
3 clay
4 clay
5 shale
6 sand
7 sand
8 shale
Upvotes: 1
Reputation: 153500
Let's try this:
tests['lith'] = tests.groupby('borehole')['depthTest'].transform(lambda x: pd.cut(x,
bins = liths.loc[liths.borehole == x.name,'depthTop'].values.tolist() + [np.inf],
labels=liths.loc[liths.borehole == x.name,'lith'], right=False))
Output:
borehole depthTest lith
0 B-1 1.5 sand
1 B-1 20.0 shale
2 B-42 1.0 clay
3 B-42 2.0 clay
4 B-42 15.0 clay
5 B-42 30.0 shale
6 B-09 1.0 sand
7 B-09 10.0 sand
8 B-09 15.0 shale
Let's use pd.cut
to label values in a range.
And by using groupby to get the appropriate bins
and labels
for pd.cut
from liths dataframe by borehole.
Upvotes: 1