Reputation: 687
i have dataset with two columns:
import pandas as pd
dict = {'val':["3.2", "2.4", "-2.3", "-4.9"],
'conf_interval': ["[-0.83, -1.78]", "[0.71, 2.78]", "[-0.91, -2.28]", "[-0.69, -2.14]"]}
df = pd.DataFrame(dict)
df
val conf_interval
0 3.2 [-0.83, -1.78]
1 2.4 [0.71, 2.78]
2 -2.3 [-0.91, -2.28]
3 -4.9 [-0.69, -2.14]
I want to check which of the values in column val
is between two values in column conf_interval
. Is the only way is to splitconf_interval
column to two columns? Or there are also other way without splitting this column?
The desirede output is something like this:
val conf_interval result
0 3.2 [-1.78, -0.83] False
1 2.4 [0.71, 2.78] True
2 -2.3 [-2.28, -0.91] False
3 -4.9 [0.69, 2.14] False
Upvotes: 0
Views: 552
Reputation: 88236
I've used the intervals from the expected output's dataframe, where the left hand side is lower than the right hand. Here's one approach using pd.IntervalIndex
:
from ast import literal_eval
df['conf_interval'] = df.conf_interval.map(literal_eval)
df['val'] = pd.to_numeric(df.val)
intervals = pd.IntervalIndex.from_tuples(list(map(tuple, df.conf_interval)))
df['result'] = intervals.contains(df.val)
print(df)
val conf_interval result
0 3.2 [-1.78, -0.83] False
1 2.4 [0.71, 2.78] True
2 -2.3 [-2.28, -0.91] False
3 -4.9 [0.69, 2.14] False
Upvotes: 2
Reputation: 862771
Use Series.between
with converted column conf_interval
float series by Series.str.split
:
df1 = df['conf_interval'].str.strip('[]').str.split(', ', expand=True).astype(float)
df['result'] = df['val'].astype(float).between(df1[0], df1[1])
print (df)
val conf_interval result
0 3.2 [-0.83, -1.78] False
1 2.4 [0.71, 2.78] True
2 -2.3 [-0.91, -2.28] False
3 -4.9 [-0.69, -2.14] False
Upvotes: 2