Reputation: 2314
I have two data frames as follow:
A = pd.DataFrame({"value":[3, 7, 5 ,18,23,27,21,29]})
B = pd.DataFrame({"low":[1, 6, 11 ,16,21,26], "high":[5,10,15,20,25,30], "name":["one","two","three","four","five", "six"]})
I want to find whether "value" in A is between 'high' and low' in B, and if so, I want to copy the column name from B to A.
The output should look like this:
A = pd.DataFrame({"value":[3, 7, 5 ,18,23,27,21,29], "name":["one","two","one","four","five", "six", "five", "six"]})
My function uses iterrows as follows:
def func1(row):
x = row['value']
for index,value in B.iterrows():
if ((value['low'] <= x) &(x<=value['high'])):
return value['name']
But it doesn't yet achieve what i want to do,
thank you,
Upvotes: 3
Views: 2745
Reputation: 109546
You can use a list comprehension to iterate through the values in A
, and then use loc
to get the relevant mapped values. le
is less than or equal to, and ge
is greater than or equal to.
For example, v = 3
in the first row. Using simple boolean indexing:
>>> B[(B['low'].le(v)) & (B['high'].ge(v))]
high low name
0 5 1 one
Assuming that DataFrame B
does not have any overlapping ranges, then you will get back one row as above. One then uses loc
to get the name
column, as below. Because each returned name is a a series, you need get the first and only scalar value (using iat
, for example).
A['name'] = [B.loc[(B['low'].le(v)) & (B['high'].ge(v)), 'name'].iat[0]
for v in A['value']]
>>> A
value name
0 3 one
1 7 two
2 5 one
3 18 four
4 23 five
5 27 six
6 21 five
7 29 six
Upvotes: 4
Reputation: 95948
I believe you are looking for something like this:
In [1]: import pandas as pd
In [2]: A = pd.DataFrame({"value":[3, 7, 5 ,18,23,27,21,29]})
In [3]:
In [3]: B = pd.DataFrame({"low":[1, 6, 11 ,16,21,26], "high":[5,10,15,20,25,30], "name":["one","two","three","four","five", "six"]})
In [4]: A
Out[4]:
value
0 3
1 7
2 5
3 18
4 23
5 27
6 21
7 29
In [5]: B
Out[5]:
high low name
0 5 1 one
1 10 6 two
2 15 11 three
3 20 16 four
4 25 21 five
5 30 26 six
In [6]: def func1(x):
...: for row in B.itertuples():
...: if row.low <= x <= row.high:
...: return row.name
...:
In [7]: A.value.map(func1)
Out[7]:
0 one
1 two
2 one
3 four
4 five
5 six
6 five
7 six
Name: value, dtype: object
In [8]: A['name'] = A['value'].map(func1)
In [9]: A
Out[9]:
value name
0 3 one
1 7 two
2 5 one
3 18 four
4 23 five
5 27 six
6 21 five
7 29 six
I use itertuples
because it should be a little bit faster but in general this will not be very efficient. This is a solution but there might be better ones.
In [8]: timeit A['value'].map(func1)
100 loops, best of 3: 10.5 ms per loop
In [9]: timeit [B.loc[(B['low'].le(v)) & (B['high'].ge(v)), 'name'].tolist()[0] for v in A['value']]
100 loops, best of 3: 9.06 ms per loop
Quick and dirty test shows that Alexander's approach is faster. I wonder how it scales.
Upvotes: 1