Reputation: 1960
I have a dataframe with 2 columns - a tuple and a list:
df = t l
(1,2) [1,2,3,4,5,6]
(0,5) [1,4,9]
(0,4) [9,11]
I want to add a new column of "how many elements from l are in the range of t. So for example, here if will be:
df =counter t l
2 (1,2) [1,2,3,4,5,6]
2 (0,5) [1,4,9]
0 (0,4) [9,11]
What is the best way to do so?
Upvotes: 0
Views: 431
Reputation: 862581
Use list comprehension with generator and sum
:
df['counter'] = [sum(a <= i <= b for i in y) for (a, b), y in df[['t','l']].to_numpy()]
A bit faster solution with set.intersection
is:
df['counter'] = [len(set(range(a, b+1)).intersection(y))
for (a, b), y in df[['t','l']].to_numpy()]
print (df)
t l counter
0 (1, 2) [1, 2, 3, 4, 5, 6] 2
1 (0, 5) [1, 4, 9] 2
2 (0, 4) [9, 11] 0
Performance in test data:
#30k rows
df = pd.concat([df] * 10000, ignore_index=True)
In [67]: %timeit [sum(a <= i <= b for i in y) for (a, b), y in df[['t','l']].to_numpy()]
65.3 ms ± 1.22 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [68]: %timeit [len(set(range(a, b+1)).intersection(y)) for (a, b), y in df[['t','l']].to_numpy()]
60.7 ms ± 520 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
Upvotes: 2