Reputation: 17621
I have a pandas DataFrame whereby I'm using pandas.DataFrame.apply()
.
dict1 = {'file': ['filename2', 'filename2', 'filename3', 'filename4',
'filename4', 'filename3'], 'amount': [3, 4, 5, 1, 2, 1],
'front': [21889611, 36357723, 196312, 11, 42, 1992],
'back':[21973805, 36403870, 277500, 19, 120, 3210],
'type':['A', 'A', 'A', 'B', 'B', 'C']}
df1 = pd.DataFrame(dict1)
print(df1)
file amount front back type
0 filename2 3 21889611 21973805 A
1 filename2 4 36357723 36403870 A
2 filename3 5 196312 277500 A
3 filename4 1 11 19 B
4 filename4 2 42 120 B
5 filename3 1 1992 3210 C
My task is to take N
random draws between columns front
and back
, whereby N
is equal to the value in column amount
:
def my_func(x):
return np.random.choice(np.arange(x.front, x.back+1), x.amount).tolist()
I would only like to apply this function on rows whereby type
is equal to A
.
If I didn't have the conditional, I would use pd.DataFrame.apply()
as follows:
df1["new_column"] = df1.apply(my_func, axis=1)
To apply this only if type==A
, I could do something like
if set(['A']).issubset(df1['type']):
df1["new_column"] = df1.apply(my_func, axis=1)
However, this doesn't actually work---my_func
is applied to all functions.
(1) Why doesn't the above work as expected?
(2) How would one efficiently only apply my_func
to the rows which satisfy type==A
? In the example above, this would be for the first three rows, and the next three rows would be NA
Upvotes: 2
Views: 729
Reputation: 153460
Filter your dataframe first then apply my_func
. Let's use query
:
df1['new_column'] = df1.query('type == "A"').apply(my_func, axis=1)
Output:
amount back file front type \
0 3 21973805 filename2 21889611 A
1 4 36403870 filename2 36357723 A
2 5 277500 filename3 196312 A
3 1 19 filename4 11 B
4 2 120 filename4 42 B
5 1 3210 filename3 1992 C
new_column
0 [21921030, 21908574, 21971743]
1 [36391053, 36371413, 36394390, 36376405]
2 [198648, 263355, 197017, 261666, 260815]
3 NaN
4 NaN
5 NaN
Upvotes: 2