Reputation: 481
I have a function which sends automated messages to clients, and takes as input all the columns from a dataframe like the one below.
| name | phone | status | date |
|---|---|---|---|
| name_1 | phone_1 | sending | today |
| name_2 | phone_2 | sending | yesterday |
I iterate over the dataframe with pandas apply (axis=1) and use the column values of each row as inputs to my function. After sending, the function changes the status to "sent". The thing is, I only want to send to the clients whose date is "today". With pandas.apply(axis=1) this is perfectly doable, but in order to slice the clients with the "today" value, I need to:
I thought about running through the whole dataframe and ignoring the rows whose date is not "today", but if my dataframe keeps growing, I'm afraid the whole process will become slower.
I saw examples of this being done with a mask, although people usually use only one column, and I need more than one. Is there any way to do this with pandas apply?
Thank you.
Upvotes: 0
Views: 61
Reputation: 201
I think you can use .loc to filter the data and then apply the function only to the filtered rows. In the session below, pd is pandas and np is NumPy.
In [13]: df = pd.DataFrame(np.random.rand(5,5))
In [14]: df
Out[14]:
          0         1         2         3         4
0  0.085870  0.013683  0.221890  0.533393  0.622122
1  0.191646  0.331533  0.259235  0.847078  0.649680
2  0.334781  0.521263  0.402030  0.973504  0.903314
3  0.189793  0.251130  0.983956  0.536816  0.703726
4  0.902107  0.226398  0.596697  0.489761  0.535270
Say we want to double the values of the rows where the value in the first column is > 0.3. Selecting those rows first:
In [16]: df.loc[df[0] > 0.3]
Out[16]:
          0         1         2         3         4
2  0.334781  0.521263  0.402030  0.973504  0.903314
4  0.902107  0.226398  0.596697  0.489761  0.535270
In [18]: df.loc[df[0] > 0.3] = df.loc[df[0] > 0.3].apply(lambda x: x*2, axis=1)
In [19]: df
Out[19]:
          0         1         2         3         4
0  0.085870  0.013683  0.221890  0.533393  0.622122
1  0.191646  0.331533  0.259235  0.847078  0.649680
2  0.669563  1.042527  0.804061  1.947008  1.806628
3  0.189793  0.251130  0.983956  0.536816  0.703726
4  1.804213  0.452797  1.193394  0.979522  1.070540
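Applied to the dataframe from the question, the same idea might look like the sketch below. The send_message function and the outbox list are hypothetical stand-ins for the real sending code, and I am assuming the function can return the new status for each row. The mask combines two columns with &, so the filter is not limited to a single column, and apply only ever sees the rows that pass it.

```python
import pandas as pd

outbox = []  # hypothetical stand-in for the real message channel


def send_message(row):
    """Hypothetical sender: records the message and returns the new status."""
    outbox.append((row["name"], row["phone"]))
    return "sent"


df = pd.DataFrame(
    {
        "name": ["name_1", "name_2"],
        "phone": ["phone_1", "phone_2"],
        "status": ["sending", "sending"],
        "date": ["today", "yesterday"],
    }
)

# Boolean mask over more than one column; each condition needs its own parentheses.
mask = (df["date"] == "today") & (df["status"] == "sending")

# Guard against an empty selection, then apply the function to the filtered rows only
# and write the returned status back to the same rows.
if mask.any():
    df.loc[mask, "status"] = df.loc[mask].apply(send_message, axis=1)
```

Rows that do not match the mask are never passed to the function, so the per-row Python work is limited to the clients you actually message. Building the mask itself is a vectorized comparison over the whole column.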
Upvotes: 1