marcogemaque

Reputation: 481

How to call a custom function with pandas apply (axis=1) on only some specific rows of a DataFrame?

I have a function which sends automated messages to clients, and takes as input all the columns from a dataframe like the one below.

name    phone    status   date
name_1  phone_1  sending  today
name_2  phone_2  sending  yesterday

I iterate through the DataFrame with pandas apply(axis=1) and use the values in each row's columns as inputs to my function. After sending, the function changes the status to "sent". The catch is that I only want to message the clients whose date is "today". Doing this with apply(axis=1) is perfectly possible, but to select only the clients dated "today" I currently have to:

  1. create a new dataframe with today's value,
  2. remove it from the original, and then
  3. re-append it to the original.
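For reference, the three-step workaround described above might look roughly like this. It is a minimal sketch: send_message is a hypothetical stand-in for the real sending function, and the DataFrame just reproduces the sample data from the question.

```python
import pandas as pd

def send_message(row):
    # Hypothetical sender: a real version would use row["name"] and
    # row["phone"] to send the message. It returns the new status.
    return "sent"

df = pd.DataFrame({
    "name": ["name_1", "name_2"],
    "phone": ["phone_1", "phone_2"],
    "status": ["sending", "sending"],
    "date": ["today", "yesterday"],
})

# 1. Copy the rows dated "today" into a new DataFrame and process them.
today = df[df["date"] == "today"].copy()
today["status"] = today.apply(send_message, axis=1)

# 2. Remove those rows from the original.
rest = df[df["date"] != "today"]

# 3. Re-append them; sort_index restores the original row order.
df = pd.concat([rest, today]).sort_index()
print(df)
```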

I thought about running apply over the whole DataFrame and skipping the rows whose date is not "today", but as the DataFrame keeps growing, I'm afraid the whole process will become slower.
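The alternative of visiting every row and skipping the ones not dated "today" could be sketched as below (send_if_today is a hypothetical name, not a pandas function). The skip happens in Python once per row, so the work still scales with the full length of the DataFrame rather than with the number of rows dated "today".

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["name_1", "name_2"],
    "phone": ["phone_1", "phone_2"],
    "status": ["sending", "sending"],
    "date": ["today", "yesterday"],
})

def send_if_today(row):
    # Hypothetical sender: rows not dated "today" keep their current status.
    if row["date"] != "today":
        return row["status"]
    # A real version would send the message using row["name"] and row["phone"].
    return "sent"

# apply still visits every row, including the ones that are skipped.
df["status"] = df.apply(send_if_today, axis=1)
print(df)
```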

I have seen examples that do this with a mask, but they usually involve only one column, whereas my function needs several. Is there a way to do this with pandas apply?

Thank you.

Upvotes: 0

Views: 61

Answers (1)

vamsi_s

Reputation: 201

I think you can use .loc with a boolean mask to select only the rows you need, apply your function to that subset, and assign the result back to the same rows.
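Applied to the DataFrame from the question, the .loc approach might look like the sketch below. send_message is hypothetical and assumed to return the new status. Because apply with axis=1 passes the whole row, the function can still read every column (name, phone, date), not only the one used in the mask.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["name_1", "name_2"],
    "phone": ["phone_1", "phone_2"],
    "status": ["sending", "sending"],
    "date": ["today", "yesterday"],
})

def send_message(row):
    # Hypothetical sender; every column of the row is available here.
    print(f"Sending to {row['name']} at {row['phone']}")
    return "sent"

# Boolean mask: True only for rows dated "today".
mask = df["date"] == "today"

# Skip the call when no rows match; otherwise apply runs only on the
# selected rows and the result is written back to their "status" column,
# aligned on the index.
if mask.any():
    df.loc[mask, "status"] = df.loc[mask].apply(send_message, axis=1)
print(df)
```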

In [12]: import numpy as np; import pandas as pd

In [13]: df = pd.DataFrame(np.random.rand(5,5))

In [14]: df
Out[14]:
          0         1         2         3         4
0  0.085870  0.013683  0.221890  0.533393  0.622122
1  0.191646  0.331533  0.259235  0.847078  0.649680
2  0.334781  0.521263  0.402030  0.973504  0.903314
3  0.189793  0.251130  0.983956  0.536816  0.703726
4  0.902107  0.226398  0.596697  0.489761  0.535270

For example, suppose we want to double the values in the rows where the value in column 0 is greater than 0.3. Those rows are:

In [16]: df.loc[df[0] > 0.3]
Out[16]:
          0         1         2         3         4
2  0.334781  0.521263  0.402030  0.973504  0.903314
4  0.902107  0.226398  0.596697  0.489761  0.535270

In [18]: df.loc[df[0] > 0.3] = df.loc[df[0] > 0.3].apply(lambda x: x*2, axis=1)

In [19]: df
Out[19]:
          0         1         2         3         4
0  0.085870  0.013683  0.221890  0.533393  0.622122
1  0.191646  0.331533  0.259235  0.847078  0.649680
2  0.669563  1.042527  0.804061  1.947008  1.806628
3  0.189793  0.251130  0.983956  0.536816  0.703726
4  1.804213  0.452797  1.193394  0.979522  1.070540
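If the filter depends on more than one column, the individual conditions can be combined with & (and) or | (or) into a single boolean mask, which can also be computed once and reused on both sides of the assignment. A minimal sketch with made-up values:

```python
import pandas as pd

df = pd.DataFrame({0: [0.1, 0.4, 0.9], 1: [0.2, 0.3, 0.8]})

# Each comparison needs its own parentheses, because & binds more
# tightly than > and < in Python.
mask = (df[0] > 0.3) & (df[1] < 0.5)

# Compute the mask once and reuse it; only row 1 matches both conditions.
df.loc[mask] = df.loc[mask].apply(lambda row: row * 2, axis=1)
print(df)
```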

Upvotes: 1
