practicalGuy
practicalGuy

Reputation: 1328

optimized way of iterating through dataframe

I have a pandas dataframe, called Visits2 contains 20M records. Here are sample of records from Visits2.

num         srv_edt     inpt_flag
000423733A  8/15/2016   N
001013135D  7/11/2016   N
001013135D  7/11/2016   N
001047851M  4/29/2016   N
001067291M  2/29/2016   Y
001067291M  8/3/2016    N
001067291M  8/3/2016    N
001067291M  9/4/2016    N
001070817A  5/25/2016   N
001070817A  5/25/2016   Y
001072424A  1/13/2016   N
001072424A  2/17/2016   Y
001072424A  3/21/2016   N
001072424A  3/21/2016   N
001072424A  5/10/2016   N
001072424A  6/6/2016    N

I'm executing below code, Assign inpt_any with N, when srv_edt is first occurrence in the group of num. if the inpt_flag already has the value as Y then assign inpt_flag with Y.

This is running fine, But consider at 20M volume, it is taking hours to run. Somebody, please suggest me optimize way of looping through the dataframe.

prev_srv_edt = " "
for vv in Visits2.itertuples():
    inpt_any = 'N'
    if (prev_srv_edt != vv[1]):
        prev_srv_edt = vv[1]
        Visits2.loc[vv[0],'inpt_any'] = 'N'
    if (vv[2] == 'Y'):
        Visits2.loc[vv[0],'inpt_any'] = 'Y'

I did try with list(zip(visit['srv_edt'],visit['inpt_flag'])), but I see zip also taking lot of time to run.

Upvotes: 0

Views: 48

Answers (1)

MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210842

IIUC you can do it this way:

In [37]: df.loc[df.groupby('num')['srv_edt'].idxmin(), 'inpt_any'] = 'N'

In [38]: df
Out[38]:
           num    srv_edt inpt_flag inpt_any
0   000423733A 2016-08-15         N        N
1   001013135D 2016-07-11         N        N
2   001013135D 2016-07-11         N      NaN
3   001047851M 2016-04-29         N        N
4   001067291M 2016-02-29         Y        N
5   001067291M 2016-08-03         N      NaN
6   001067291M 2016-08-03         N      NaN
7   001067291M 2016-09-04         N      NaN
8   001070817A 2016-05-25         N        N
9   001070817A 2016-05-25         Y      NaN
10  001072424A 2016-01-13         N        N
11  001072424A 2016-02-17         Y      NaN
12  001072424A 2016-03-21         N      NaN
13  001072424A 2016-03-21         N      NaN
14  001072424A 2016-05-10         N      NaN
15  001072424A 2016-06-06         N      NaN

Upvotes: 1

Related Questions