DougKruger
DougKruger

Reputation: 4624

Better way of updating pandas dataframe while iterating through it

While iterating through a pandas dataframe and making changes, what is a better way to update the dataframe with the changes? An example below, right now I'm using indexer ix to locate the row to update which I believe is not the best way especially if data is big:

print df

id | A     | B
01 | 374   | 2014-02-01 04:45:04.401502
02 | 284   | 2014-03-12 21:23:12.401502
03 | 183   | 2014-02-01 09:12:08.401502

for row in df.itertuples():
    id = row[1]
    col_a = row[2]
    col_b = row[3]

    N = random.randint(2,5)
    for i in xrange(0, N): 
       new_col_a = col_a + 1
       new_col_b = datetime.datetime.now()

       #update dataframe's A, B respectively
       df.ix[df['id'] == id, ['A', 'B']] = [col_a, col_b]  


print df

id | A     | B
01 | 374   | 2014-02-01 04:45:04.401502
01 | 375   | 2016-12-07 07:45:04.401502
01 | 376   | 2016-12-07 07:45:04.401502
01 | 377   | 2014-12-07 07:45:04.401502
02 | 284   | 2014-03-12 21:23:12.401502
02 | 285   | 2016-12-07 07:45:04.401502
02 | 286   | 2016-12-07 07:45:04.401502
03 | 183   | 2014-02-01 09:12:08.401502
03 | 184   | 2016-12-07 07:45:04.401502
03 | 185   | 2016-12-07 07:45:04.401502
03 | 186   | 2016-12-07 07:45:04.401502

Upvotes: 1

Views: 1021

Answers (1)

jezrael
jezrael

Reputation: 863611

Not very nice solution, because loops.

So first apply for each row of df custom function, where create new DataFrame and append it to list dfs. Then concat it and apply function, which can works out of custom function :

np.random.seed(10)
dfs = []
def expand(x):
    N = np.random.choice([2,3,4])
    df = pd.DataFrame([x.values.tolist()], columns=x.index).reindex(range(N))
    df.A = df.A.fillna(1).cumsum()
    df.insert(1,'prevA', df.A.shift())
    dfs.append(df)

df.apply(expand, axis=1)

df1 = pd.concat(dfs, ignore_index=True)
df1.A = df1.A.astype(int)
df1.id = df1.id.ffill().astype(int)
df1.prevA = df1.prevA.bfill().astype(int)
df1.B = df1.B.fillna(pd.datetime.now())

print (df1)

   id  prevA    A                          B
0   1    374  374 2014-02-01 04:45:04.401502
1   1    374  375 2016-12-07 10:48:14.299336
2   1    375  376 2016-12-07 10:48:14.299336
0   2    284  284 2014-03-12 21:23:12.401502
1   2    284  285 2016-12-07 10:48:14.299336
2   2    285  286 2016-12-07 10:48:14.299336
0   3    183  183 2014-02-01 09:12:08.401502
1   3    183  184 2016-12-07 10:48:14.299336

Upvotes: 1

Related Questions