Can't modify Pandas DataFrame while iterating

Question

My question is about the for loop below, a pattern I see prominent data scientists use on Kaggle. However, it doesn't seem to work for me.

Python 3.6.6, pandas 0.23.4

setup

import pandas as pd

train = pd.DataFrame({'id': [2, 3, 1], 'time':['2017-04-17 22:23:22', '2018-05-22 14:20:00', '2017-01-09 08:02:14']})
test = pd.DataFrame({'id': [2, 3, 1], 'time':['2017-04-17 22:23:22', '2018-05-22 14:20:00', '2017-01-09 08:02:14']})
train

>>>         id  time  
>>>   0     2   2017-04-17 22:23:22
>>>   1     3   2018-05-22 14:20:00
>>>   2     1   2017-01-09 08:02:14

Sort it (this works)

train.sort_values('time', ascending=True)

>>>     id  time
>>> 2   1   2017-01-09 08:02:14
>>> 0   2   2017-04-17 22:23:22
>>> 1   3   2018-05-22 14:20:00

Sort it in a FOR loop - why does this not work?

for data in [train, test]:
    data = data.sort_values('time', ascending=True)
train

>>>     id  time
>>> 0   2   2017-04-17 22:23:22
>>> 1   3   2018-05-22 14:20:00
>>> 2   1   2017-01-09 08:02:14

jpp · Accepted Answer

Sort it in a FOR loop - why does this not work?

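Because assigning to data inside the loop only rebinds the loop variable. On each iteration data starts out referring to train or test, but sort_values returns a new DataFrame, and the assignment points the name data at that new object. Neither the list [train, test] nor the names train and test are changed.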
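The rebinding can be checked directly with identity tests. This is a minimal sketch that reuses the question's data; the names same_object and rebound are illustrative only and not part of the original code.

```python
import pandas as pd

train = pd.DataFrame({'id': [2, 3, 1],
                      'time': ['2017-04-17 22:23:22',
                               '2018-05-22 14:20:00',
                               '2017-01-09 08:02:14']})

for data in [train]:
    # At the start of the iteration, data is just another name for train
    same_object = data is train
    # sort_values returns a new DataFrame by default; data now names that copy
    data = data.sort_values('time', ascending=True)
    rebound = data is not train

print(same_object, rebound)   # True True
print(train.index.tolist())   # [0, 1, 2], so train keeps its original row order
```

The sorted result exists only under the name data, and it is discarded on the next iteration or when the loop ends.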

Instead, you can use sequence unpacking:

train, test = (df.sort_values('time') for df in (train, test))

Or, use enumerate in a for loop:

data = [train, test]
for idx, df in enumerate(data):
    data[idx] = df.sort_values('time')

Then refer to your dataframes via index, i.e. data[0], data[1].
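The reason for going through the list is that only its elements are replaced; the name train still points at the unsorted original. A short sketch of that behaviour, assuming the question's data (test is built here with copy() purely for brevity):

```python
import pandas as pd

train = pd.DataFrame({'id': [2, 3, 1],
                      'time': ['2017-04-17 22:23:22',
                               '2018-05-22 14:20:00',
                               '2017-01-09 08:02:14']})
test = train.copy()  # same contents as in the question

data = [train, test]
for idx, df in enumerate(data):
    # Assigning to data[idx] replaces the list element with the sorted copy
    data[idx] = df.sort_values('time')

print(data[0]['id'].tolist())  # [1, 2, 3], the list element is sorted
print(train['id'].tolist())    # [2, 3, 1], the original name is untouched
```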

Or, use a dictionary and iterate items:

data = {'train': train, 'test': test}

for key, df in data.items():
    data[key] = df.sort_values('time')

Then refer to your dataframes via key, i.e. data['train'], data['test'].
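As a further option that is not part of the approaches listed here: a loop over the DataFrames does take effect when its body mutates each object in place instead of reassigning the loop variable, which may be what the Kaggle examples rely on. A minimal sketch using the inplace parameter of sort_values, with the question's data (the helper name times is illustrative):

```python
import pandas as pd

times = ['2017-04-17 22:23:22', '2018-05-22 14:20:00', '2017-01-09 08:02:14']
train = pd.DataFrame({'id': [2, 3, 1], 'time': times})
test = pd.DataFrame({'id': [2, 3, 1], 'time': times})

for df in (train, test):
    # inplace=True reorders the existing object and returns None,
    # so there is nothing to assign back to df
    df.sort_values('time', ascending=True, inplace=True)

print(train['id'].tolist())  # [1, 2, 3]
print(test['id'].tolist())   # [1, 2, 3]
```

Here train and test themselves are sorted, because the loop never rebinds df to a different object.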
