Charles Wagner

Reputation: 123

Pandas dataframe only keeps last value in a for loop

I have the code below. Inside the loop, a, b and c each print the values I expect, but after the loop all three show only the last value. Does anybody know why a, b and c are not keeping their values?

import pandas as pd

df = pd.DataFrame(columns=['A', 'B', 'C'])

for i in range(3):
    df.loc[0] = [i, i, i]
    if i == 0:
        a = df
        print("Printing a inside of the loop:")
        print(a)
    elif i == 1:
        b = df
        print("Printing b inside of the loop:")
        print(b)
    elif i == 2:
        c = df
        print("Printing c inside of the loop:")
        print(c)

print("Printing a outside of the loop:")
print(a)
print("Printing b outside of the loop:")
print(b)
print("Printing c outside of the loop:")
print(c)


Upvotes: 0

Views: 1507

Answers (1)

n00dle

Reputation: 6053

Your problem here is that a, b and c aren't actually separate dataframes: they are all just names for the same object as df.

The way Python works under the hood means that when you say a = df, nothing is copied. Python binds the name a to the same object that df refers to, so a points at the same underlying data as df. It's basically just another name for the same dataframe.

That means each iteration of the loop overwrites row 0 of that one dataframe, so when you read back a, b and c after the loop, you're just reading whatever is currently sitting in df, i.e. the values from the last iteration.
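To see the aliasing directly, here is a minimal sketch (written with Python 3 print calls, and assuming a reasonably recent pandas). It only uses the names df and a from the question; the specific row values are arbitrary.

```python
import pandas as pd

df = pd.DataFrame(columns=['A', 'B', 'C'])
df.loc[0] = [0, 0, 0]

a = df                      # binds a second name to the same DataFrame; no data is copied
print(a is df)              # True: both names refer to one object

df.loc[0] = [2, 2, 2]       # overwrite row 0 through the name df
print(a.loc[0].tolist())    # [2, 2, 2]: the change is visible through a as well
```

The `is` check compares object identity, so it confirms that a and df are one object rather than two objects that merely hold equal data.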

What you actually need is an independent copy of the dataframe at each step, using (e.g.) a = df.copy(), and likewise for b and c.
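As a rough sketch of how the loop could look with copies (Python 3 syntax; the list name snapshots is my own placeholder, not something from the question), you could store a copy on each iteration and unpack afterwards:

```python
import pandas as pd

df = pd.DataFrame(columns=['A', 'B', 'C'])
snapshots = []  # hypothetical helper list, one independent copy per iteration

for i in range(3):
    df.loc[0] = [i, i, i]
    snapshots.append(df.copy())   # copy() returns a new DataFrame with its own data

a, b, c = snapshots

print(a.loc[0].tolist())   # [0, 0, 0]
print(b.loc[0].tolist())   # [1, 1, 1]
print(c.loc[0].tolist())   # [2, 2, 2]
```

Keeping the if/elif chain from the question and writing a = df.copy(), b = df.copy(), c = df.copy() in the branches should behave the same way; the list is just shorter.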

Upvotes: 2
