Kristian Nielsen

Reputation: 149

Pandas concat doesn't work as expected when one DF is created using a for loop

I'm trying to concatenate two Pandas DataFrames, one of which is created in a for loop. For some reason, pd.concat doesn't stack them by rows as I expect.

The code below illustrates the problem:

import numpy as np
import pandas as pd

datasort = [143.477514, 112.951071, 869.627662, 193.471612, 140.428981, 301.053040, 190.684404, 180.142223, 127.569191, 404.871493]

sample_1 = pd.DataFrame(np.random.choice(datasort, (8, 10)))
samples_2 = pd.DataFrame()

for t in np.arange(10):
    samples_2[str(t)] = np.random.choice(datasort, 2)

samples_3 = pd.concat([samples_2, sample_1], ignore_index=True)

The code produces a 10x20 matrix with a lot of NaNs, not the 10x10 one I would expect.

Can someone please point out what I'm obviously missing?

Upvotes: 1

Views: 487

Answers (1)

jezrael

Reputation: 862731

The problem is that you cast the column labels to strings, so the DataFrames cannot be aligned: sample_1 has integer column labels 0 to 9, while samples_2 has string labels '0' to '9':

for t in np.arange(10):
    # str(t) turns each column label into a string
    samples_2[str(t)] = np.random.choice(datasort, 2)

print(sample_1.columns)
RangeIndex(start=0, stop=10, step=1)
print(samples_2.columns)
Index(['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'], dtype='object')
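To see the effect in isolation, here is a small self-contained sketch that rebuilds the question's data. Building samples_2 with a dict comprehension instead of the loop is a substitution made here for brevity; it gives the same string labels. Because pd.concat aligns on column labels and the integer 0 is a different label from the string '0', the result gets the union of both label sets (20 columns), and every row is NaN in the 10 columns that belong to the other frame.

```python
import numpy as np
import pandas as pd

datasort = [143.477514, 112.951071, 869.627662, 193.471612, 140.428981,
            301.053040, 190.684404, 180.142223, 127.569191, 404.871493]

# 8 rows, integer column labels 0..9 (a RangeIndex)
sample_1 = pd.DataFrame(np.random.choice(datasort, (8, 10)))

# 2 rows, string column labels '0'..'9' (same labels the str(t) loop produces)
samples_2 = pd.DataFrame({str(t): np.random.choice(datasort, 2) for t in range(10)})

# concat aligns on column labels; 0 and '0' differ, so the label sets are unioned
samples_3 = pd.concat([samples_2, sample_1], ignore_index=True)
print(samples_3.shape)  # (10, 20)

# 2 rows x 10 missing int columns + 8 rows x 10 missing str columns = 100 NaNs
print(int(samples_3.isna().sum().sum()))  # 100
```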

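Solution: use the integer t itself as the column label, so samples_2 ends up with the same labels as sample_1 and concat can align the columns:

for t in np.arange(10):
    samples_2[t] = np.random.choice(datasort, 2)

samples_3 = pd.concat([samples_2, sample_1], ignore_index=True)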
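A sketch of the fix end to end, under the same assumptions as the question's code, checking that the result is 10x10 with no NaNs. Two alternatives are also shown; they are additions, not part of the original answer: converting existing string labels with Index.astype(int), and building samples_2 directly from a (2, 10) array, which gives it a RangeIndex like sample_1 and avoids the loop entirely.

```python
import numpy as np
import pandas as pd

datasort = [143.477514, 112.951071, 869.627662, 193.471612, 140.428981,
            301.053040, 190.684404, 180.142223, 127.569191, 404.871493]

# 8 rows with integer column labels 0..9 (RangeIndex)
sample_1 = pd.DataFrame(np.random.choice(datasort, (8, 10)))

# Fix from the answer: use the integer t as the column label
samples_2 = pd.DataFrame()
for t in np.arange(10):
    samples_2[t] = np.random.choice(datasort, 2)
samples_3 = pd.concat([samples_2, sample_1], ignore_index=True)

# Alternative A: the frame already has string labels, so convert them to int
samples_2_str = pd.DataFrame({str(t): np.random.choice(datasort, 2) for t in range(10)})
samples_2_str.columns = samples_2_str.columns.astype(int)
samples_4 = pd.concat([samples_2_str, sample_1], ignore_index=True)

# Alternative B: skip the loop; a (2, 10) array gets a RangeIndex like sample_1
samples_2_direct = pd.DataFrame(np.random.choice(datasort, (2, 10)))
samples_5 = pd.concat([samples_2_direct, sample_1], ignore_index=True)

# All three results stack the rows: 10 rows x 10 columns, no NaNs
for result in (samples_3, samples_4, samples_5):
    print(result.shape, result.isna().any().any())  # (10, 10) False
```

Alternative B is the simplest when the loop is only filling columns with random draws; the loop-based fix is still useful when each column is computed separately.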

Upvotes: 3
