ICW

Reputation: 5789

Efficiently Concatenate Pandas DataFrames in series

I have 10 DataFrames with an equal number of rows, each having its own set of unique columns (no column is shared between any of the DataFrames). I want to simply combine the DataFrames side by side, such that the final DataFrame contains all the columns from all the DataFrames. The first row of the final DataFrame would contain the first row of the first DataFrame, followed by the first row of the second, and so on until the tenth DataFrame. I have tried pandas.concat(dataframes, axis=1), but it somehow ended up creating NaN values in my numerical data. I worked around it by writing an extremely slow and ugly method that iterates through the rows by index and builds the final DataFrame row by row. What is the correct pandas way to do this?
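For context, NaN values in this situation usually come from index alignment: pd.concat(..., axis=1) matches rows by index label, not by position. A minimal reproduction, assuming two small hypothetical frames `left` and `right` whose index labels don't overlap:

```python
import pandas as pd

# Two hypothetical frames with the same number of rows but different index labels
# (e.g. one was filtered or sliced earlier, so its index no longer starts at 0).
left = pd.DataFrame({"a": [1, 2, 3]}, index=[0, 1, 2])
right = pd.DataFrame({"b": [4, 5, 6]}, index=[3, 4, 5])

# concat along axis=1 aligns rows on index labels rather than on position,
# so the result gets the union of both indexes and NaN wherever a label is missing.
combined = pd.concat([left, right], axis=1)
print(combined)
```

Here the result has 6 rows instead of 3, and each column is half NaN.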

Upvotes: 1

Views: 3742

Answers (3)

cs95

Reputation: 402922

Assuming all your DataFrames are in a list df_list:

import pandas as pd

df0_index = df_list[0].index  # get the first DataFrame's index

for i in range(1, len(df_list)):
    df_list[i] = df_list[i].set_index(df0_index)  # give every other DataFrame the same index labels

df_out = pd.concat(df_list, axis=1)  # concatenate column-wise; rows now align by position
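A self-contained sketch of the same idea, assuming a small hypothetical df_list with mismatched indexes. It builds a new list instead of modifying df_list in place; set_index accepts an array-like such as an Index of the same length and uses it as the new row labels.

```python
import pandas as pd

# Hypothetical frames standing in for df_list: equal row counts, unique columns,
# but mismatched index labels.
df_list = [
    pd.DataFrame({"a": [1, 2]}, index=[10, 11]),
    pd.DataFrame({"b": [3, 4]}, index=[0, 1]),
    pd.DataFrame({"c": [5, 6]}, index=["x", "y"]),
]

df0_index = df_list[0].index

# Relabel every other frame with the first frame's index, position by position.
aligned = [df_list[0]] + [df.set_index(df0_index) for df in df_list[1:]]

df_out = pd.concat(aligned, axis=1)
print(df_out)
```

The output keeps the first frame's labels (10 and 11) and contains no NaN.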

Upvotes: 1

ICW

Reputation: 5789

Got it working. I simply had to set ignore_index=True when calling pandas.concat():

pd.concat(df_list, axis=1, ignore_index=True)  # returns the DataFrames correctly

Note that reindexing wouldn't work for some reason.
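Two pandas details that may explain what was seen here, sketched with hypothetical frames `first` and `second`. First, reindex looks up the requested labels in the existing index instead of relabelling rows, so labels that don't exist produce NaN; set_axis (or assigning to .index) relabels by position. Second, with axis=1, ignore_index=True renumbers the concatenation axis, which here is the columns, while rows are still aligned on the index.

```python
import pandas as pd

# Hypothetical frames with mismatched index labels.
first = pd.DataFrame({"a": [1, 2]}, index=[10, 11])
second = pd.DataFrame({"b": [3, 4]}, index=[0, 1])

# reindex looks up labels 10 and 11 in `second`; they don't exist, so values become NaN.
reindexed = second.reindex(first.index)

# set_axis relabels the rows by position, keeping the values.
relabelled = second.set_axis(first.index, axis=0)

# ignore_index=True with axis=1 replaces the column labels with 0..n-1 ...
numbered = pd.concat([first, relabelled], axis=1, ignore_index=True)

# ... but it does not change row alignment, so mismatched indexes still give NaN.
still_misaligned = pd.concat([first, second], axis=1, ignore_index=True)

print(reindexed, relabelled, numbered, still_misaligned, sep="\n\n")
```

If rows still come out misaligned with ignore_index=True, the index labels of the input frames are the thing to check.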

Upvotes: 1

Scott Boston

Reputation: 153510

You could do this with a list comprehension:

pd.concat([df.reset_index(drop=True) for df in df_list], axis=1)
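A runnable sketch of this approach, assuming a hypothetical df_list whose frames have unrelated indexes. reset_index(drop=True) gives each frame a fresh 0..n-1 index and discards the old labels, so concat pairs rows purely by position.

```python
import pandas as pd

# Hypothetical df_list: same number of rows, unique columns, unrelated indexes.
df_list = [
    pd.DataFrame({"a": [1, 2, 3]}, index=[5, 6, 7]),
    pd.DataFrame({"b": [4.0, 5.0, 6.0]}, index=["p", "q", "r"]),
    pd.DataFrame({"c": ["x", "y", "z"]}),
]

# Drop the old labels so every frame shares the index 0..n-1, then join column-wise.
result = pd.concat([df.reset_index(drop=True) for df in df_list], axis=1)
print(result)
```

The trade-off compared with cs95's set_index approach is that the original row labels are not kept; the result is indexed 0..n-1. Because no NaN is introduced, integer columns also keep an integer dtype.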

Upvotes: 2
