Reputation: 5789

Efficiently Concatenate Pandas DataFrames in series

I have 10 DataFrames with an equal number of rows, each with its own set of unique columns (no column is shared between any two DataFrames). I simply want to join the DataFrames in series (side by side), so that the final DataFrame contains all the columns from all of them. The first row of the final DataFrame would contain the first row of the first DataFrame, followed by the first row of the second, and so on until the tenth. I have tried pandas.concat(dataframes, axis=1), but it ended up creating NaN values in my numerical data somehow. I worked around it by writing an extremely slow and ugly method that steps through the rows by index and builds the final DataFrame row by row. What is the correct pandas way to do this?

Upvotes: 1

Answers (3)

cs95

Reputation: 402922

Assuming all your dataframes are in a list df_list:

import pandas as pd

df0_index = df_list[0].index  # get the first data frame's index

for i in range(1, len(df_list)):
    # overwrite each remaining frame's index with the first frame's index
    df_list[i] = df_list[i].set_index(df0_index)

df_out = pd.concat(df_list, axis=1)  # concatenate column-wise
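
As a rough illustration of why this helps, here is a minimal sketch with two made-up frames (df_a and df_b are hypothetical, not the asker's data). It assumes the NaN values came from the frames having different row labels, since pd.concat with axis=1 aligns rows by index label rather than by position:

```python
import pandas as pd

# Hypothetical sample frames: same number of rows, different row labels.
df_a = pd.DataFrame({"a": [1, 2, 3]}, index=[0, 1, 2])
df_b = pd.DataFrame({"b": [10, 20, 30]}, index=[5, 6, 7])

# axis=1 aligns rows by index label, so disjoint labels yield the union
# of both indexes, with NaN wherever a frame has no row for a label.
misaligned = pd.concat([df_a, df_b], axis=1)

# Giving df_b the first frame's index makes the labels match row for row.
aligned = pd.concat([df_a, df_b.set_index(df_a.index)], axis=1)
```

In this sketch misaligned has six rows padded with NaN, while aligned keeps three rows with every value in place.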

Upvotes: 1

ICW

Reputation: 5789

Got it working. I simply had to set ignore_index=True when calling pandas.concat().

pd.concat(df_list, axis=1, ignore_index=True) # returns dataframes correctly.

Note that reindexing wouldn't work for some reason.
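
One thing worth checking before relying on this: a small sketch of what ignore_index=True does to the column labels when axis=1, assuming a recent pandas release. The frames left and right are made up for illustration. Because the option relabels the concatenation axis, which here is the columns, the original column names are not kept:

```python
import pandas as pd

# Hypothetical frames that already share the same default row index.
left = pd.DataFrame({"a": [1, 2]})
right = pd.DataFrame({"b": [3, 4]})

# With axis=1 the concatenation axis is the columns, so ignore_index=True
# discards the column names and labels the columns 0, 1, ...
relabeled = pd.concat([left, right], axis=1, ignore_index=True)

# Without ignore_index the original column names are kept.
named = pd.concat([left, right], axis=1)
```

If the column names matter, they would have to be reassigned afterwards.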

Upvotes: 1

Scott Boston

Reputation: 153510

You could do this with a list comprehension:

pd.concat([df.reset_index(drop=True) for df in df_list], axis=1)
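
A self-contained sketch of the same idea, wrapped in a helper for reuse. The function name concat_columns, the row-count check, and the sample frames are assumptions added for illustration, not part of the original answer:

```python
import pandas as pd


def concat_columns(frames):
    """Place frames side by side by row position, ignoring row labels."""
    # Guard (an added assumption): positional joining only makes sense
    # when every frame has the same number of rows.
    if len({len(df) for df in frames}) > 1:
        raise ValueError("all frames must have the same number of rows")
    # reset_index(drop=True) gives every frame the same 0..n-1 index,
    # so concat has nothing to misalign.
    return pd.concat([df.reset_index(drop=True) for df in frames], axis=1)


# Hypothetical frames with unrelated row labels and unique columns.
df_list = [
    pd.DataFrame({"a": [1, 2]}, index=[10, 20]),
    pd.DataFrame({"b": [3.5, 4.5]}, index=["x", "y"]),
    pd.DataFrame({"c": ["p", "q"]}, index=[7, 8]),
]
df_out = concat_columns(df_list)
```

The result keeps the column names a, b, c and gets a fresh 0..n-1 row index, so the original row labels are dropped.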

Upvotes: 2
