Erin

Reputation: 495

joining three dataframes horizontally and merging like columns

I have three dataframes with 556, 555, and ~1600 columns, respectively. I want to horizontally stack them while merging the like columns. How would I do this with so many columns? I tried re-indexing so that the indices ran from 0-252 on the first df, 232-2518 on the second df, and 2519 to ~4000 on the final one, but I'm still getting the following error:

InvalidIndexError: Reindexing only valid with uniquely valued Index objects

Is it better to use merge or join over concat in this case?

The data can be found here: https://github.com/eoefelein/sample_data

Thank you so much!

Upvotes: 1

Views: 90

Answers (2)

SeaBean

Reputation: 23227

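Use pd.concat() with axis=1 and ignore_index=True (with axis=1, ignore_index=True relabels the columns 0, 1, ..., n-1, so the original column names are lost):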

Assuming you have already read the CSV files into dataframes df1, df2, df3:

df_out = pd.concat([df1, df2, df3], axis=1, ignore_index=True)

Edit

I might have overlooked that you also want the like columns merged. In that case, just use the default axis=0, which stacks the rows and aligns columns that share a name:

df_out = pd.concat([df1, df2, df3], ignore_index=True)

Keep ignore_index=True so that the row index is renumbered from 0 to n-1.
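To make the difference between the two axis settings concrete, here is a minimal sketch on toy data. The frames df_a and df_b and their values are made up for illustration and are not taken from the linked CSV files.

```python
import pandas as pd

# Hypothetical toy frames: "title" is shared, the other columns differ.
df_a = pd.DataFrame({"title": ["mobile", "fullstack"], "zapier": [0, 1]})
df_b = pd.DataFrame({"title": ["data scientist"], "zimbra": [1]})

# axis=0 (default): rows are stacked and columns are aligned by name.
stacked = pd.concat([df_a, df_b], ignore_index=True)

# axis=1: frames are placed side by side and rows are aligned by index label.
side_by_side = pd.concat([df_a, df_b], axis=1)

# axis=1 with ignore_index=True: column names are replaced by 0..n-1.
relabeled = pd.concat([df_a, df_b], axis=1, ignore_index=True)

print(stacked.shape)            # (3, 3)
print(side_by_side.shape)       # (2, 4)
print(list(relabeled.columns))  # [0, 1, 2, 3]
```

With axis=1 the shared "title" column appears twice, which is usually not what "merging like columns" means.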

Upvotes: 1

Marshall K

Reputation: 333

Do you have a unique identifier across each dataframe to join them on?
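If such an identifier exists, a key-based merge is one option. The sketch below assumes a hypothetical key column named job_id; the sample data may not contain anything like it, and the frames are invented for illustration.

```python
import pandas as pd

# "job_id" is a hypothetical identifier, not a column known to be in the data.
left = pd.DataFrame({"job_id": [1, 2], "python": [1, 0]})
right = pd.DataFrame({"job_id": [2, 3], "sql": [1, 1]})

# An outer merge keeps every job_id from both frames;
# values missing on one side become NaN.
merged = left.merge(right, on="job_id", how="outer")

print(merged.shape)  # (3, 3)
```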

If not, I think you just want a plain pd.concat, which will union your dataframes. The total number of columns will be the distinct count of columns across all three dataframes.

import pandas as pd

df1 = pd.read_csv('sample_data/final_pre_rfe_fiverr.csv')
df2 = pd.read_csv('sample_data/final_pre_rfe_freelancer.csv')
df3 = pd.read_csv('sample_data/final_pre_rfe_pph.csv')
pd.concat((df1,df2,df3))

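Notice in the output below that columns appearing in only some of the dataframes are added side by side (filled with NaN where a dataframe lacks them), while columns shared across dataframes are merged into one.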

Output:

     Unnamed: 0              title  .net  360 photography  2d animation  \
0           253             mobile   0.0              0.0             0   
1           254  quality assurance   0.0              0.0             0   
2           255     data scientist   0.0              0.0             0   
3           256     data scientist   0.0              0.0             0   
4           257  quality assurance   0.0              0.0             0   
..          ...                ...   ...              ...           ...   
248         248     data scientist   NaN              NaN             0   
249         249          fullstack   NaN              NaN             0   
250         250          fullstack   NaN              NaN             0   
251         251          fullstack   NaN              NaN             0   
252         252          fullstack   NaN              NaN             0   

     3d modelling  3d rendering  3d texturing  3ddesign  3dmodeling  ...  \
0             0.0             0           0.0       0.0         0.0  ...   
1             0.0             0           0.0       0.0         0.0  ...   
2             0.0             0           0.0       0.0         0.0  ...   
3             0.0             0           0.0       0.0         0.0  ...   
4             0.0             0           0.0       0.0         0.0  ...   
..            ...           ...           ...       ...         ...  ...   
248           NaN             0           NaN       NaN         NaN  ...   
249           NaN             0           NaN       NaN         NaN  ...   
250           NaN             0           NaN       NaN         NaN  ...   
251           NaN             0           NaN       NaN         NaN  ...   
252           NaN             0           NaN       NaN         NaN  ...   

     webui studio 2013 for asp.net  windows administration  \
0                              NaN                     NaN   
1                              NaN                     NaN   
2                              NaN                     NaN   
3                              NaN                     NaN   
4                              NaN                     NaN   
..                             ...                     ...   
248                            0.0                     0.0   
249                            0.0                     0.0   
250                            0.0                     0.0   
251                            0.0                     0.0   
252                            0.0                     0.0   

     windows powershell programming language.1  wordpress e-commerce  \
0                                          NaN                   NaN   
1                                          NaN                   NaN   
2                                          NaN                   NaN   
3                                          NaN                   NaN   
4                                          NaN                   NaN   
..                                         ...                   ...   
248                                        0.0                   0.0   
249                                        0.0                   0.0   
250                                        0.0                   0.0   
251                                        0.0                   0.0   
252                                        0.0                   0.0   

     wordpress plugin.1  wordpress template  worpress migration  zapier  \
0                   NaN                 NaN                 NaN     NaN   
1                   NaN                 NaN                 NaN     NaN   
2                   NaN                 NaN                 NaN     NaN   
3                   NaN                 NaN                 NaN     NaN   
4                   NaN                 NaN                 NaN     NaN   
..                  ...                 ...                 ...     ...   
248                 0.0                 0.0                 0.0     0.0   
249                 0.0                 0.0                 0.0     0.0   
250                 0.0                 0.0                 0.0     0.0   
251                 0.0                 0.0                 0.0     0.0   
252                 0.0                 0.0                 0.0     0.0   

     zend framework  zimbra  
0               NaN     NaN  
1               NaN     NaN  
2               NaN     NaN  
3               NaN     NaN  
4               NaN     NaN  
..              ...     ...  
248             0.0     0.0  
249             0.0     0.0  
250             0.0     0.0  
251             0.0     0.0  
252             0.0     0.0  

[4194 rows x 2194 columns]
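The same behaviour is easier to see on a small scale. This is a minimal sketch with made-up stand-ins for the three files; the fillna(0) step at the end is an assumption that a missing skill column means "skill not listed", which may or may not suit your data.

```python
import pandas as pd

# Hypothetical stand-ins for the three CSV files.
fiverr = pd.DataFrame({"title": ["mobile"], ".net": [0.0], "2d animation": [0]})
freelancer = pd.DataFrame({"title": ["data scientist"], "2d animation": [1]})
pph = pd.DataFrame({"title": ["fullstack"], "2d animation": [0], "zapier": [0.0]})

# Rows are stacked; the result has one column per distinct column name.
combined = pd.concat([fiverr, freelancer, pph], ignore_index=True)

all_columns = set(fiverr.columns) | set(freelancer.columns) | set(pph.columns)
print(combined.shape)                    # (3, 4)
print(combined[".net"].isna().tolist())  # [False, True, True]

# Assumption: a skill column absent from a source file means 0 for its rows.
filled = combined.fillna(0)
```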

Hope it helps! If not, would you mind clarifying a little more what you're looking for?

Upvotes: 1
