Reputation: 495
I have three dataframes with 556, 555, and ~1600 columns respectively. I want to horizontally stack them while merging the like columns. How would I do this with so many columns? I tried re-indexing so that the indices run 0-252 on the first df, 232-2518 on the second df, and 2519 to ~4000 on the final one, but I'm still getting the following error:
InvalidIndexError: Reindexing only valid with uniquely valued Index objects
Is it better to use merge or join over concat in this case?
The data can be found here: https://github.com/eoefelein/sample_data
Thank you so much!
Upvotes: 1
Views: 90
Reputation: 23227
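Use pd.concat() with axis=1 and ignore_index=True (note that with axis=1, ignore_index=True replaces the column names with integer labels 0, 1, 2, ...):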
Assuming you have already read the CSV files into dataframes df1, df2, df3:
df_out = pd.concat([df1, df2, df3], axis=1, ignore_index=True)
I might have overlooked that you want the like columns merged rather than simply placed side by side. In that case, just use the default axis=0, which stacks the rows and aligns columns by name:
df_out = pd.concat([df1, df2, df3], ignore_index=True)
Keep ignore_index=True so that the row index is renumbered from 0.
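To make the difference between the two calls concrete, here is a minimal sketch using two small made-up frames. The names a and b and their columns are hypothetical and are not taken from the linked data.

```python
import pandas as pd

# Hypothetical toy frames that share only the "title" column.
a = pd.DataFrame({"title": ["mobile", "fullstack"], "python": [1, 0]})
b = pd.DataFrame({"title": ["data scientist"], "sql": [1]})

# Default axis=0: rows are stacked, like-named columns are merged,
# and ignore_index=True renumbers the rows from 0.
rows = pd.concat([a, b], ignore_index=True)
print(rows.columns.tolist())
print(rows.index.tolist())

# axis=1: frames are placed side by side, aligned on the row index;
# here ignore_index=True discards the column names instead.
cols = pd.concat([a, b], axis=1, ignore_index=True)
print(cols.columns.tolist())
```

In the axis=0 result, cells for columns a frame did not have are filled with NaN, which is the pattern you should expect to see on the real data as well.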
Upvotes: 1
Reputation: 333
Do you have a unique identifier across each dataframe to join them on?
If not, I think you just want a plain pd.concat, which will union your dataframes; the total number of columns will be the distinct count of columns across all three dataframes.
import pandas as pd
df1 = pd.read_csv('sample_data/final_pre_rfe_fiverr.csv')
df2 = pd.read_csv('sample_data/final_pre_rfe_freelancer.csv')
df3 = pd.read_csv('sample_data/final_pre_rfe_pph.csv')
pd.concat((df1,df2,df3))
Notice in the output below that new columns are horizontally stacked while old ones are merged.
Output:
Unnamed: 0 title .net 360 photography 2d animation \
0 253 mobile 0.0 0.0 0
1 254 quality assurance 0.0 0.0 0
2 255 data scientist 0.0 0.0 0
3 256 data scientist 0.0 0.0 0
4 257 quality assurance 0.0 0.0 0
.. ... ... ... ... ...
248 248 data scientist NaN NaN 0
249 249 fullstack NaN NaN 0
250 250 fullstack NaN NaN 0
251 251 fullstack NaN NaN 0
252 252 fullstack NaN NaN 0
3d modelling 3d rendering 3d texturing 3ddesign 3dmodeling ... \
0 0.0 0 0.0 0.0 0.0 ...
1 0.0 0 0.0 0.0 0.0 ...
2 0.0 0 0.0 0.0 0.0 ...
3 0.0 0 0.0 0.0 0.0 ...
4 0.0 0 0.0 0.0 0.0 ...
.. ... ... ... ... ... ...
248 NaN 0 NaN NaN NaN ...
249 NaN 0 NaN NaN NaN ...
250 NaN 0 NaN NaN NaN ...
251 NaN 0 NaN NaN NaN ...
252 NaN 0 NaN NaN NaN ...
webui studio 2013 for asp.net windows administration \
0 NaN NaN
1 NaN NaN
2 NaN NaN
3 NaN NaN
4 NaN NaN
.. ... ...
248 0.0 0.0
249 0.0 0.0
250 0.0 0.0
251 0.0 0.0
252 0.0 0.0
windows powershell programming language.1 wordpress e-commerce \
0 NaN NaN
1 NaN NaN
2 NaN NaN
3 NaN NaN
4 NaN NaN
.. ... ...
248 0.0 0.0
249 0.0 0.0
250 0.0 0.0
251 0.0 0.0
252 0.0 0.0
wordpress plugin.1 wordpress template worpress migration zapier \
0 NaN NaN NaN NaN
1 NaN NaN NaN NaN
2 NaN NaN NaN NaN
3 NaN NaN NaN NaN
4 NaN NaN NaN NaN
.. ... ... ... ...
248 0.0 0.0 0.0 0.0
249 0.0 0.0 0.0 0.0
250 0.0 0.0 0.0 0.0
251 0.0 0.0 0.0 0.0
252 0.0 0.0 0.0 0.0
zend framework zimbra
0 NaN NaN
1 NaN NaN
2 NaN NaN
3 NaN NaN
4 NaN NaN
.. ... ...
248 0.0 0.0
249 0.0 0.0
250 0.0 0.0
251 0.0 0.0
252 0.0 0.0
[4194 rows x 2194 columns]
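If pd.concat still raises the InvalidIndexError from the question on your own frames, one possible cause (an assumption, since I can't see how your frames were built) is a column label that appears more than once within a single dataframe. A rough sketch for checking and working around that, using hypothetical toy frames and a hypothetical helper duplicated_labels:

```python
import pandas as pd

def duplicated_labels(df):
    """Return the column labels that occur more than once in df."""
    return df.columns[df.columns.duplicated()].unique().tolist()

# Hypothetical frame with a repeated "python" column.
left = pd.DataFrame([[1, 2, 3]], columns=["title", "python", "python"])
right = pd.DataFrame([[4, 5]], columns=["title", "sql"])

print(duplicated_labels(left))

# Keep only the first occurrence of each label before concatenating.
left_clean = left.loc[:, ~left.columns.duplicated()]
out = pd.concat([left_clean, right], ignore_index=True)
print(out.columns.tolist())
```

Dropping the later duplicates is only one option; whether to drop, rename, or combine them depends on what the repeated columns actually mean in your data.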
Hope it helps! If not, would you mind clarifying a little more what you're looking for?
Upvotes: 1