mathee
mathee

Reputation: 163

Join Two Pandas Dataframes, (Merging) Values from Identical Column Names

I am using json_normalize to "expand-out" data from a single row, into multiple rows - populating columns using the data from the originating row and the expanded-out data.

I have a single situation where I have the same column name in the originating dataframe, and the data that is being exploded-out and joined back.

Understandably, the initial error I received was:

ValueError: columns overlap but no suffix specified: Index(['details'], dtype='object')

However, specifying a suffix ("_right") yields an output similar to:

      status        details         details_right
0       A           752016134   
1       A           752016134   
2       A           835847477   
3       A           405296228   
4       A           193753632   
5       B           667711915   
6       A                           305304644

I am looking for:

        status      details         
0       A           752016134   
1       A           752016134   
2       A           835847477   
3       A           405296228   
4       A           193753632   
5       B           667711915   
6       A           305304644

The code I am using:

newdataframe = (pd.concat({i: json_normalize(block) for i, block in dataframe1.pop('data').items()})
           .reset_index(level=1, drop=True)
           .join(dataframe1, how='right', rsuffix='_right')
           .reset_index(drop=True))

--

Checking the pandas API reference, it seems the only way to do this, is to rename/concatenate the columns after the resulting dataframe is created.

However, I would like to do this during the new dataframe creation step so that if there was another duplicate column, I would not need to "hardcode" any column/fix details.

Please let me know if there is a better way of doing what I am looking for.

Thank you! Please do let me know if I can clarify anything!

Upvotes: 0

Views: 142

Answers (1)

above_c_level
above_c_level

Reputation: 3929

combine_first should work:

newdataframe = (pd.concat({i: json_normalize(block) for i, block in dataframe1.pop('data').items()})
           .reset_index(level=1, drop=True)
           .combine_first(dataframe1)
           .reset_index(drop=True))

But maybe there is an error in my logic and you need combine instead.

Upvotes: 2

Related Questions