Reputation: 163
I am using json_normalize to "expand-out" data from a single row, into multiple rows - populating columns using the data from the originating row and the expanded-out data.
I have a single situation where I have the same column name in the originating dataframe, and the data that is being exploded-out and joined back.
Understandably, the initial error I received was:
ValueError: columns overlap but no suffix specified: Index(['details'], dtype='object')
However, specifying a suffix ("_right") yields an output similar to:
status details details_right
0 A 752016134
1 A 752016134
2 A 835847477
3 A 405296228
4 A 193753632
5 B 667711915
6 A 305304644
I am looking for:
status details
0 A 752016134
1 A 752016134
2 A 835847477
3 A 405296228
4 A 193753632
5 B 667711915
6 A 305304644
The code I am using:
newdataframe = (pd.concat({i: json_normalize(block) for i, block in dataframe1.pop('data').items()})
.reset_index(level=1, drop=True)
.join(dataframe1, how='right', rsuffix='_right')
.reset_index(drop=True))
--
Checking the pandas API reference, it seems the only way to do this, is to rename/concatenate the columns after the resulting dataframe is created.
However, I would like to do this during the new dataframe creation step so that if there was another duplicate column, I would not need to "hardcode" any column/fix details.
Please let me know if there is a better way of doing what I am looking for.
Thank you! Please do let me know if I can clarify anything!
Upvotes: 0
Views: 142
Reputation: 3929
combine_first should work:
newdataframe = (pd.concat({i: json_normalize(block) for i, block in dataframe1.pop('data').items()})
.reset_index(level=1, drop=True)
.combine_first(dataframe1)
.reset_index(drop=True))
But maybe there is an error in my logic and you need combine
instead.
Upvotes: 2