Reputation: 99
I need to turn the values of a categorical column into separate columns while keeping the primary keys in the data.
In the data, all the useful attributes live in two columns, attribute_name and attribute_value. I want to turn each distinct value in attribute_name into its own column and fill it with the corresponding value from the attribute_value column (as shown in the image below).
Note: not all part_ids have the same attribute names, and some attributes are not populated. After the transformation, some part_ids will therefore have missing values in the new columns.
I tried the pandas unstack() and pivot() functions, but they also turned the Platform_id and part_id values into columns.
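For reference, a minimal sketch of how pivot() and unstack() can keep the keys, assuming pandas 1.1 or later (where pivot() accepts a list for index): put every key column into the index so that only attribute_name is spread into columns, then call reset_index() to bring the keys back as ordinary columns. The names keys, wide_pivot and wide_unstack are illustrative, not from the original post.

```python
import pandas as pd

# Sample data from the question; Platform_id, part_id and class_id are the keys
df = pd.DataFrame({
    'Platform_id': [4356, 4356, 4356, 23675, 23675, 23675, 772, 772],
    'part_id': ['XCVT43', 'XCVT43', 'XCVT43', 'TT3344', 'TT3344', 'TT3344',
                'GHTYW2', 'GHTYW2'],
    'class_id': ['PROCESSOR'] * 8,
    'attribute_name': ['Vendor_name', 'Cache', 'Clock-speed', 'Vendor_name',
                       'Model_name', 'Clock-speed', 'Vendor_name', 'Cache'],
    'attribute_value': ['Intel', '4', '3.1', 'Intel', '4500U', '2.3', None, '4'],
})

keys = ['Platform_id', 'part_id', 'class_id']

# pivot(): every key column goes into the index, so only attribute_name
# is spread out into new columns
wide_pivot = (
    df.pivot(index=keys, columns='attribute_name', values='attribute_value')
      .rename_axis(columns=None)  # drop the leftover 'attribute_name' axis label
      .reset_index()              # turn the key index levels back into columns
)

# unstack(): the same reshape, starting from a MultiIndex that ends in attribute_name
wide_unstack = (
    df.set_index(keys + ['attribute_name'])['attribute_value']
      .unstack('attribute_name')
      .rename_axis(columns=None)
      .reset_index()
)

print(wide_pivot)
```

Attribute/part combinations that do not occur in the long table come out as NaN, which matches the note about unpopulated attributes.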
The code below came closest to what I need, but it created duplicate columns for each part_id, and I couldn't find a way to keep my primary keys (Platform_id and part_id) through the transformation:
df[['attribute_name', 'attribute_value']].set_index('attribute_name').T.rename_axis(None, axis=1).reset_index(drop=True)
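To see why that attempt duplicates columns, here is a small sketch using the sample data from the question (with the missing quote and comma fixed). Transposing the whole two-column frame produces a single row with one column per input row, so every attribute name appears once per part_id, and the key columns were already dropped before the transpose. The variable name attempt is illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Platform_id': [4356, 4356, 4356, 23675, 23675, 23675, 772, 772],
    'part_id': ['XCVT43', 'XCVT43', 'XCVT43', 'TT3344', 'TT3344', 'TT3344',
                'GHTYW2', 'GHTYW2'],
    'class_id': ['PROCESSOR'] * 8,
    'attribute_name': ['Vendor_name', 'Cache', 'Clock-speed', 'Vendor_name',
                       'Model_name', 'Clock-speed', 'Vendor_name', 'Cache'],
    'attribute_value': ['Intel', '4', '3.1', 'Intel', '4500U', '2.3', None, '4'],
})

# The attempt from the question, with the missing quote and comma fixed
attempt = (
    df[['attribute_name', 'attribute_value']]  # keys are discarded here
    .set_index('attribute_name')
    .T                                         # 8 rows x 1 column -> 1 row x 8 columns
    .rename_axis(None, axis=1)
    .reset_index(drop=True)
)

# One column per input row: attribute names repeat and no key column survives
print(attempt.shape)                       # (1, 8)
print(attempt.columns.duplicated().any())  # True
print(list(attempt.columns))
```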
Here is the code to recreate the DataFrame:
import pandas as pd

data = {'Platform_id': [4356, 4356, 4356, 23675, 23675, 23675, 772, 772],
        'part_id': ['XCVT43', 'XCVT43', 'XCVT43', 'TT3344', 'TT3344', 'TT3344', 'GHTYW2', 'GHTYW2'],
        'class_id': ['PROCESSOR'] * 8,
        'attribute_name': ['Vendor_name', 'Cache', 'Clock-speed', 'Vendor_name', 'Model_name', 'Clock-speed', 'Vendor_name', 'Cache'],
        'attribute_value': ['Intel', '4', '3.1', 'Intel', '4500U', '2.3', None, '4']}
df = pd.DataFrame(data)
Upvotes: 2
Views: 2082
Reputation: 30940
You can use:
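keys = ['Platform_id', 'part_id', 'class_id']
# Each distinct attribute name becomes one output column
attr_names = df['attribute_name'].unique()
grouped = df.set_index(keys).groupby('attribute_name')['attribute_value']
# One Series per attribute, indexed by the keys; concat aligns them on those keys (missing -> NaN)
df2 = pd.concat([grouped.get_group(name) for name in attr_names], axis=1)
df2.columns = attr_names
df2.reset_index(inplace=True)
print(df2)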
   Platform_id  part_id   class_id  Vendor_name  Cache  Clock-speed  Model_name
0          772   GHTYW2  PROCESSOR         None      4          NaN         NaN
1         4356   XCVT43  PROCESSOR        Intel      4          3.1         NaN
2        23675   TT3344  PROCESSOR        Intel    NaN          2.3       4500U
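One caveat the sample data does not exercise: if a part ever lists the same attribute twice, pivot() and unstack() refuse to reshape and raise ValueError. Below is a sketch, using a hypothetical duplicate Cache row for XCVT43, of pivot_table() with aggfunc='first' as one way to collapse such duplicates by keeping the first non-null value per cell. Whether "first" is the right rule is an assumption that depends on your data; the names extra, df_dup and wide are illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Platform_id': [4356, 4356, 4356, 23675, 23675, 23675, 772, 772],
    'part_id': ['XCVT43', 'XCVT43', 'XCVT43', 'TT3344', 'TT3344', 'TT3344',
                'GHTYW2', 'GHTYW2'],
    'class_id': ['PROCESSOR'] * 8,
    'attribute_name': ['Vendor_name', 'Cache', 'Clock-speed', 'Vendor_name',
                       'Model_name', 'Clock-speed', 'Vendor_name', 'Cache'],
    'attribute_value': ['Intel', '4', '3.1', 'Intel', '4500U', '2.3', None, '4'],
})
keys = ['Platform_id', 'part_id', 'class_id']

# Hypothetical extra row: XCVT43 reports Cache a second time with another value
extra = pd.DataFrame({'Platform_id': [4356], 'part_id': ['XCVT43'],
                      'class_id': ['PROCESSOR'], 'attribute_name': ['Cache'],
                      'attribute_value': ['8']})
df_dup = pd.concat([df, extra], ignore_index=True)

try:
    df_dup.pivot(index=keys, columns='attribute_name', values='attribute_value')
    pivot_failed = False
except ValueError as err:  # (4356, 'XCVT43', 'PROCESSOR', 'Cache') now appears twice
    pivot_failed = True
    print('pivot failed:', err)

# pivot_table() aggregates repeated cells; 'first' keeps the first non-null value
wide = (
    df_dup.pivot_table(index=keys, columns='attribute_name',
                       values='attribute_value', aggfunc='first')
          .rename_axis(columns=None)
          .reset_index()
)
print(wide)
```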
Upvotes: 2