Reputation: 15
I have a dataframe that looks something like the following. Each of the columns el0 to el3 holds the name of an element, and each of the columns x0 to x3 holds the amount of the corresponding element (el0 and x0, el1 and x1, etc.) that is present in an alloy.
el0 | el1 | el2 | el3 | x0 | x1 | x2 | x3 |
---|---|---|---|---|---|---|---|
AL | MO | CR | TI | 0.9 | 0.0 | 0.0 | 0.1 |
AL | MO | CR | TI | 0.8 | 0.0 | 0.1 | 0.1 |
I want to turn the dataset into something that looks like the below, where each of the columns is an element, and the rows tell me how much of that element is present. In essence, taking the unique values from el0 to el3 in the original dataframe above, making them the column headers of the new dataframe, and plugging in x0 through x3 from the original dataframe into the correct slot.
AL | MO | CR | TI |
---|---|---|---|
0.9 | 0.0 | 0.0 | 0.1 |
0.8 | 0.0 | 0.1 | 0.1 |
Right now, I just loop through each of the rows, access the element held in each of el0 through el3, use that element as a key in a dictionary, and then go pull the corresponding amount from x0, x1, x2, or x3 for the value, and plug that dictionary into the end of the new dataframe (code below). My dataset isn't particularly large, so I'm perfectly content to just let my computer sit for the minute it takes to do this. However, I'm curious if there's a more elegant, Panda-esque solution that isn't as gnarly as the one I'm using.
Thanks in advance!
for index, row in original_df.iterrows():
d = {}
for i in unique_features_list:
d[i] = 0 #initializing dict value to 0
for col in original_df.columns:
if col == 'el0':
element = row['el0']
d[element] = row['x0']
if col == 'el1':
element = row['el1']
d[element] = row['x1']
if col == 'el2':
element = row['el2']
d[element] = row['x2']
if col == 'el3':
element = row['el3']
d[element] = row['x3']
features_df.loc[len(features_df)] = [*d.values()] #append dict to end of new dataframe
Upvotes: 0
Views: 32
Reputation: 153460
Try using pd.wide_to_long
and reshape the dataframe using set_index
and unstack
:
pd.wide_to_long(df.reset_index(), ['el','x'], 'index', 'No')\
.reset_index(level=1, drop=True)\
.set_index('el', append=True)['x']\
.unstack()
Output (Note: typo in data above):
el AL Al CR MO TI
index
0 NaN 0.9 0.0 0.0 0.1
1 0.8 NaN 0.1 0.0 0.1
Output (with corrected typo):
el AL CR MO TI
index
0 0.9 0.0 0.0 0.1
1 0.8 0.1 0.0 0.1
Upvotes: 2