Reputation: 924
I am trying to transform a DataFrame by combining extra columns into a dictionary.
My DataFrame will always have at least four columns: record, yhat, residual, and hat, with additional columns in some cases. The head of my current df looks like this:
record yhat residual hat RinvRes AOMstat
0 1 6.7272 -0.57130 0.04985 0.009825 0.02041
1 2 6.5568 0.19460 0.09771 -0.014930 -0.03078
2 3 6.5457 0.16190 0.09765 0.272800 0.56260
If we look at the header row, we see that there are two additional columns, RinvRes and AOMstat:
record yhat residual hat RinvRes AOMstat
0 1 6.7272 -0.57130 0.04985 0.009825 0.02041
I would like to combine those columns into a dictionary in which each column name is a key, e.g.:
record yhat residual hat additional
0 1 6.7272 -0.57130 0.04985 {"RinvRes": "0.2291E-01", "AOMstat": "0.3224E-01"}
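For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the three rows shown above. The names `base_cols` and `extra_cols` are just my own labels for the four fixed columns and whatever is left over; they are not part of the real data.

```python
import pandas as pd

# The three sample rows from the question
df = pd.DataFrame({
    "record": [1, 2, 3],
    "yhat": [6.7272, 6.5568, 6.5457],
    "residual": [-0.57130, 0.19460, 0.16190],
    "hat": [0.04985, 0.09771, 0.09765],
    "RinvRes": [0.009825, -0.014930, 0.272800],
    "AOMstat": [0.02041, -0.03078, 0.56260],
})

# Hypothetical helper names: the columns that are always present,
# and the ones that vary from case to case
base_cols = ["record", "yhat", "residual", "hat"]
extra_cols = [c for c in df.columns if c not in base_cols]
print(extra_cols)
```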
Upvotes: 1
Views: 115
Reputation: 453
Although the other answers are more elegant and efficient, here's a simpler version:
rinvres = df['RinvRes'].values.tolist()
aomstat = df['AOMstat'].values.tolist()
# drop() returns a new DataFrame, so assign the result back
df = df.drop(['RinvRes', 'AOMstat'], axis=1)
additional = []
# build one dict per row from the two extracted lists
for i in range(len(rinvres)):
    add = {
        'RinvRes': rinvres[i],
        'AOMstat': aomstat[i]
    }
    additional.append(add)
df['additional'] = additional
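If you'd rather skip the explicit loop, `DataFrame.to_dict(orient="records")` should build the same list of per-row dicts. This is only a sketch, assuming the two extra columns are named as in the question; the sample frame is just the first two rows from the question, and `extra` is a name I made up.

```python
import pandas as pd

# First two rows of the question's data, for illustration only
df = pd.DataFrame({
    "record": [1, 2],
    "yhat": [6.7272, 6.5568],
    "residual": [-0.57130, 0.19460],
    "hat": [0.04985, 0.09771],
    "RinvRes": [0.009825, -0.014930],
    "AOMstat": [0.02041, -0.03078],
})

extra = ["RinvRes", "AOMstat"]  # hypothetical name for the columns to pack

# orient="records" gives one {column: value} dict per row, in row order
additional = df[extra].to_dict(orient="records")

df = df.drop(columns=extra)
# Assigning a plain list is positional, so it lines up row by row
df["additional"] = additional
print(df)
```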
Upvotes: 0
Reputation: 29635
IIUC, starting from the list of the four columns, you can get the extra column names with difference and use to_dict to aggregate them:
# columns you have in common
keep_cols = ['record', 'yhat', 'residual', 'hat']
# get columns to agg into dict
extra_cols = df.columns.difference(keep_cols)
# create the result
new_df = (
    df[keep_cols]
    .assign(additional=df[extra_cols].agg(lambda x: x.to_dict(), axis=1))
)
print(new_df)
record yhat residual hat \
0 1 6.7272 -0.5713 0.04985
1 2 6.5568 0.1946 0.09771
2 3 6.5457 0.1619 0.09765
additional
0 {'AOMstat': 0.02041, 'RinvRes': 0.009825}
1 {'AOMstat': -0.03078, 'RinvRes': -0.01493}
2 {'AOMstat': 0.5626, 'RinvRes': 0.2728}
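Note that the keys come out as AOMstat, RinvRes in the output above because `Index.difference` sorts its result by default. If the original column order matters, a list comprehension is one way to keep it. A minimal sketch, assuming the same `keep_cols` as above and the question's sample data:

```python
import pandas as pd

df = pd.DataFrame({
    "record": [1, 2, 3],
    "yhat": [6.7272, 6.5568, 6.5457],
    "residual": [-0.57130, 0.19460, 0.16190],
    "hat": [0.04985, 0.09771, 0.09765],
    "RinvRes": [0.009825, -0.014930, 0.272800],
    "AOMstat": [0.02041, -0.03078, 0.56260],
})
keep_cols = ["record", "yhat", "residual", "hat"]

# difference() returns the extra names sorted alphabetically
sorted_extra = list(df.columns.difference(keep_cols))

# a list comprehension keeps them in the DataFrame's own column order
extra_cols = [c for c in df.columns if c not in keep_cols]

new_df = df[keep_cols].assign(
    additional=df[extra_cols].agg(lambda x: x.to_dict(), axis=1)
)
print(new_df["additional"].tolist())
```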
Upvotes: 2
Reputation: 23099
In one step with .join, .agg(dict), and .drop. First, create your list of columns to aggregate:
agg_cols = ['RinvRes', 'AOMstat']
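# pass columns= by keyword; the positional axis argument to drop() no longer works in pandas 2.0+
df1 = df.join(
    df[agg_cols].agg(dict, axis=1).to_frame('additional')
).drop(columns=agg_cols)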
print(df1)
record yhat residual hat additional
0 1 6.7272 -0.5713 0.04985 {'RinvRes': 0.009825, 'AOMstat': 0.02041}
1 2 6.5568 0.1946 0.09771 {'RinvRes': -0.01493, 'AOMstat': -0.03078}
2 3 6.5457 0.1619 0.09765 {'RinvRes': 0.2728, 'AOMstat': 0.5626}
Upvotes: 2