neutrino

Reputation: 924

Pandas - take multiple columns and transform them into a single column of dictionary objects?

I am trying to transform a DataFrame by combining its extra columns into a single column of dictionaries.

My DataFrame will always have at least four columns: record, yhat, residual, and hat, with additional columns in some cases. The head of my current df looks like this:

   record    yhat  residual      hat   RinvRes  AOMstat
0       1  6.7272  -0.57130  0.04985  0.009825  0.02041                
1       2  6.5568   0.19460  0.09771 -0.014930 -0.03078                
2       3  6.5457   0.16190  0.09765  0.272800  0.56260

Looking at the header and the first row, there are two additional columns, RinvRes and AOMstat:

   record    yhat  residual      hat   RinvRes  AOMstat
0       1  6.7272  -0.57130  0.04985  0.009825  0.02041  

I would like to combine those columns into a dictionary, where each column name is a key, e.g.:

   record    yhat  residual      hat   additional
0       1  6.7272  -0.57130  0.04985   {"RinvRes": "0.2291E-01", "AOMstat": "0.3224E-01"}

Upvotes: 1

Views: 115

Answers (3)

PreciXon

Reputation: 453

Although the other answers are more elegant and efficient, here's a simpler version:

rinvres = df['RinvRes'].values.tolist()
aomstat = df['AOMstat'].values.tolist()
# drop returns a new DataFrame, so assign the result back
df = df.drop(['RinvRes', 'AOMstat'], axis=1)

additional = []

for i in range(len(rinvres)):
    add = {
        'RinvRes': rinvres[i],
        'AOMstat': aomstat[i]
    }

    additional.append(add)

df['additional'] = additional
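
The same idea can be written without the explicit loop. This is only a sketch: it rebuilds the sample frame from the question's df.head() and uses DataFrame.to_dict(orient="records") to get one dictionary per row. The names extra_cols and result are just for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question's df.head()
df = pd.DataFrame({
    "record": [1, 2, 3],
    "yhat": [6.7272, 6.5568, 6.5457],
    "residual": [-0.57130, 0.19460, 0.16190],
    "hat": [0.04985, 0.09771, 0.09765],
    "RinvRes": [0.009825, -0.014930, 0.272800],
    "AOMstat": [0.02041, -0.03078, 0.56260],
})

# Columns to fold into the dictionary (illustrative name)
extra_cols = ["RinvRes", "AOMstat"]

# One dict per row, keyed by column name
additional = df[extra_cols].to_dict(orient="records")

# drop returns a new frame, so keep the result
result = df.drop(columns=extra_cols)
result["additional"] = additional

print(result)
```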

Upvotes: 0

Ben.T

Reputation: 29635

IIUC, starting from the list of the 4 common columns, you can get the extra column names with difference and use to_dict to aggregate them:

# columns you have in common
keep_cols = ['record', 'yhat', 'residual', 'hat']

# get columns to agg into dict
extra_cols = df.columns.difference(keep_cols)
# create the result
new_df = (
    df[keep_cols]
      .assign(additional = df[extra_cols].agg(lambda x: x.to_dict(), axis=1))
)

print(new_df)
   record    yhat  residual      hat  \
0       1  6.7272   -0.5713  0.04985   
1       2  6.5568    0.1946  0.09771   
2       3  6.5457    0.1619  0.09765   

                                   additional  
0   {'AOMstat': 0.02041, 'RinvRes': 0.009825}  
1  {'AOMstat': -0.03078, 'RinvRes': -0.01493}  
2      {'AOMstat': 0.5626, 'RinvRes': 0.2728}  
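
Note that Index.difference returns the extra column names sorted, which is why AOMstat comes before RinvRes in the dictionaries above. If the original column order matters, or if some frames have no extra columns at all, something like the helper below might work. It is a sketch under assumptions: pack_extra_columns is a hypothetical name, not a pandas function, and the no-extra-columns case is handled explicitly with empty dictionaries.

```python
import pandas as pd

def pack_extra_columns(df, keep_cols, name="additional"):
    """Fold every column not in keep_cols into one column of dicts.

    Hypothetical helper for illustration; not part of pandas.
    """
    # A list comprehension keeps the original column order
    extra_cols = [c for c in df.columns if c not in keep_cols]

    if extra_cols:
        records = df[extra_cols].to_dict(orient="records")
    else:
        # No extra columns: give every row an empty dict
        records = [{} for _ in range(len(df))]

    out = df[keep_cols].copy()
    # Build the Series on df's index so rows stay aligned
    out[name] = pd.Series(records, index=df.index, dtype=object)
    return out

keep_cols = ["record", "yhat", "residual", "hat"]

# Sample data rebuilt from the question's df.head()
sample = pd.DataFrame({
    "record": [1, 2, 3],
    "yhat": [6.7272, 6.5568, 6.5457],
    "residual": [-0.57130, 0.19460, 0.16190],
    "hat": [0.04985, 0.09771, 0.09765],
    "RinvRes": [0.009825, -0.014930, 0.272800],
    "AOMstat": [0.02041, -0.03078, 0.56260],
})

packed = pack_extra_columns(sample, keep_cols)
print(packed)
```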

Upvotes: 2

Umar.H

Reputation: 23099

In one step with .join, .agg(dict) and .drop.

First, create your list of columns to aggregate:

agg_cols = ['RinvRes', 'AOMstat']

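# pass columns= by keyword; the positional axis argument to drop was removed in pandas 2.0
df1 = (
    df.join(df[agg_cols].agg(dict, axis=1).to_frame('additional'))
      .drop(columns=agg_cols)
)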

print(df1)

   record    yhat  residual      hat                                  additional
0       1  6.7272   -0.5713  0.04985   {'RinvRes': 0.009825, 'AOMstat': 0.02041}
1       2  6.5568    0.1946  0.09771  {'RinvRes': -0.01493, 'AOMstat': -0.03078}
2       3  6.5457    0.1619  0.09765      {'RinvRes': 0.2728, 'AOMstat': 0.5626}

Upvotes: 2
