Jep
Jep

Reputation: 59

Nested lists in DataFrame column: how to carry out calculations on individual values?

I am trying to carrying out calculations on individual values that are stored in a nested list stored in a pandas DataFrame. My issue is on how to access these individual values.

I am working from a data set available here: https://datadryad.org/stash/dataset/doi:10.5061/dryad.h505v

I have imported the .json file in a pandas DataFrame and the elastic constants are stored in the column 'elastic_tensor'.

import pandas as pd
df = pd.read_json(workdir+"ec.json")
df['elastic_tensor'].head()

Out:

0    [[311.33514638650246, 144.45092552856926, 126....
1    [[306.93357350984974, 88.02634955100905, 105.6...
2    [[569.5291276937579, 157.8517489654999, 157.85...
3    [[69.28798774976904, 34.7875015216915, 37.3877...
4    [[349.3767766177825, 186.67131003104407, 176.4...
Name: elastic_tensor, dtype: object

In order to access the individual values, what I have done is expand the nested lists once (as I could not find a way to use .extend() to flatten the nested list):

df1 = pd.DataFrame(df["elastic_tensor"].to_list() , columns=['c'+str(j) for j in range(1,7)])

Note: I have named the columns c1..c6 as the elastic constants in the end shall be called cij with i and j from 1 to 6.

Then I have expanded each of these columns in turns (as I could not find the way to do a loop):

dfc1 = pd.DataFrame(df1["c1"].to_list() , columns=['c1'+str(j) for j in range(1,7)])
dfc2 = pd.DataFrame(df1["c2"].to_list() , columns=['c2'+str(j) for j in range(1,7)])
dfc3 = pd.DataFrame(df1["c3"].to_list() , columns=['c3'+str(j) for j in range(1,7)])
dfc4 = pd.DataFrame(df1["c4"].to_list() , columns=['c4'+str(j) for j in range(1,7)])
dfc5 = pd.DataFrame(df1["c5"].to_list() , columns=['c5'+str(j) for j in range(1,7)])
dfc6 = pd.DataFrame(df1["c6"].to_list() , columns=['c6'+str(j) for j in range(1,7)])

before merging them

data_frames = [dfc1, dfc2, dfc3, dfc4, dfc5, dfc6]
df_merged = pd.DataFrame().join(data_frames, how="outer")

which gives me a DataFrame with columns containing the individual cij values: https://i.sstatic.net/odraQ.png

I can now carry out arithmetic operations on these individual values and add a column in the initial "df" dataframe with the results, but there must be a better way of doing it (especially if the matrices are large). Any idea?

Upvotes: 0

Views: 793

Answers (1)

Rob Raymond
Rob Raymond

Reputation: 31156

  • approach using apply(pd.Series) to expand a list into columns
  • using stack() and unstack() generate multi-index columns that are zero-indexed values into 2D list
  • flatten multi-index to match your stated requirement (one-indexed instead of zero indexed)
import json
from pathlib import Path
# file downloaded from https://datadryad.org/stash/dataset/doi:10.5061/dryad.h505v
with open(Path.cwd().joinpath("ec.json")) as f: js = json.load(f)
df = pd.json_normalize(js)
# expand first dimension, put it into row index, expand second dimension, make multi-index columns
dfet = df["elastic_tensor"].apply(pd.Series).stack().apply(pd.Series).unstack()
# flatten multi-index columns, index from 1, instead of standard 0
dfet.columns = [f"c{i+1}{j+1}" for i,j in dfet.columns.to_flat_index()]

head(5)

c11 c12 c13 c14 c15 c16 c21 c22 c23 c24 c25 c26 c31 c32 c33 c34 c35 c36 c41 c42 c43 c44 c45 c46 c51 c52 c53 c54 c55 c56 c61 c62 c63 c64 c65 c66
0 311.335 144.451 126.176 0 -0.110347 0 144.451 311.32 126.169 0 -0.112161 0 126.176 126.169 332.185 0 -0.107541 0 0 0 0 98.9182 0 0 -0.110347 -0.112161 -0.107541 0 98.921 0 0 0 0 0 0 103.339
1 306.934 88.0263 105.696 2.53622 -0.568262 -0.188934 88.0263 298.869 101.79 -1.43474 -0.608261 -0.226253 105.696 101.79 398.441 0.350166 -0.577829 -0.232358 2.53622 -1.43474 0.350166 75.3104 0 0 -0.568262 -0.608261 -0.577829 0 75.5826 1.92806 -0.188934 -0.226253 -0.232358 0 1.92806 105.685
2 569.529 157.852 157.851 0 0 0 157.852 569.53 157.852 0 0 0 157.851 157.852 569.53 0 0 0 0 0 0 94.8801 0 0 0 0 0 0 94.88 0 0 0 0 0 0 94.8801
3 69.288 34.7875 37.3877 0 0 0 34.7875 78.1379 40.6047 0 0 0 37.3877 40.6047 70.1326 0 0 0 0 0 0 19.8954 0 0 0 0 0 0 4.75803 0 0 0 0 0 0 30.4095
4 349.377 186.671 176.476 0 0 0 186.671 415.51 213.834 0 0 0 176.476 213.834 407.479 0 0 0 0 0 0 120.112 0 0 0 0 0 0 125.443 0 0 0 0 0 0 74.9078

numpy approach

a = np.dstack(df["elastic_tensor"])
pd.DataFrame(a.reshape((a.shape[0]*a.shape[1], a.shape[2])).T, 
             columns=[f"c{i+1}{j+1}" for i in range(a.shape[0]) for j in range(a.shape[1])])

Upvotes: 1

Related Questions