Reputation: 59
I am trying to carrying out calculations on individual values that are stored in a nested list stored in a pandas DataFrame. My issue is on how to access these individual values.
I am working from a data set available here: https://datadryad.org/stash/dataset/doi:10.5061/dryad.h505v
I have imported the .json file in a pandas DataFrame and the elastic constants are stored in the column 'elastic_tensor'.
import pandas as pd
df = pd.read_json(workdir+"ec.json")
df['elastic_tensor'].head()
Out:
0 [[311.33514638650246, 144.45092552856926, 126....
1 [[306.93357350984974, 88.02634955100905, 105.6...
2 [[569.5291276937579, 157.8517489654999, 157.85...
3 [[69.28798774976904, 34.7875015216915, 37.3877...
4 [[349.3767766177825, 186.67131003104407, 176.4...
Name: elastic_tensor, dtype: object
In order to access the individual values, what I have done is expand the nested lists once (as I could not find a way to use .extend()
to flatten the nested list):
df1 = pd.DataFrame(df["elastic_tensor"].to_list() , columns=['c'+str(j) for j in range(1,7)])
Note: I have named the columns c1..c6 as the elastic constants in the end shall be called cij with i and j from 1 to 6.
Then I have expanded each of these columns in turns (as I could not find the way to do a loop):
dfc1 = pd.DataFrame(df1["c1"].to_list() , columns=['c1'+str(j) for j in range(1,7)])
dfc2 = pd.DataFrame(df1["c2"].to_list() , columns=['c2'+str(j) for j in range(1,7)])
dfc3 = pd.DataFrame(df1["c3"].to_list() , columns=['c3'+str(j) for j in range(1,7)])
dfc4 = pd.DataFrame(df1["c4"].to_list() , columns=['c4'+str(j) for j in range(1,7)])
dfc5 = pd.DataFrame(df1["c5"].to_list() , columns=['c5'+str(j) for j in range(1,7)])
dfc6 = pd.DataFrame(df1["c6"].to_list() , columns=['c6'+str(j) for j in range(1,7)])
before merging them
data_frames = [dfc1, dfc2, dfc3, dfc4, dfc5, dfc6]
df_merged = pd.DataFrame().join(data_frames, how="outer")
which gives me a DataFrame with columns containing the individual cij values: https://i.sstatic.net/odraQ.png
I can now carry out arithmetic operations on these individual values and add a column in the initial "df" dataframe with the results, but there must be a better way of doing it (especially if the matrices are large). Any idea?
Upvotes: 0
Views: 793
Reputation: 31156
apply(pd.Series)
to expand a list into columnsstack()
and unstack()
generate multi-index columns that are zero-indexed values into 2D listimport json
from pathlib import Path
# file downloaded from https://datadryad.org/stash/dataset/doi:10.5061/dryad.h505v
with open(Path.cwd().joinpath("ec.json")) as f: js = json.load(f)
df = pd.json_normalize(js)
# expand first dimension, put it into row index, expand second dimension, make multi-index columns
dfet = df["elastic_tensor"].apply(pd.Series).stack().apply(pd.Series).unstack()
# flatten multi-index columns, index from 1, instead of standard 0
dfet.columns = [f"c{i+1}{j+1}" for i,j in dfet.columns.to_flat_index()]
c11 | c12 | c13 | c14 | c15 | c16 | c21 | c22 | c23 | c24 | c25 | c26 | c31 | c32 | c33 | c34 | c35 | c36 | c41 | c42 | c43 | c44 | c45 | c46 | c51 | c52 | c53 | c54 | c55 | c56 | c61 | c62 | c63 | c64 | c65 | c66 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 311.335 | 144.451 | 126.176 | 0 | -0.110347 | 0 | 144.451 | 311.32 | 126.169 | 0 | -0.112161 | 0 | 126.176 | 126.169 | 332.185 | 0 | -0.107541 | 0 | 0 | 0 | 0 | 98.9182 | 0 | 0 | -0.110347 | -0.112161 | -0.107541 | 0 | 98.921 | 0 | 0 | 0 | 0 | 0 | 0 | 103.339 |
1 | 306.934 | 88.0263 | 105.696 | 2.53622 | -0.568262 | -0.188934 | 88.0263 | 298.869 | 101.79 | -1.43474 | -0.608261 | -0.226253 | 105.696 | 101.79 | 398.441 | 0.350166 | -0.577829 | -0.232358 | 2.53622 | -1.43474 | 0.350166 | 75.3104 | 0 | 0 | -0.568262 | -0.608261 | -0.577829 | 0 | 75.5826 | 1.92806 | -0.188934 | -0.226253 | -0.232358 | 0 | 1.92806 | 105.685 |
2 | 569.529 | 157.852 | 157.851 | 0 | 0 | 0 | 157.852 | 569.53 | 157.852 | 0 | 0 | 0 | 157.851 | 157.852 | 569.53 | 0 | 0 | 0 | 0 | 0 | 0 | 94.8801 | 0 | 0 | 0 | 0 | 0 | 0 | 94.88 | 0 | 0 | 0 | 0 | 0 | 0 | 94.8801 |
3 | 69.288 | 34.7875 | 37.3877 | 0 | 0 | 0 | 34.7875 | 78.1379 | 40.6047 | 0 | 0 | 0 | 37.3877 | 40.6047 | 70.1326 | 0 | 0 | 0 | 0 | 0 | 0 | 19.8954 | 0 | 0 | 0 | 0 | 0 | 0 | 4.75803 | 0 | 0 | 0 | 0 | 0 | 0 | 30.4095 |
4 | 349.377 | 186.671 | 176.476 | 0 | 0 | 0 | 186.671 | 415.51 | 213.834 | 0 | 0 | 0 | 176.476 | 213.834 | 407.479 | 0 | 0 | 0 | 0 | 0 | 0 | 120.112 | 0 | 0 | 0 | 0 | 0 | 0 | 125.443 | 0 | 0 | 0 | 0 | 0 | 0 | 74.9078 |
a = np.dstack(df["elastic_tensor"])
pd.DataFrame(a.reshape((a.shape[0]*a.shape[1], a.shape[2])).T,
columns=[f"c{i+1}{j+1}" for i in range(a.shape[0]) for j in range(a.shape[1])])
Upvotes: 1