Reputation: 285
I would like to expand the nested lists to multiple rows and columns. At the same time, map back the results to the corresponding column values.
The dataframe is like the following.
df=pd.DataFrame({
'column_name':['income_level', 'geo_level'],
'results':[[[0, 12, 13], [0, 98, 43], [1, 29, 73], [2, 12, 34]], [[0, 78, 23], [1, 56, 67], [2, 67, 34]]]})
column_name | results
----------------------
income_level | [[0, 12, 13], [0, 98, 43], [1, 29, 73], [2, 12, 34]]
geo_level | [[0, 78, 23], [1, 56, 67], [2, 67, 34]]
The final results I'm looking for are like this. (expanding the nested list to rows and columns and matching the corresponding column values)
column_name | num |pct | index
income_level | 0 | 12 | 13
income_level | 0 | 98 | 43
income_level | 1 | 29 | 73
income_level | 2 | 12 | 34
geo_level | 0 | 78 | 23
geo_level | 1 | 56 | 67
geo_level | 2 | 67 | 34
My current code:
pd.DataFrame(list(itertools.chain(*df['results'].values.tolist())), columns=['num', 'pct', 'index'])
I'm able to expand and create header but I cannot match back to corresponding column values (i.e. the column_name)
Upvotes: 1
Views: 884
Reputation: 31011
Try the following solution:
Define a "reformatting" function:
def reform(row):
res = pd.DataFrame(row.results, columns=['num', 'pct', 'index'])
res.insert(0, 'column_name', row.column_name)
return res
Then apply it and concatenate results:
lst = df.apply(reform, axis=1).tolist()
pd.concat(lst).reset_index(drop=True)
Upvotes: 0
Reputation: 25269
Explode
column results
and assign to df1
. Create the new dataframe from list of sublist of df1.results
and reset_index
df1 = df.explode('results')
pd.DataFrame(df1.results.tolist(),
index=df1.column_name,
columns=['num', 'pct', 'index']).reset_index()
Out[562]:
column_name num pct index
0 income_level 0 12 13
1 income_level 0 98 43
2 income_level 1 29 73
3 income_level 2 12 34
4 geo_level 0 78 23
5 geo_level 1 56 67
6 geo_level 2 67 34
On pandas < 0.25, use sum
, np.repeat
, and reset_index
to achieve the same thing
pd.DataFrame(df.results.sum(),
index=np.repeat(df.column_name, df.results.str.len()),
columns=['num', 'pct', 'index']).reset_index()
Out[572]:
column_name num pct index
0 income_level 0 12 13
1 income_level 0 98 43
2 income_level 1 29 73
3 income_level 2 12 34
4 geo_level 0 78 23
5 geo_level 1 56 67
6 geo_level 2 67 34
Upvotes: 1