Expand nested lists to rows, create headers, and map back to original columns

Question

I would like to expand the nested lists to multiple rows and columns. At the same time, map back the results to the corresponding column values.

The dataframe is like the following.

df=pd.DataFrame({
'column_name':['income_level', 'geo_level'],
'results':[[[0, 12, 13], [0, 98, 43], [1, 29, 73], [2, 12, 34]], [[0, 78, 23], [1, 56, 67], [2, 67, 34]]]}) 

column_name  |  results
----------------------
income_level | [[0, 12, 13], [0, 98, 43], [1, 29, 73], [2, 12, 34]]
geo_level    | [[0, 78, 23], [1, 56, 67], [2, 67, 34]]

The final results I'm looking for are like this. (expanding the nested list to rows and columns and matching the corresponding column values)

column_name  | num |pct | index
income_level | 0   | 12 | 13 
income_level | 0   | 98 | 43
income_level | 1   | 29 | 73 
income_level | 2   | 12 | 34
geo_level    | 0   | 78 | 23
geo_level    | 1   | 56 | 67
geo_level    | 2   | 67 | 34

My current code:

pd.DataFrame(list(itertools.chain(*df['results'].values.tolist())), columns=['num', 'pct', 'index'])

I'm able to expand and create header but I cannot match back to corresponding column values (i.e. the column_name)

Andy L. · Accepted Answer

Explode column results and assign to df1. Create the new dataframe from list of sublist of df1.results and reset_index

df1 = df.explode('results')
pd.DataFrame(df1.results.tolist(), 
             index=df1.column_name,
             columns=['num', 'pct', 'index']).reset_index()

Out[562]:
    column_name  num  pct  index
0  income_level    0   12     13
1  income_level    0   98     43
2  income_level    1   29     73
3  income_level    2   12     34
4     geo_level    0   78     23
5     geo_level    1   56     67
6     geo_level    2   67     34

On pandas < 0.25, use sum, np.repeat, and reset_index to achieve the same thing

pd.DataFrame(df.results.sum(), 
             index=np.repeat(df.column_name, df.results.str.len()), 
             columns=['num', 'pct', 'index']).reset_index()

Out[572]:
    column_name  num  pct  index
0  income_level    0   12     13
1  income_level    0   98     43
2  income_level    1   29     73
3  income_level    2   12     34
4     geo_level    0   78     23
5     geo_level    1   56     67
6     geo_level    2   67     34

Expand nested lists to rows, create headers, and map back to original columns

Answers (2)

Related Questions