noradioactive
noradioactive

Reputation: 11

Explode list into columns in a dataframe

I have a dataframe with two columns : ID and Demographic_distribution. ID is just a number (ex:123456). demographic_impression has a list for each ID. here's an example of what you get in the demographic_distribution column for one ID :

ID : 123456

Demographic_distribution : [{percentage: 0.000952, age: 25-34, gender: unknown}, {percentage: 0.093621, age: 55-64, gender: male}, {percentage: 0.002856, age: 35-44, gender: unknown}, {percentage: 0.031736, age: 18-24, gender: female}, {percentage: 0.085052, age: 25-34, gender: male}, {percentage: 0.019994, age: 18-24, gender: male}, {percentage: 0.085687, age: 35-44, gender: male}, {percentage: 0.133608, age: 55-64, gender: female}, {percentage: 0.112345, age: 65+, gender: female}, {percentage: 0.000317, age: 18-24, gender: unknown}, {percentage: 0.095208, age: 45-54, gender: female}, {percentage: 0.067598, age: 65+, gender: male}, {percentage: 0.086004, age: 45-54, gender: male}, {percentage: 0.075849, age: 25-34, gender: female}, {percentage: 0.098699, age: 35-44, gender: female}, {percentage: 0.003174, age: 65+, gender: unknown}, {percentage: 0.003174, age: 45-54, gender: unknown}, {percentage: 0.004126, age: 55-64, gender: unknown}]

You can see that there are 5 age groups, 3 genders and many percentages. I would like to split the demographic column into three different columns for each parameters. Let's not forget that these informations are liked to an ID in each row, otherwise it doesn't make sense. I tried .explode, but it didn't work.

Any idea how to do this ?

Upvotes: 0

Views: 106

Answers (3)

Abdul
Abdul

Reputation: 186

I did something like this give a try once.

import json
import os
import pandas as pd

#if Demographic Distribution is more then you can iterate through loop

Demographic_distribution = [
    {"percentage": 0.000952, "age": "25-34", "gender": "unknown"}, 
    {"percentage": 0.093621, "age": "55-64", "gender": "male"}, 
    {"percentage": 0.002856, "age": "35-44", "gender": "unknown"}, 
    {"percentage": 0.031736, "age": "18-24", "gender": "female"}, 
    {"percentage": 0.085052, "age": "25-34", "gender": "male"}, 
    {"percentage": 0.019994, "age": "18-24", "gender": "male"}, 
    {"percentage": 0.085687, "age": "35-44", "gender": "male"}]

df = pd.DataFrame.from_dict(Demographic_distribution)
df['ID'] = 123456

df.to_csv("D:\\Path\\Output.csv",index=False)

Output

enter image description here

Upvotes: 0

Andrej Kesely
Andrej Kesely

Reputation: 195418

Try:

df = df.explode("Demographic_distribution")

df = pd.concat(
    [df, df.pop("Demographic_distribution").apply(pd.Series)], axis=1
)
print(df)

Prints:

       ID  percentage    age   gender
0  123456    0.000952  25-34  unknown
0  123456    0.093621  55-64     male
0  123456    0.002856  35-44  unknown
0  123456    0.031736  18-24   female
0  123456    0.085052  25-34     male
0  123456    0.019994  18-24     male
0  123456    0.085687  35-44     male
0  123456    0.133608  55-64   female
0  123456    0.112345    65+   female
0  123456    0.000317  18-24  unknown
0  123456    0.095208  45-54   female
0  123456    0.067598    65+     male
0  123456    0.086004  45-54     male
0  123456    0.075849  25-34   female
0  123456    0.098699  35-44   female
0  123456    0.003174    65+  unknown
0  123456    0.003174  45-54  unknown
0  123456    0.004126  55-64  unknown

df used:

       ID                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Demographic_distribution
0  123456  [{'percentage': 0.000952, 'age': '25-34', 'gender': 'unknown'}, {'percentage': 0.093621, 'age': '55-64', 'gender': 'male'}, {'percentage': 0.002856, 'age': '35-44', 'gender': 'unknown'}, {'percentage': 0.031736, 'age': '18-24', 'gender': 'female'}, {'percentage': 0.085052, 'age': '25-34', 'gender': 'male'}, {'percentage': 0.019994, 'age': '18-24', 'gender': 'male'}, {'percentage': 0.085687, 'age': '35-44', 'gender': 'male'}, {'percentage': 0.133608, 'age': '55-64', 'gender': 'female'}, {'percentage': 0.112345, 'age': '65+', 'gender': 'female'}, {'percentage': 0.000317, 'age': '18-24', 'gender': 'unknown'}, {'percentage': 0.095208, 'age': '45-54', 'gender': 'female'}, {'percentage': 0.067598, 'age': '65+', 'gender': 'male'}, {'percentage': 0.086004, 'age': '45-54', 'gender': 'male'}, {'percentage': 0.075849, 'age': '25-34', 'gender': 'female'}, {'percentage': 0.098699, 'age': '35-44', 'gender': 'female'}, {'percentage': 0.003174, 'age': '65+', 'gender': 'unknown'}, {'percentage': 0.003174, 'age': '45-54', 'gender': 'unknown'}, {'percentage': 0.004126, 'age': '55-64', 'gender': 'unknown'}]

Upvotes: 0

el_oso
el_oso

Reputation: 1061

json.normalize is what you are after.

pandas docs

Upvotes: 2

Related Questions