Reputation: 11
I have a dataframe with two columns : ID and Demographic_distribution. ID is just a number (ex:123456). demographic_impression has a list for each ID. here's an example of what you get in the demographic_distribution column for one ID :
ID : 123456
Demographic_distribution : [{percentage: 0.000952, age: 25-34, gender: unknown}, {percentage: 0.093621, age: 55-64, gender: male}, {percentage: 0.002856, age: 35-44, gender: unknown}, {percentage: 0.031736, age: 18-24, gender: female}, {percentage: 0.085052, age: 25-34, gender: male}, {percentage: 0.019994, age: 18-24, gender: male}, {percentage: 0.085687, age: 35-44, gender: male}, {percentage: 0.133608, age: 55-64, gender: female}, {percentage: 0.112345, age: 65+, gender: female}, {percentage: 0.000317, age: 18-24, gender: unknown}, {percentage: 0.095208, age: 45-54, gender: female}, {percentage: 0.067598, age: 65+, gender: male}, {percentage: 0.086004, age: 45-54, gender: male}, {percentage: 0.075849, age: 25-34, gender: female}, {percentage: 0.098699, age: 35-44, gender: female}, {percentage: 0.003174, age: 65+, gender: unknown}, {percentage: 0.003174, age: 45-54, gender: unknown}, {percentage: 0.004126, age: 55-64, gender: unknown}]
You can see that there are 5 age groups, 3 genders and many percentages. I would like to split the demographic column into three different columns for each parameters. Let's not forget that these informations are liked to an ID in each row, otherwise it doesn't make sense. I tried .explode, but it didn't work.
Any idea how to do this ?
Upvotes: 0
Views: 106
Reputation: 186
I did something like this give a try once.
import json
import os
import pandas as pd
#if Demographic Distribution is more then you can iterate through loop
Demographic_distribution = [
{"percentage": 0.000952, "age": "25-34", "gender": "unknown"},
{"percentage": 0.093621, "age": "55-64", "gender": "male"},
{"percentage": 0.002856, "age": "35-44", "gender": "unknown"},
{"percentage": 0.031736, "age": "18-24", "gender": "female"},
{"percentage": 0.085052, "age": "25-34", "gender": "male"},
{"percentage": 0.019994, "age": "18-24", "gender": "male"},
{"percentage": 0.085687, "age": "35-44", "gender": "male"}]
df = pd.DataFrame.from_dict(Demographic_distribution)
df['ID'] = 123456
df.to_csv("D:\\Path\\Output.csv",index=False)
Output
Upvotes: 0
Reputation: 195418
Try:
df = df.explode("Demographic_distribution")
df = pd.concat(
[df, df.pop("Demographic_distribution").apply(pd.Series)], axis=1
)
print(df)
Prints:
ID percentage age gender
0 123456 0.000952 25-34 unknown
0 123456 0.093621 55-64 male
0 123456 0.002856 35-44 unknown
0 123456 0.031736 18-24 female
0 123456 0.085052 25-34 male
0 123456 0.019994 18-24 male
0 123456 0.085687 35-44 male
0 123456 0.133608 55-64 female
0 123456 0.112345 65+ female
0 123456 0.000317 18-24 unknown
0 123456 0.095208 45-54 female
0 123456 0.067598 65+ male
0 123456 0.086004 45-54 male
0 123456 0.075849 25-34 female
0 123456 0.098699 35-44 female
0 123456 0.003174 65+ unknown
0 123456 0.003174 45-54 unknown
0 123456 0.004126 55-64 unknown
df
used:
ID Demographic_distribution
0 123456 [{'percentage': 0.000952, 'age': '25-34', 'gender': 'unknown'}, {'percentage': 0.093621, 'age': '55-64', 'gender': 'male'}, {'percentage': 0.002856, 'age': '35-44', 'gender': 'unknown'}, {'percentage': 0.031736, 'age': '18-24', 'gender': 'female'}, {'percentage': 0.085052, 'age': '25-34', 'gender': 'male'}, {'percentage': 0.019994, 'age': '18-24', 'gender': 'male'}, {'percentage': 0.085687, 'age': '35-44', 'gender': 'male'}, {'percentage': 0.133608, 'age': '55-64', 'gender': 'female'}, {'percentage': 0.112345, 'age': '65+', 'gender': 'female'}, {'percentage': 0.000317, 'age': '18-24', 'gender': 'unknown'}, {'percentage': 0.095208, 'age': '45-54', 'gender': 'female'}, {'percentage': 0.067598, 'age': '65+', 'gender': 'male'}, {'percentage': 0.086004, 'age': '45-54', 'gender': 'male'}, {'percentage': 0.075849, 'age': '25-34', 'gender': 'female'}, {'percentage': 0.098699, 'age': '35-44', 'gender': 'female'}, {'percentage': 0.003174, 'age': '65+', 'gender': 'unknown'}, {'percentage': 0.003174, 'age': '45-54', 'gender': 'unknown'}, {'percentage': 0.004126, 'age': '55-64', 'gender': 'unknown'}]
Upvotes: 0