Reputation: 9
I have a list in python which is showing like this just below. I would like to turn it into a data frame. I tried it: pd.DataFrame(myList), however the 'origins' column stores a list, however I would like to store the origin and quantityLeads keys in that same dataframe
myList = [
{
"id":3105052,
"title":"Ebook Relat�rios Gerenciais",
"offering":"Institucional",
"created_date":"2022-06-28"
"inserted_date":"2022-06-28",
"channel":"Social",
"start_date":"2022-06-28",
"end_date":"2022-06-28",
"origins":[
{
"origin":"LinkedIn",
"quantityLeads":"1"
},
{
"origin":"Facebook",
"quantityLeads":"1"
}
]
},
{
"id":3105052,
"title":"Ebook Relat�rios Gerenciais",
"offering":"Institucional",
"inserted_date":"2022-06-28",
"created_date":"2022-06-28",
"channel":"Direct",
"start_date":"2022-06-28",
"end_date":"2022-06-28",
"origins":[
{
"origin":"Desconhecida",
"quantityLeads":"2"
}
]
},
{
"id":2918513,
"title":"Ebook Direct To Consumer",
"offering":"Supply Chain",
"created_date":"2022-06-28",
"inserted_date":"2022-06-28",
"channel":"Social",
"start_date":"2022-06-28",
"end_date":"2022-06-28",
"origins":[
{
"origin":"LinkedIn",
"quantityLeads":"1"
}
]
}
]
Upvotes: 0
Views: 74
Reputation: 9
It's working for me.
df = pd.DataFrame(myList)
df = df.explode('origins')
df['origin'] = df.origins.str.get('origin')
df['quantityLeads'] = df.origins.str.get('quantityLeads')
df.drop('origins', axis=1, inplace=True)
Upvotes: 0
Reputation: 157
In the pursuit of simplicity you could just flatten the dictionary structures with something like:
for row in myList:
row["origin"] = row["origins"][0]["origin"]
row["quantityLeads"] = row["origins"][0]["quantityLeads"]
del row["origins"]
df = pd.DataFrame(myList)
print(df)
Output:
id title offering created_date inserted_date channel start_date end_date origin quantityLeads
0 3105052 Ebook Relat�rios Gerenciais Institucional 2022-06-28 2022-06-28 Social 2022-06-28 2022-06-28 LinkedIn 1
1 3105052 Ebook Relat�rios Gerenciais Institucional 2022-06-28 2022-06-28 Direct 2022-06-28 2022-06-28 Desconhecida 2
2 2918513 Ebook Direct To Consumer Supply Chain 2022-06-28 2022-06-28 Social 2022-06-28 2022-06-28 LinkedIn 1
Just as a side note, for the myList sample above there is a missing comma after the first entry's created_date that's causing an error.
EDIT: If there are a variable number of items in the origins list, but each item has the same keys then we could iterate over those as well.
for row in myList:
origins_list = row["origins"]
counter = 0
for item in origins_list:
row["origin_" + str(counter)] = item["origin"]
row["quantityLeads_" + str(counter)] = item["quantityLeads"]
counter += 1
del row["origins"]
df = pd.DataFrame(myList)
print(df)
Upvotes: 1
Reputation: 14238
If you have more than one element in "origins" you may first explode, create "origin", "quantityLeads" and then decide what to do with rest of the dataframe.
df = pd.DataFrame(myList)
df = df.explode('origins')
df[['origin', 'quantityLeads']] = pd.DataFrame(df['origins'].tolist())
df.drop('origins', axis=1, inplace=True)
print(df):
id title offering created_date \
0 3105052 Ebook Relat�rios Gerenciais Institucional 2022-06-28
1 3105052 Ebook Relat�rios Gerenciais Institucional 2022-06-28
2 2918513 Ebook Direct To Consumer Supply Chain 2022-06-28
inserted_date channel start_date end_date origin quantityLeads
0 2022-06-28 Social 2022-06-28 2022-06-28 LinkedIn 1
1 2022-06-28 Direct 2022-06-28 2022-06-28 Desconhecida 2
2 2022-06-28 Social 2022-06-28 2022-06-28 LinkedIn 1
Upvotes: 1