Reputation: 49
I am working on a project where I get JSON data from a GraphQL API. After receiving the data, I call json.loads() on it and then pull out the parts of the JSON I need, which gives me a series of dictionaries, each containing a nested dictionary. The data looks like this:
{'placement': 1, 'entrant': {'id': 8554498, 'name': 'Test'}}
{'placement': 2, 'entrant': {'id': 8559863, 'name': 'Test'}}
{'placement': 3, 'entrant': {'id': 8561463, 'name': 'Test'}}
{'placement': 4, 'entrant': {'id': 8559889, 'name': 'Test'}}
{'placement': 5, 'entrant': {'id': 8561608, 'name': 'Test'}}
{'placement': 5, 'entrant': {'id': 8560090, 'name': 'Test'}}
{'placement': 7, 'entrant': {'id': 8561639, 'name': 'Test'}}
{'placement': 7, 'entrant': {'id': 8561822, 'name': 'Test'}}
{'placement': 9, 'entrant': {'id': 8559993, 'name': 'Test'}}
{'placement': 9, 'entrant': {'id': 8561572, 'name': 'Test'}}
How could I create a pandas DataFrame whose columns are
placement | id | name
with the values from each dictionary filled in under the matching columns? If I just use
pd.DataFrame()
the output is not what I want, so I looked for solutions that iterate over the items in the dictionary, but I was unsuccessful. Any help would be appreciated. Thank you.
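For illustration, here is a minimal sketch of what plain pd.DataFrame() produces, assuming the records have already been collected into a Python list (called records here, a hypothetical name). Only the top-level keys become columns, so the nested "entrant" dictionaries stay packed in a single column instead of being split into id and name.

```python
import pandas as pd

# Hypothetical list holding two of the records shown above
records = [
    {"placement": 1, "entrant": {"id": 8554498, "name": "Test"}},
    {"placement": 2, "entrant": {"id": 8559863, "name": "Test"}},
]

df = pd.DataFrame(records)
# Only top-level keys become columns; "entrant" still holds whole dicts
print(df.columns.tolist())  # ['placement', 'entrant']
print(df.loc[0, "entrant"])  # {'id': 8554498, 'name': 'Test'}
```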
Upvotes: 0
Views: 337
Reputation: 164
You cannot iterate over df_data1 and capture all the dictionaries because of the way json.loads() returns the data. To fix the structure, add commas between the dictionaries by replacing every occurrence of "}{" with "}, {", and surround the whole string with "[" and "]". Supposing j is your JSON string:
import json

df_data1 = json.loads("[" + j.replace("}{", "}, {") + "]")
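The string replacement only works if the objects are concatenated with nothing between them ("}{" rather than "}\n{" or "} {"). A somewhat more robust sketch swaps in the standard library's json.JSONDecoder.raw_decode, which parses one JSON document starting at a given index and returns where it stopped, so back-to-back objects can be read one at a time. The helper name parse_concatenated is hypothetical, and the sample string assumes newline-separated objects.

```python
import json


def parse_concatenated(text):
    """Parse a string of back-to-back JSON objects into a list (hypothetical helper)."""
    decoder = json.JSONDecoder()
    objects = []
    idx = 0
    length = len(text)
    while idx < length:
        # raw_decode does not skip leading whitespace, so skip it here
        while idx < length and text[idx].isspace():
            idx += 1
        if idx >= length:
            break
        # Returns the decoded object and the index just past its end
        obj, idx = decoder.raw_decode(text, idx)
        objects.append(obj)
    return objects


j = (
    '{"placement": 1, "entrant": {"id": 8554498, "name": "Test"}}\n'
    '{"placement": 2, "entrant": {"id": 8559863, "name": "Test"}}'
)
df_data1 = parse_concatenated(j)
print(len(df_data1))  # 2
```

Unlike the replace-based version, this does not rewrite the text, so a "}{" sequence inside a string value cannot corrupt the parse.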
Now your df_data1 should look like this:
[{'placement': 1, 'entrant': {'id': 8554498, 'name': 'Test'}},
{'placement': 2, 'entrant': {'id': 8559863, 'name': 'Test'}},
{'placement': 3, 'entrant': {'id': 8561463, 'name': 'Test'}},
{'placement': 4, 'entrant': {'id': 8559889, 'name': 'Test'}},
{'placement': 5, 'entrant': {'id': 8561608, 'name': 'Test'}},
{'placement': 5, 'entrant': {'id': 8560090, 'name': 'Test'}},
{'placement': 7, 'entrant': {'id': 8561639, 'name': 'Test'}},
{'placement': 7, 'entrant': {'id': 8561822, 'name': 'Test'}},
{'placement': 9, 'entrant': {'id': 8559993, 'name': 'Test'}},
{'placement': 9, 'entrant': {'id': 8561572, 'name': 'Test'}}]
Now you can use @Thomas Q's solution:
import pandas as pd

df = pd.DataFrame([
    {"placement": d["placement"], "id": d["entrant"]["id"], "name": d["entrant"]["name"]}
    for d in df_data1
])
df
placement id name
0 1 8554498 Test
1 2 8559863 Test
2 3 8561463 Test
3 4 8559889 Test
4 5 8561608 Test
5 5 8560090 Test
6 7 8561639 Test
7 7 8561822 Test
8 9 8559993 Test
9 9 8561572 Test
Upvotes: 1
Reputation: 316
Here is one way to do it: expand the nested "entrant" column into a new DataFrame and merge it back into the first one on the index:
from itertools import chain
import pandas as pd
data = [
[{"placement": 1, "entrant": {"id": 8554498, "name": "Test"}}],
[{"placement": 2, "entrant": {"id": 8559863, "name": "Test"}}],
[{"placement": 3, "entrant": {"id": 8561463, "name": "Test"}}],
[{"placement": 4, "entrant": {"id": 8559889, "name": "Test"}}],
[{"placement": 5, "entrant": {"id": 8561608, "name": "Test"}}],
[{"placement": 5, "entrant": {"id": 8560090, "name": "Test"}}],
[{"placement": 7, "entrant": {"id": 8561639, "name": "Test"}}],
[{"placement": 7, "entrant": {"id": 8561822, "name": "Test"}}],
[{"placement": 9, "entrant": {"id": 8559993, "name": "Test"}}],
[{"placement": 9, "entrant": {"id": 8561572, "name": "Test"}}],
]
df = pd.DataFrame.from_dict(chain(*data))
result_df = pd.merge_asof(
df.loc[:, df.columns != "entrant"], # Get df without the "entrant" column
df["entrant"].apply(pd.Series), left_index=True, right_index=True
)
Here is the result:
placement id name
0 1 8554498 Test
1 2 8559863 Test
2 3 8561463 Test
3 4 8559889 Test
4 5 8561608 Test
5 5 8560090 Test
6 7 8561639 Test
7 7 8561822 Test
8 9 8559993 Test
9 9 8561572 Test
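Because both frames share the same default index, merge_asof is not strictly required here; a plain DataFrame.join on the index should give the same table. A minimal sketch using two of the records (swapping join in for merge_asof is my substitution, not part of the answer above):

```python
from itertools import chain

import pandas as pd

data = [
    [{"placement": 1, "entrant": {"id": 8554498, "name": "Test"}}],
    [{"placement": 2, "entrant": {"id": 8559863, "name": "Test"}}],
]

# Flatten the list of single-element lists into one list of dicts
df = pd.DataFrame(list(chain.from_iterable(data)))
# Expand the nested dicts into their own columns, then join on the shared index
result_df = df.drop(columns="entrant").join(df["entrant"].apply(pd.Series))
print(result_df)
```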
Upvotes: 1
Reputation: 1725
I would do something like this. It may not be the most elegant solution, but it works. I assume that you have a list of all the dicts, since you provided them one by one.
import pandas as pd

dList = [{'placement': 1, 'entrant': {'id': 8554498, 'name': 'Test'}},
{'placement': 2, 'entrant': {'id': 8559863, 'name': 'Test'}},
{'placement': 3, 'entrant': {'id': 8561463, 'name': 'Test'}},
{'placement': 4, 'entrant': {'id': 8559889, 'name': 'Test'}},
{'placement': 5, 'entrant': {'id': 8561608, 'name': 'Test'}},
{'placement': 5, 'entrant': {'id': 8560090, 'name': 'Test'}},
{'placement': 7, 'entrant': {'id': 8561639, 'name': 'Test'}},
{'placement': 7, 'entrant': {'id': 8561822, 'name': 'Test'}},
{'placement': 9, 'entrant': {'id': 8559993, 'name': 'Test'}},
{'placement': 9, 'entrant': {'id': 8561572, 'name': 'Test'}}]
# Generate the column names (I assume you haven't hard-coded them, to keep the problem general)
d0 = dList[0]
columns = []
for key, val in d0.items():
    if not isinstance(val, dict):
        columns.append(key)
    else:
        # Nested dict: use its keys as column names
        for subkey in val:
            columns.append(subkey)
# Build the data list: one sublist of values for every dict
data = []
for d in dList:
    thisData = []
    for key, val in d.items():
        if not isinstance(val, dict):
            thisData.append(val)
        else:
            for subval in val.values():
                thisData.append(subval)
    data.append(thisData)
df = pd.DataFrame(data, columns=columns)
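The loops above handle exactly one level of nesting and take the column order from the first dict. As a sketch of a generalization, a small recursive helper (the name flatten is hypothetical) can pull keys up from arbitrarily nested dicts. Note the assumption that nested keys never collide with top-level ones: if they do (e.g. two different "id" fields), the later value silently overwrites the earlier one.

```python
import pandas as pd


def flatten(d):
    """Recursively merge nested dict values into one flat dict (hypothetical helper)."""
    flat = {}
    for key, val in d.items():
        if isinstance(val, dict):
            # Pull the nested keys up to the top level
            flat.update(flatten(val))
        else:
            flat[key] = val
    return flat


dList = [
    {"placement": 1, "entrant": {"id": 8554498, "name": "Test"}},
    {"placement": 2, "entrant": {"id": 8559863, "name": "Test"}},
]

df = pd.DataFrame([flatten(d) for d in dList])
print(df)
```

Building one flat dict per record also lets pandas line up columns by key, so dicts with keys in a different order still land in the right columns.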
Hope it works for you; if not, let me know.
Upvotes: 1
Reputation: 890
You need to build flat dictionaries so that pandas can create the DataFrame you want. I'm assuming here that you have a list of dicts called dictionaries.
import pandas as pd

pd.DataFrame(
[
{"placement": d["placement"], "id": d["entrant"]["id"], "name": d["entrant"]["name"]}
for d in dictionaries
]
)
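Alternatively, pandas provides pd.json_normalize (a top-level function since pandas 1.0), which flattens nested dicts into dotted column names such as "entrant.id". Renaming those columns then gives placement | id | name. A sketch, again assuming a list of dicts called dictionaries:

```python
import pandas as pd

dictionaries = [
    {"placement": 1, "entrant": {"id": 8554498, "name": "Test"}},
    {"placement": 2, "entrant": {"id": 8559863, "name": "Test"}},
]

# Nested keys become "entrant.id" and "entrant.name"
df = pd.json_normalize(dictionaries)
# Drop the "entrant." prefix to match the desired column names
df = df.rename(columns={"entrant.id": "id", "entrant.name": "name"})
print(df)
```

This avoids naming every field by hand, which helps if the API later returns more nested fields.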
Upvotes: 1