I'm trying to make multiple API calls to retrieve JSON files. The JSONs all follow the same schema. I want to merge all the JSON files together as one file so I can do two things:
1) Extract all the IP addresses from the JSON to work with later
2) Convert the JSON into a pandas DataFrame
When I first wrote the code, I made a single request and it returned a JSON that I could work with. Now I use a for loop to make several requests and append each JSON to a list called results_list, so that the next JSON does not overwrite the previous one.
Here's the code:
import json
import requests

headers = {
    'Accept': 'application/json',
    'key': 'MY_API_KEY'
}
query_type = 'QUERY_TYPE'
locations_list = ['London', 'Amsterdam', 'Berlin']
url = 'https://API_URL'
results_list = []
for location in locations_list:
    r = requests.get(url, params={'query': str(query_type) + str(location)}, headers=headers)
    # Store the parsed JSON body; json.dump cannot serialize a requests Response object
    results_list.append(r.json())
with open('my_search_results.json', 'w') as outfile:
    json.dump(results_list, outfile)
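json.dump can only serialize plain Python objects, so the list needs to hold the parsed bodies (r.json()), not the Response objects themselves. If the goal is one merged file of records rather than one array of whole responses, a minimal sketch is to concatenate each response's 'data' list before saving. It assumes every response has the complete/count/data shape shown below; the merge_data helper and the sample responses are hypothetical stand-ins for real API output.

```python
import json


def merge_data(responses):
    """Concatenate the 'data' lists of several parsed API responses into one list."""
    merged = []
    for response in responses:
        # Each response is a dict; its 'data' value is a list of per-IP records.
        # .get() treats a response without 'data' (e.g. an error reply) as empty.
        merged.extend(response.get("data", []))
    return merged


# Hypothetical parsed bodies, standing in for r.json() from each request.
responses = [
    {"complete": True, "count": 1,
     "data": [{"ip": "1.2.3.4", "metadata": {"city": "London"}}]},
    {"complete": True, "count": 2,
     "data": [{"ip": "5.6.7.8", "metadata": {"city": "Amsterdam"}},
              {"ip": "9.10.11.12", "metadata": {"city": "Amsterdam"}}]},
]

merged = merge_data(responses)

# One JSON array of records. To save it, pass an open file to json.dump instead:
#     with open("my_search_results.json", "w") as outfile:
#         json.dump(merged, outfile, indent=2)
merged_json = json.dumps(merged, indent=2)
```

Flattening at this point means later code can loop over a single list of records instead of a list of responses.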
The JSON file my_search_results.json has a separate element for each API query, e.g. 0 is London, 1 is Amsterdam, 2 is Berlin, etc. Like this:
[
    {
        "complete": true,
        "count": 51,
        "data": [
            {
                "actor": "unknown",
                "classification": "malicious",
                "cve": [],
                "first_seen": "2020-03-11",
                "ip": "1.2.3.4",
                "last_seen": "2020-03-28",
                "metadata": {
                    "asn": "xxxxx",
                    "category": "isp",
                    "city": "London",
                    "country": "United Kingdom",
                    "country_code": "GB",
                    "organization": "British Telecommunications PLC",
                    "os": "Linux 2.2-3.x",
                    "rdns": "xxxx",
                    "tor": false
                },
                "raw_data": {
                    "ja3": [],
                    "scan": [
                        {
                            "port": 23,
                            "protocol": "TCP"
                        },
                        {
                            "port": 81,
                            "protocol": "TCP"
                        }
                    ],
                    "web": {}
                },
                "seen": true,
                "spoofable": false,
                "tags": [
                    "some tag"
                ]
            }
        ]
    }
]
(I've redacted any sensitive data. There is a separate element in the JSON for each API request, representing each city, but the full file is too big to show here.)
Now I want to go through the JSON and pick out all the IP addresses:
for d in results_list['data']:
    ips = d['ip']
    print(ips)
However this gives the error:
TypeError: list indices must be integers or slices, not str
When I was working with a single JSON from a single API request this worked fine, but now it seems like either the JSON is not formatted properly or Python is seeing my big JSON as a list and not a dictionary, even though I used json.dump() on results_list earlier in the script. I'm sure it has to do with the way I had to take all the API calls and append them to a list, but I can't work out where I'm going wrong.
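The saved file is most likely fine: json.dump writes a top-level JSON array because results_list is a list, and json.load would read that array back as a list too. A list only accepts integer positions or slices, so the string 'data' fails before any record is reached. A minimal reproduction with a hypothetical one-element stand-in for results_list:

```python
# Hypothetical one-element stand-in for results_list (or for json.load of the saved file).
results_list = [{"complete": True, "count": 1, "data": [{"ip": "1.2.3.4"}]}]

try:
    results_list["data"]  # a list only accepts integer positions or slices
    message = None
except TypeError as exc:
    message = str(exc)

print(message)  # list indices must be integers or slices, not str

# Index the list by position first; that yields a dict, which accepts a string key.
first_ip = results_list[0]["data"][0]["ip"]
print(first_ip)  # 1.2.3.4
```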
I'm struggling to figure out how to pick out the IP addresses, or whether there is a better way to collect and merge multiple JSONs. Any advice appreciated.
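Both goals (the IP list and the DataFrame) get easier once every record from every response is flattened into one row per IP. A sketch of that flattening with the standard library only; the to_rows helper, the choice of columns and the two-city sample data are hypothetical, so pick whichever fields your schema actually needs.

```python
def to_rows(results):
    """Flatten every record of every response into one flat dict per IP."""
    rows = []
    for result in results:
        for record in result.get("data", []):
            metadata = record.get("metadata", {})
            rows.append({
                "ip": record.get("ip"),
                "classification": record.get("classification"),
                "first_seen": record.get("first_seen"),
                "last_seen": record.get("last_seen"),
                "city": metadata.get("city"),
                "country_code": metadata.get("country_code"),
            })
    return rows


# Hypothetical two-city sample in the same shape as results_list.
results_list = [
    {"complete": True, "count": 1, "data": [
        {"ip": "1.2.3.4", "classification": "malicious",
         "first_seen": "2020-03-11", "last_seen": "2020-03-28",
         "metadata": {"city": "London", "country_code": "GB"}},
    ]},
    {"complete": True, "count": 1, "data": [
        {"ip": "5.6.7.8", "classification": "unknown",
         "first_seen": "2020-03-01", "last_seen": "2020-03-20",
         "metadata": {"city": "Amsterdam", "country_code": "NL"}},
    ]},
]

rows = to_rows(results_list)
ips = [row["ip"] for row in rows]  # goal 1: all IP addresses
```

With pandas installed, pandas.DataFrame(rows) turns the rows into a DataFrame with one column per key (goal 2). pandas.json_normalize(results_list, record_path="data") is an alternative that keeps every field and flattens nested dictionaries such as metadata into dotted column names.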
Upvotes: 0
Reputation: 5202
To get an IP, try:
data = results_list[0]['data']  # index the outer list by position before using a string key
ips = data[0]['ip']  # 'data' is a list too, so take one of its records by position
print(ips)
The value of the key data is a list of dictionaries, each holding one ip, and results_list itself is also a list. So when you write results_list['data'], you are indexing the outer list with a string, which raises the error:
TypeError: list indices must be integers or slices, not str
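Printing the type at each level shows which kind of index each step needs: a list level takes an integer position, a dict level takes a string key. A sketch using a small hypothetical stand-in for results_list:

```python
# Hypothetical stand-in with the same nesting as results_list.
results_list = [
    {"complete": True, "count": 2, "data": [{"ip": "1.2.3.4"}, {"ip": "5.6.7.8"}]},
]

# Map each access path to the name of the type found there.
levels = {
    "results_list": type(results_list).__name__,
    "results_list[0]": type(results_list[0]).__name__,
    "results_list[0]['data']": type(results_list[0]["data"]).__name__,
    "results_list[0]['data'][0]": type(results_list[0]["data"][0]).__name__,
}
for path, type_name in levels.items():
    print(f"{path}: {type_name}")
# results_list: list
# results_list[0]: dict
# results_list[0]['data']: list
# results_list[0]['data'][0]: dict
```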
So if:
results_list = [
    {
        "complete": True,
        "count": 51,
        "data": [
            {
                "actor": "unknown",
                "classification": "malicious",
                "cve": [],
                "first_seen": "2020-03-11",
                "ip": "1.2.3.4",
                "last_seen": "2020-03-28",
                "metadata": {
                    "asn": "xxxxx",
                    "category": "isp",
                    "city": "London",
                    "country": "United Kingdom",
                    "country_code": "GB",
                    "organization": "British Telecommunications PLC",
                    "os": "Linux 2.2-3.x",
                    "rdns": "xxxx",
                    "tor": False
                },
                "raw_data": {
                    "ja3": [],
                    "scan": [
                        {
                            "port": 23,
                            "protocol": "TCP"
                        },
                        {
                            "port": 81,
                            "protocol": "TCP"
                        }
                    ],
                    "web": {}
                },
                "seen": True,
                "spoofable": False,
                "tags": [
                    "some tag",
                ]
            },
            # ... (here is the rest of your data)
        ]
    }
]
To collect the IP addresses, run:
ip_address = []
# This takes only the first IP of each response; it assumes each result is a separate dictionary in results_list
for d in results_list:
    ips = d['data'][0]['ip']
    ip_address.append(ips)
    print(ips)
# If all results are within the 'data' list of the first response
for d in results_list[0]['data']:
    ips = d['ip']
    ip_address.append(ips)
    print(ips)
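With several responses that each contain several records, the two loops above each cover only part of the data: the first takes one IP per response, the second takes every IP from the first response only. A small comparison on hypothetical data, plus a nested comprehension that collects every IP:

```python
# Hypothetical data: the first response has two records, the second has one.
results_list = [
    {"data": [{"ip": "1.2.3.4"}, {"ip": "5.6.7.8"}]},
    {"data": [{"ip": "9.9.9.9"}]},
]

# d['data'][0] looks only at the first record of each response.
first_per_response = [d["data"][0]["ip"] for d in results_list]

# results_list[0]['data'] looks only at the records of the first response.
first_response_only = [d["ip"] for d in results_list[0]["data"]]

# Looping over every record of every response collects all of them.
all_ips = [record["ip"] for d in results_list for record in d["data"]]

print(first_per_response)   # ['1.2.3.4', '9.9.9.9']
print(first_response_only)  # ['1.2.3.4', '5.6.7.8']
print(all_ips)              # ['1.2.3.4', '5.6.7.8', '9.9.9.9']
```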
Upvotes: 1
Reputation: 4481
results_list is a list, not a dictionary, so results_list['data'] raises an error. Instead, get each dictionary from that list, then access its 'data' key. Note also that the value for the key 'data' is itself a list, so you also need to loop over its elements:
for result in results_list:
    for d in result["data"]:
        ips = d["ip"]
        print(ips)
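If the IPs are collected to work with later, it may help to keep each one only once and to tolerate a response that lacks a 'data' key. A sketch that extends the nested loop above with a seen-set and dict.get; whether the same IP can actually appear under several cities depends on the API, so the duplicate in the sample data is hypothetical, as is the helper name unique_ips.

```python
def unique_ips(results):
    """Return each IP once, in first-seen order, across all responses."""
    seen = set()
    ordered = []
    for result in results:
        # .get() skips a response with no 'data' key instead of raising KeyError.
        for record in result.get("data", []):
            ip = record.get("ip")
            if ip is not None and ip not in seen:
                seen.add(ip)
                ordered.append(ip)
    return ordered


# Hypothetical responses: 1.2.3.4 appears twice, and one response has no 'data'.
results_list = [
    {"data": [{"ip": "1.2.3.4"}, {"ip": "5.6.7.8"}]},
    {"data": [{"ip": "1.2.3.4"}, {"ip": "9.9.9.9"}]},
    {"complete": False},
]

print(unique_ips(results_list))  # ['1.2.3.4', '5.6.7.8', '9.9.9.9']
```

A set alone would also remove duplicates, but the extra list keeps the IPs in the order the responses returned them.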
If you know that your JSON list only has one element, you may simplify this to:
for d in results_list[0]["data"]:
    ips = d["ip"]
    print(ips)
Upvotes: 0