Reputation: 13
I'm currently working with pandas DataFrames. I'm just a beginner. I wrote a for loop that collects data from an API (JSON format) and appends it to a list, after splicing a string with the search term into each response. Finally, I convert the list to a DataFrame.
This loop works fine, but since it has to iterate over more than 1500 entries, it takes a really long time. Can anyone suggest the best Pythonic way to make it faster?
Thank you very much in advance :)
```python
import json

import pandas as pd
import requests

url = "https://api."  # base URL (truncated in the original post)
team = ["abc", "def", "ghi"]  # ...list of more than 1500 movie symbols
abcd = list()
for t in team:
    status_url = requests.get(f"{url}/{t}")
    status_data = status_url.text
    # Splice '"Movie_name":"<t>",' into the raw JSON text right after the opening "{"
    status_data_list = list(status_data)
    status_data_list.insert(1, f"\"Movie_name\":\"{t}\",")
    final_string = ''.join(status_data_list)
    parsed = json.loads(final_string)
    abcd.append(parsed)
Movie_dataframe = pd.DataFrame(abcd)
```
Upvotes: 0
Views: 46
Reputation: 6327
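The time isn't lost converting the data to a DataFrame; it's the requests. Your loop sends them one at a time, so it spends most of its time waiting for each response before sending the next one.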
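To check where the time goes, you can time each phase separately. The sketch below stands in for the network with a hypothetical `fake_fetch` that just sleeps (an assumption; no real API is called), and compares a sequential loop with a thread pool (`concurrent.futures.ThreadPoolExecutor`, a stdlib alternative, not the approach this answer recommends). Because each call spends its time waiting, overlapping the waits cuts the wall time, while reshaping the rows costs almost nothing.

```python
import time
from concurrent.futures import ThreadPoolExecutor

DELAY = 0.05  # assumed per-request latency in seconds

def fake_fetch(symbol):
    """Hypothetical stand-in for requests.get(...).json(): waits, then returns a dict."""
    time.sleep(DELAY)
    return {"Movie_name": symbol, "rating": 1}

symbols = [f"m{i}" for i in range(20)]

# Phase 1a: fetch sequentially, one request after another
start = time.perf_counter()
sequential_rows = [fake_fetch(s) for s in symbols]
sequential_time = time.perf_counter() - start

# Phase 1b: fetch with up to 10 requests waiting at the same time
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as pool:
    threaded_rows = list(pool.map(fake_fetch, symbols))  # map keeps input order
threaded_time = time.perf_counter() - start

# Phase 2: the "conversion" step, here just reshaping the list of dicts into columns
start = time.perf_counter()
columns = {key: [row[key] for row in sequential_rows] for key in sequential_rows[0]}
build_time = time.perf_counter() - start

print(f"sequential fetch: {sequential_time:.2f}s")
print(f"threaded fetch:   {threaded_time:.2f}s")
print(f"build columns:    {build_time:.4f}s")
```

With real requests the numbers depend on the server, but the pattern is the same: the fetch phase dominates, so that is the part worth parallelizing.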
First, you could simplify your loop slightly by letting `requests` parse the JSON and adding the movie name as a dictionary key:
```python
for t in team:
    response = requests.get(f"{url}/{t}")
    status_data = response.json()
    status_data["Movie_name"] = t
    abcd.append(status_data)
```
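Editing the parsed dict is also more robust than splicing text into the raw response. The question's code inserts `"Movie_name":"<t>",` right after the first character, which assumes the body starts with `{` and that the object is non-empty; an empty object `{}` becomes invalid JSON with a trailing comma. A small stdlib-only check (the sample bodies are made up for illustration):

```python
import json

def splice_name(text, name):
    """The question's approach: insert a key/value pair as text after the first character."""
    chars = list(text)
    chars.insert(1, f"\"Movie_name\":\"{name}\",")
    return json.loads("".join(chars))

def add_name(text, name):
    """The answer's approach: parse first, then set the key on the dict."""
    data = json.loads(text)
    data["Movie_name"] = name
    return data

body = '{"rating": 7.5}'
print(splice_name(body, "abc"))  # {'Movie_name': 'abc', 'rating': 7.5}
print(add_name(body, "abc"))     # {'rating': 7.5, 'Movie_name': 'abc'}

empty = "{}"
print(add_name(empty, "abc"))    # {'Movie_name': 'abc'}
try:
    splice_name(empty, "abc")    # becomes '{"Movie_name":"abc",}', a trailing comma
except json.JSONDecodeError as exc:
    print("splice failed:", exc.msg)
```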
Beyond that, you can perform the requests asynchronously, so they run concurrently instead of one after another. Be aware that your IP might get blacklisted by the website if you send too many at once, so check the maximum rate at which you're allowed to make requests.
```python
import asyncio

import httpx
import pandas as pd

url = "https://api."  # base URL (truncated in the original post)
teams = ["abc", "def", "ghi"]

async def get_team(team):
    async with httpx.AsyncClient() as client:
        r = await client.get(f"{url}/{team}")
        status_data = r.json()
    status_data["Movie_name"] = team
    return status_data

async def main():
    # Schedule every request at once and wait for all of them to finish
    tasks = [get_team(team) for team in teams]
    return await asyncio.gather(*tasks)

abcd = asyncio.run(main())  # list of dicts, in the same order as `teams`
Movie_dataframe = pd.DataFrame(abcd)
```
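To stay within the site's rate limit, one common approach (an addition, not part of the code above) is to cap how many requests are in flight with `asyncio.Semaphore`. This sketch replaces the HTTP call with a hypothetical `fake_get` coroutine that sleeps, so it runs without network access; in real code you would put the `client.get(...)` call inside the `async with sem:` block. `MAX_CONCURRENT = 5` is an arbitrary assumed limit, so use whatever the API documents.

```python
import asyncio

MAX_CONCURRENT = 5  # assumed limit; use whatever the API allows

in_flight = 0        # requests currently "on the wire"
peak_in_flight = 0   # highest value in_flight ever reached

async def fake_get(team):
    """Hypothetical stand-in for `await client.get(f"{url}/{team}")` followed by `.json()`."""
    global in_flight, peak_in_flight
    in_flight += 1
    peak_in_flight = max(peak_in_flight, in_flight)
    await asyncio.sleep(0.01)  # simulated network latency
    in_flight -= 1
    return {"rating": 1}

async def get_team(sem, team):
    # At most MAX_CONCURRENT coroutines can be inside this block at once
    async with sem:
        status_data = await fake_get(team)
    status_data["Movie_name"] = team
    return status_data

async def main(teams):
    sem = asyncio.Semaphore(MAX_CONCURRENT)
    return await asyncio.gather(*(get_team(sem, t) for t in teams))

teams = [f"m{i}" for i in range(50)]
abcd = asyncio.run(main(teams))
print(len(abcd), peak_in_flight)  # 50 5
```

All 50 tasks are still created up front, but only five wait on the "network" at any moment; `gather` still returns the results in the order of `teams`.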
Upvotes: 2