Pelide

Reputation: 528

How to use pandas DF as params in HTTP request

I have a list of places from an Excel file which I would like to enrich with GeoNames IDs. Starting from the Excel file I made a pandas DataFrame, and I would like to use the values from the DataFrame as params in my request. Here is the script I made:

import pandas as pd
import requests
import json

require_cols = [1]

required_df = pd.read_excel('grp.xlsx', usecols=require_cols)
print(required_df)

url = 'http://api.geonames.org/searchJSON?'

params = {'username': "XXXXXXXX",
          'name_equals': (required_df),
          'maxRows': "1"}

e = requests.get(url, params=params)

pretty_json = json.loads(e.content)
print(json.dumps(pretty_json, indent=2))

The problem is related to the definition of this parameter:

 'name_equals': (required_df)

I would like to use the places (around 15k) from the DataFrame as the param value, iteratively retrieve the related GeoNames ID for each one, and write the output to a separate Excel file.

The simple request works:

import requests
import json

url = 'http://api.geonames.org/searchJSON?'

params = {'username': "XXXXXXX",
          'name_equals': "Aire",
          'maxRows': "1"}

e = requests.get(url, params=params)

pretty_json = json.loads(e.content)
print(json.dumps(pretty_json, indent=2))
#print(e.content)

As does the definition of the pandas DataFrame:

# import pandas lib as pd
import pandas as pd

require_cols = [0, 1]

# only read specific columns from an excel file
required_df = pd.read_excel('grp.xlsx', usecols=require_cols)

print(required_df)

I also tried via SPARQL without results, so I decided to go via Python.

Thanks for your time.

Upvotes: 0

Views: 836

Answers (2)

Pelide

Reputation: 528

Thanks @furas for your reply.

I solved it like this:


import pandas as pd
import requests
import json

url = 'http://api.geonames.org/searchJSON?'

df = pd.read_excel('Book.xlsx', sheet_name='Sheet1', usecols="B")

for item in df.place_name:

    params = {'username': "XXXXXX",
              'name_equals': item,
              'maxRows': "1"}

    e = requests.get(url, params=params)
    pretty_json = json.loads(e.content)

    for result in pretty_json["geonames"]:
        print(json.dumps(result["geonameId"], indent=2))

        with open('data.json', 'w', encoding='utf-8') as f:
            json.dump(result["geonameId"], f, ensure_ascii=False, indent=4)

The only problem now is related to the JSON output: when printing I get the complete list of IDs, but when I write the output to a file I get only the last ID from the list.
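That symptom comes from opening data.json with mode 'w' inside the loop, which truncates the file on every iteration, so only the final ID survives. A minimal sketch of the usual fix: collect the IDs in a list and write the file once after the loop (the sample IDs below are made up to stand in for the values parsed from each response):

```python
import json

geoname_ids = []
for geoname_id in [2643743, 2988507, 2950159]:  # stand-ins for IDs parsed per request
    geoname_ids.append(geoname_id)

# open the file once, after the loop, so earlier IDs are not overwritten
with open('data.json', 'w', encoding='utf-8') as f:
    json.dump(geoname_ids, f, ensure_ascii=False, indent=4)
```

Alternatively, open the file once before the loop and write into the same handle each iteration; either way the point is to avoid re-opening with 'w' per item.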

Upvotes: 0

furas

Reputation: 142919

You can use a for-loop:

import pandas as pd

df = pd.DataFrame({'Places': ['London', 'Paris', 'Berlin']})

for item in df['Places']:
    print('requests for:', item)
    # ... rest of code ...

or df.apply():

import pandas as pd

def run(item):
    print('requests for:', item)
    # ... rest of code ...
    return 'result for ' + item

df = pd.DataFrame({'Places': ['London', 'Paris', 'Berlin']})

df['Results'] = df['Places'].apply(run)
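Putting the apply pattern together with the original task, a sketch of enriching the whole column and writing it back out. The GeoNames request is shown only as comments so the sketch runs offline; lookup_geoname_id, its dummy return value, and the 'demo' username are placeholders, not working credentials:

```python
import pandas as pd

def lookup_geoname_id(place):
    # Real call would look roughly like:
    # r = requests.get('http://api.geonames.org/searchJSON',
    #                  params={'username': 'demo', 'name_equals': place, 'maxRows': '1'})
    # rows = r.json().get('geonames', [])
    # return rows[0]['geonameId'] if rows else None
    return 'id-for-' + place  # dummy value so the sketch runs offline

df = pd.DataFrame({'place_name': ['London', 'Paris', 'Berlin']})
df['geonameId'] = df['place_name'].apply(lookup_geoname_id)
print(df)
```

From there, df.to_excel('geonames_ids.xlsx', index=False) writes the enriched table to a separate Excel file, which was the original goal.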

Upvotes: 1
