technophile_3

Scrape complete information for all companies using the API and requests in Python

I have a question I haven't been able to find an answer to. Using this API, how can I get the information for all the companies that are displayed here? The Python client example shown in the API documentation asks for a company symbol (which I can get by scraping the links in the SPAC column). But how do I get the information for all companies rather than a single one? Here is the sample Python client, which gets the information for a single company:

import requests

YOUR_API_TOKEN = "YOUR_API_TOKEN"  # placeholder: replace with your own API token

def get_company(root_symbol):
    end_point = "https://www.spacresearch.com/api/v1/company/{}".format(root_symbol)
    print("Connecting to: ", end_point)
    token = "Bearer {}".format(YOUR_API_TOKEN)
    return requests.get(end_point, headers={"Authorization": token})

response_json = get_company("ANZUU").json()
print(response_json)
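If the endpoint is going to be called once per company, a minimal sketch is to reuse one requests.Session with the Authorization header set once, and to percent-encode the symbol so a value containing spaces cannot break the URL. Only the endpoint URL and the Bearer scheme come from the example above; the environment variable name SPACRESEARCH_API_TOKEN and the helper names are hypothetical.

```python
import os
from urllib.parse import quote

import requests

API_ROOT = "https://www.spacresearch.com/api/v1/company/"


def company_url(root_symbol):
    # Percent-encode the symbol so spaces or slashes cannot break the URL path.
    return API_ROOT + quote(root_symbol, safe="")


def make_session(token=None):
    # Hypothetical env var name; used only when no token is passed in.
    token = token or os.environ.get("SPACRESEARCH_API_TOKEN", "")
    session = requests.Session()
    session.headers.update({"Authorization": f"Bearer {token}"})
    return session


def get_company(session, root_symbol, timeout=10):
    # One GET per symbol; the session reuses the connection and the auth header.
    return session.get(company_url(root_symbol), timeout=timeout)
```

Usage would look like session = make_session() followed by get_company(session, "ANZUU").json(), called inside a loop over whatever list of symbols you have collected.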

Is there any way to get information for all companies rather than just one? Please help!

EDIT: Here is the complete stack trace I get when running @demouser123's answer:

Connecting to:  https://www.spacresearch.com/api/v1/company/Magnum Opus Acq Ltd
Traceback (most recent call last):
 File "data_ext.py", line 43, in <module>
   response_json = get_company(listofcompanies[i]).json()
 File "F:\proj\venv\lib\site-packages\requests\models.py", line 910, in json
   return complexjson.loads(self.text, **kwargs)
 File "C:\Users\technophile\AppData\Local\Programs\Python\Python38\lib\json\__init__.py", line 357, in loads
   return _default_decoder.decode(s)
 File "C:\Users\technophile\AppData\Local\Programs\Python\Python38\lib\json\decoder.py", line 337, in decode
   obj, end = self.raw_decode(s, idx=_w(s, 0).end())
 File "C:\Users\technophile\AppData\Local\Programs\Python\Python38\lib\json\decoder.py", line 355, in raw_decode
   raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
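The URL in this trace contains the company name "Magnum Opus Acq Ltd" (with spaces) rather than a ticker symbol, and "Expecting value: line 1 column 1 (char 0)" means the body that came back was not JSON at all, for example an empty body or an HTML error page. A minimal sketch of checking the response before decoding it is below; it assumes the API signals errors through the HTTP status code and the Content-Type header, which I have not verified, and the names parse_company_response and NotJSONError are hypothetical.

```python
import json


class NotJSONError(ValueError):
    """Raised when a response cannot be treated as a JSON payload."""


def parse_company_response(resp):
    # Works with requests.Response or any object exposing the same attributes.
    if resp.status_code != 200:
        raise NotJSONError(f"HTTP {resp.status_code}: {resp.text[:200]!r}")
    content_type = resp.headers.get("Content-Type", "")
    if "json" not in content_type.lower():
        raise NotJSONError(f"unexpected Content-Type {content_type!r}")
    try:
        return json.loads(resp.text)
    except json.JSONDecodeError as exc:
        # Covers an empty body served with a JSON content type.
        raise NotJSONError(f"body is not valid JSON: {exc}") from exc
```

Printing the raised message (status code plus the start of the body) usually shows directly whether the server returned a "not found" page for the symbol.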

Answers (1)

demouser123

To do this:

  1. Open the URL for companies using Selenium
  2. Wait for the table of companies to show up
  3. Collect the ticker symbols of all companies in the table into a list.
  4. Pass each symbol from that list to the get_company function to get its response.
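Step 4 can be sketched as a loop that keeps going when one symbol fails, so a single bad entry does not abort the whole run. This is a minimal sketch: collect_companies is a hypothetical helper, fetch is any callable that turns a symbol into parsed JSON (for example lambda s: get_company(s).json()), and the default delay between requests is an assumption, since the API's rate limit is not documented here.

```python
import time


def collect_companies(symbols, fetch, delay=0.5):
    """Call fetch(symbol) for each unique, non-empty symbol; split successes from failures."""
    results, failures = {}, {}
    # dict.fromkeys drops duplicates while keeping the table's order.
    for symbol in dict.fromkeys(s.strip() for s in symbols if s.strip()):
        try:
            results[symbol] = fetch(symbol)
        except Exception as exc:  # record the failure and move on to the next symbol
            failures[symbol] = repr(exc)
        time.sleep(delay)  # assumed politeness delay between API calls
    return results, failures
```

Keeping the failures in a separate dict makes it easy to inspect afterwards which symbols the API rejected.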

Here is code that I think should work, provided you supply a valid access token:

import time

import requests
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager

API_TOKEN = "YOUR_API_TOKEN"  ## replace with your real token


def get_company(root_symbol):
    ## must be defined before the loop below that calls it
    end_point = "https://www.spacresearch.com/api/v1/company/{}".format(root_symbol)
    print("Connecting to: ", end_point)
    token = "Bearer {}".format(API_TOKEN)
    return requests.get(end_point, headers={"Authorization": token})


svc = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=svc)
driver.maximize_window()

listofcompanies = []
try:
    driver.get("https://www.spacresearch.com/symbol?s=live-deal&sector=&geography=")

    wait = WebDriverWait(driver, 20)
    ## wait for the table of companies to come up
    wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "tbody tr")))
    time.sleep(5)  ## give the remaining rows time to render

    ## every company is one row in tbody; 'tbody tr' is a CSS selector, not a tag name
    get_all_company_rows = driver.find_elements(By.CSS_SELECTOR, "tbody tr")
    print(len(get_all_company_rows))

    for row in get_all_company_rows:
        ## the ticker symbol is in the third td of each row (see Edit 1)
        listofcompanies.append(row.find_element(By.CSS_SELECTOR, "td:nth-child(3)").text)
finally:
    driver.quit()  ## close the browser even if scraping fails

print(listofcompanies)  ## all ticker symbols from the table

for symbol in listofcompanies:
    response = get_company(symbol)
    response.raise_for_status()  ## fail with the HTTP error instead of a JSONDecodeError
    print(response.json())

Note: the code uses the Selenium 4 beta (selenium==4.0.0b4), the latest version at the time of writing, so a few calls differ from Selenium 3; for example, the driver path is passed through a Service object.

Edit 1: Since the API expects a ticker symbol rather than a company name, I changed the locator to read the ticker column from the table.
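As an alternative to one find_element call per row, a sketch is to grab driver.page_source once after the wait and read the ticker column with Python's standard html.parser. It assumes, as the answer's td:nth-child(3) locator does, that the ticker is in the third cell of each tbody row, and that the table has no nested tables; ColumnExtractor and extract_column are hypothetical names. Unlike CSS :nth-child, it counts only td cells.

```python
from html.parser import HTMLParser


class ColumnExtractor(HTMLParser):
    """Collect the text of the Nth <td> in every <tr> inside <tbody>."""

    def __init__(self, column):
        super().__init__()
        self.column = column  # 1-based, like CSS :nth-child
        self.in_tbody = False
        self.cell_index = 0
        self.capturing = False
        self.buffer = []
        self.values = []

    def handle_starttag(self, tag, attrs):
        if tag == "tbody":
            self.in_tbody = True
        elif self.in_tbody and tag == "tr":
            self.cell_index = 0  # new row: restart the cell count
        elif self.in_tbody and tag == "td":
            self.cell_index += 1
            self.capturing = self.cell_index == self.column
            self.buffer = []

    def handle_endtag(self, tag):
        if tag == "tbody":
            self.in_tbody = False
        elif tag == "td" and self.capturing:
            self.values.append("".join(self.buffer).strip())
            self.capturing = False

    def handle_data(self, data):
        # Text inside nested tags (e.g. a <div> in the cell) is captured too.
        if self.capturing:
            self.buffer.append(data)


def extract_column(html, column=3):
    parser = ColumnExtractor(column)
    parser.feed(html)
    parser.close()
    return parser.values
```

With Selenium, this would be called as tickers = extract_column(driver.page_source) after the WebDriverWait has returned.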