ERIC

Reputation: 500

Scrape blocks of data from a webpage API

I'm trying to collect blocks of data, each forming a small table, from a webpage. Please see my code below.

    import requests
    import re
    import json
    import sys
    import os
    import time
    from lxml import html, etree
    from bs4 import BeautifulSoup
    import pandas as pd

    url = 'https://www.investing.com/instruments/OptionsDataAjax'
    params = {'pair_id': 525,        ## SPX
              'date': 1536555600,    ## Unix timestamp: 2018-09-10 05:00 UTC
              'strike': 'all',       ## all prices
              'callspots': 'calls',  # 'call_andputs',
              'type': 'analysis',    # webpage viewer
              'bringData': 'true',
              }
    headers = {'User-Agent': 'Chrome/39.0.2171.95 Safari/537.36'}

    def R(text, end='\n'):
        print('\033[0;31m{}\033[0m'.format(text), end=end)  # print in red

    def G(text, end='\n'):
        print('\033[0;32m{}\033[0m'.format(text), end=end)  # print in green

    page = requests.get(url, params=params, headers=headers)
    if page.status_code != 200:
        R('ERROR CODE:{}'.format(page.status_code))
        R('Problem in connection!')
        sys.exit(1)  # sys.exit has to be called; a bare `sys.exit` does nothing
    else:
        G('OK')
    soup = BeautifulSoup(page.content, 'lxml')
    spdata = json.loads(soup.text)
    print(spdata['data'])
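The 'date' value looks like a Unix timestamp in seconds (an assumption; the endpoint's parameters aren't documented here). A quick standard-library check of which calendar date it encodes suggests 2018-09-10, which matches the 180910 in the option symbols shown below:

```python
from datetime import datetime, timezone

# Assumption: 'date' is seconds since 1970-01-01 UTC (a Unix timestamp).
date_param = 1536555600
as_utc = datetime.fromtimestamp(date_param, tz=timezone.utc)
print(as_utc.isoformat())  # 2018-09-10T05:00:00+00:00
```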

The result, spdata['data'], is a str. I only want to extract blocks like the following from it; the string contains many such blocks, all with the same format.

    SymbolSPY180910C00250000
    Delta0.9656
    Imp Vol0.2431
    Bid33.26
    Gamma0.0039
    Theoretical33.06
    Ask33.41
    Theta-0.0381
    Intrinsic Value33.13
    Volume0
    Vega0.0617
    Time Value-33.13
    Open Interest0
    Rho0.1969
    Delta / Theta-25.3172

I'm using json and BeautifulSoup here. Maybe a regular expression would help, but I don't know much about re. Any approach that gets this result is appreciated. Thanks.

Upvotes: 0

Views: 48

Answers (1)

Dan-Dev

Reputation: 9430

Add this after your code:

    regex = r"((SymbolSPY[1-9]*):?\s*)(.*?)\n[^\S\n]*\n[^\S\n]*"
    for match in re.finditer(regex, spdata['data'], re.MULTILINE | re.DOTALL):
        for line in match.group().splitlines():
            print(line.strip())
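For readers new to re, here is the same pattern written with re.VERBOSE so each part can carry a comment, run against a trimmed sample built from lines in the output below. It assumes, as the pattern itself does, that blocks in spdata['data'] are separated by a blank (or whitespace-only) line. Note that [1-9]* stops at the first 0, so it only consumes "18" of "180910"; that is harmless because (.*?) picks up the rest of the block.

```python
import re

# Same pattern as the one-liner, with whitespace and comments ignored by re.VERBOSE.
BLOCK_RE = re.compile(
    r"""
    ( (SymbolSPY[1-9]*) :? \s* )  # 'SymbolSPY', then digits 1-9 only, optional colon, whitespace
    (.*?)                         # the rest of the block, as short as possible (DOTALL: . crosses newlines)
    \n [^\S\n]* \n [^\S\n]*       # the block ends at a blank or whitespace-only line
    """,
    re.MULTILINE | re.DOTALL | re.VERBOSE,
)

# Trimmed sample: each block ends with an empty line.
sample = (
    "SymbolSPY180910C00245000\n"
    "Delta0.9682\n"
    "Imp Vol0.2779\n"
    "\n"
    "SymbolSPY180910P00245000\n"
    "Delta-0.0262\n"
    "Imp Vol0.2652\n"
    "\n"
)

# One list of stripped lines per matched block.
blocks = [m.group().strip().splitlines() for m in BLOCK_RE.finditer(sample)]
for block in blocks:
    print(block)
```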

Output:

    OK
    SymbolSPY180910C00245000
    Delta0.9682
    Imp Vol0.2779
    Bid38.26
    Gamma0.0032
    Theoretical38.05
    Ask38.42
    Theta-0.0397
    Intrinsic Value38.13
    Volume0
    Vega0.0579
    Time Value-38.13
    Open Interest0
    Rho0.1934
    Delta / Theta-24.3966


    SymbolSPY180910P00245000
    Delta-0.0262
    Imp Vol0.2652
    ...
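If the goal is a table rather than printed lines, one option is to split each line of a block into a field label and a value. Labels and values are run together ("Imp Vol0.2431"), and splitting at the first digit or minus sign would cut SymbolSPY180910C00250000 in the wrong place, so this sketch matches against the labels listed in the question, longest first so that "Delta / Theta" wins over "Delta". The LABELS list and the parse_block and to_value helpers are illustrative assumptions, not part of the answer above.

```python
import re

# Hypothetical label list, copied from the sample block in the question.
# Sorting longest-first makes the regex alternation prefer "Delta / Theta" over "Delta".
LABELS = sorted(
    ["Symbol", "Delta", "Imp Vol", "Bid", "Gamma", "Theoretical", "Ask", "Theta",
     "Intrinsic Value", "Volume", "Vega", "Time Value", "Open Interest", "Rho",
     "Delta / Theta"],
    key=len, reverse=True,
)
LINE_RE = re.compile(r"^(%s)(.*)$" % "|".join(map(re.escape, LABELS)))


def to_value(text):
    """Convert a numeric field to float; leave anything else as a string."""
    try:
        return float(text)
    except ValueError:
        return text


def parse_block(lines):
    """Turn the lines of one block into a {label: value} dict."""
    record = {}
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            label, value = m.groups()
            record[label] = value if label == "Symbol" else to_value(value)
    return record


block = """SymbolSPY180910C00250000
Delta0.9656
Imp Vol0.2431
Bid33.26
Gamma0.0039
Theoretical33.06
Ask33.41
Theta-0.0381
Intrinsic Value33.13
Volume0
Vega0.0617
Time Value-33.13
Open Interest0
Rho0.1969
Delta / Theta-25.3172"""

record = parse_block(block.splitlines())
print(record["Symbol"], record["Delta / Theta"])  # SPY180910C00250000 -25.3172
```

Applied to every block from the finditer loop, this yields a list of dicts, which pandas (already imported in the question) can turn into a table with pd.DataFrame(records).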

Upvotes: 1
