Larsson

Reputation: 39

Extract a data value from page source using BeautifulSoup

I am trying to extract the following data from a page's source using BeautifulSoup, but I am unable to locate it with soup, so I am looking for some guidance.

Viewing the page source shows the following text:

var SYNCHRONIZER_TOKEN_NAME = 'SynchronizerToken';
var SYNCHRONIZER_TOKEN_VALUE = 'dd3a0c31e365c458d2d3e68e3c98f772bd2103eccf381';

The code I am currently using is:

SynchronizerToken = soup.find_all("VAR SYNCHRONIZER_TOKEN_VALUE")

Advice is appreciated, thanks again!

Upvotes: 1

Views: 246

Answers (2)

dot.Py

Reputation: 5157

You can use the following regex pattern to find the desired value:

SYNCHRONIZER_TOKEN_VALUE = \'(.*?)\'
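A minimal sketch of how this pattern might be applied with Python's `re` module, assuming the page source has already been fetched into a string. The `page_source` variable and its contents are a stand-in for the real page, not part of the original answer:

```python
import re

# Stand-in for the fetched page source (assumed to contain the token line).
page_source = """
var SYNCHRONIZER_TOKEN_NAME = 'SynchronizerToken';
var SYNCHRONIZER_TOKEN_VALUE = 'dd3a0c31e365c458d2d3e68e3c98f772bd2103eccf381';
"""

# The pattern from this answer: capture everything between the single quotes,
# matching as few characters as possible (non-greedy).
pattern = re.compile(r"SYNCHRONIZER_TOKEN_VALUE = \'(.*?)\'")

match = pattern.search(page_source)
token = match.group(1) if match else None
print(token)
```

Note that `(.*?)` also matches an empty value (`''`); using `(.+?)` instead requires at least one character between the quotes.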


Upvotes: 0

falsetru

Reputation: 369334

Use a regular expression with a capturing group:

var SYNCHRONIZER_TOKEN_VALUE = '(.+?)'

Then you can get the captured value using <MatchObject>.group(1):


import re

html = '''
var SYNCHRONIZER_TOKEN_NAME = 'SynchronizerToken';
var SYNCHRONIZER_TOKEN_VALUE = 'dd3a0c31e365c458d2d3e68e3c98f772bd2103eccf38163e10ce039c2b70a61a';
'''

token = None
matched = re.search(r"var SYNCHRONIZER_TOKEN_VALUE = '(.+?)'", html)
if matched:
    token = matched.group(1)

# token => 'dd3a0c31e365c458d2d3e68e3c98f772bd2103eccf38163e10ce039c2b70a61a'
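As for why `soup.find_all("VAR SYNCHRONIZER_TOKEN_VALUE")` finds nothing: its first argument filters by tag name, so it looks for elements named `VAR SYNCHRONIZER_TOKEN_VALUE` rather than for that text. If the token is assigned inside an inline `<script>` block (an assumption about the page), one option is to let BeautifulSoup locate that tag and then apply the regex above to its text. This is a sketch assuming the `beautifulsoup4` package is installed; the sample `html` and the `TOKEN_RE` name are hypothetical:

```python
import re

from bs4 import BeautifulSoup

# Hypothetical page: the token is assigned inside an inline <script> block.
html = """
<html><head>
<script type="text/javascript">
var SYNCHRONIZER_TOKEN_NAME = 'SynchronizerToken';
var SYNCHRONIZER_TOKEN_VALUE = 'dd3a0c31e365c458d2d3e68e3c98f772bd2103eccf38163e10ce039c2b70a61a';
</script>
</head><body></body></html>
"""

TOKEN_RE = re.compile(r"var SYNCHRONIZER_TOKEN_VALUE = '(.+?)'")

soup = BeautifulSoup(html, "html.parser")

# Find the first <script> tag whose text matches the pattern
# (BeautifulSoup runs TOKEN_RE.search against each tag's .string).
script = soup.find("script", string=TOKEN_RE)

token = None
if script is not None:
    # Re-run the regex on the script text to read the captured group.
    matched = TOKEN_RE.search(script.string)
    if matched:
        token = matched.group(1)

print(token)
```

Narrowing to the `<script>` tag first avoids matching the same text if it happens to appear elsewhere in the page; for a one-off extraction, running `re.search` on the raw source as above is usually enough.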

Upvotes: 1
