Newbie

Reputation: 161

How to get text inside script tag in HTML

I am trying to scrape this site:

https://www.lanebryant.com/perfect-sleeve-swing-tunic-top/prd-356831#color/0000009320

I want to get the type of clothing, i.e. its category. There is a script on the page (shown in a screenshot).

How can I collect this text and get the category of the clothing which I have highlighted in the image? I have tried the following code but it returns nothing.

type = d.find_element_by_xpath("//script[@type='text/javascript']").text
print("hiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii"+type)

Here, d is the driver.

Upvotes: 0

Views: 1161

Answers (3)

KunduK

Reputation: 33384

Here you go:

1. Get the innerHTML of the script tag.

2. Parse it as JSON with json.loads().

3. Look up the pageName key and extract the value Tops.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import json

driver = webdriver.Chrome()
driver.get('https://www.lanebryant.com/perfect-sleeve-swing-tunic-top/prd-356831')
item = WebDriverWait(driver, 10).until(EC.presence_of_element_located(
    (By.XPATH, "//script[@type='text/javascript'][contains(.,'window.lanebryantDLLite')]"))).get_attribute('innerHTML')
itemtext = item.split("=")[1].split(";")[0]  # Right-hand side of the assignment, still a string

itemjson = json.loads(itemtext.strip())  # Parse the string into a Python dict

itemtop = itemjson['page']['pageName']  # Page name string that contains the category

print(itemtop.split(':')[1].strip())  # Keep only the part after the colon, i.e. Tops

Hope this helps.
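Splitting on "=" and ";" can break if the JSON itself contains either character. As a minimal sketch of a more tolerant parsing step, the following uses json.JSONDecoder.raw_decode from the standard library to read one JSON value starting at the first "{" after the variable name. The script text below is a hypothetical sample (the real page's content may differ), the helper name extract_assigned_json is made up for illustration, and the approach assumes the assigned value is strict JSON, as json.loads in the answer above also assumes.

```python
import json

# Hypothetical script body; the real page's script may look different.
script_text = (
    'window.lanebryantDLLite = '
    '{"page": {"pageName": "Clothing : Tops", "note": "a=b; c"}};'
)


def extract_assigned_json(text, variable_name):
    """Return the JSON object assigned to variable_name inside text."""
    start = text.index(variable_name)   # locate the assignment
    brace = text.index("{", start)      # first "{" after the variable name
    # raw_decode parses a single JSON value and ignores whatever follows it
    obj, _end = json.JSONDecoder().raw_decode(text, brace)
    return obj


data = extract_assigned_json(script_text, "window.lanebryantDLLite")
# Keep only the part after the first colon in the page name
category = data["page"]["pageName"].split(":", 1)[1].strip()
print(category)
```

With Selenium, the string passed in as text would be the innerHTML value collected in the answer above.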

Upvotes: 1

Mattias

Reputation: 436

One problem with your current approach is that the XPath matches every text/javascript script on the page, and find_element_by_xpath returns only the first match, so you need to narrow it down.

This finds the correct script and then collects the category with the help of regex:

from lxml import html
import requests
import re
# create the regex
category_regex = re.compile(r'(?<="category": ").*(?=", "CategoryID")')
page = requests.get('https://www.lanebryant.com/perfect-sleeve-swing-tunic-top/prd-356831#color/0000009320')
tree = html.fromstring(page.content)
information = tree.xpath("//script[contains(text(), '\"page\": {    \"pageName\": \"Clothing :')]/text()")
print(category_regex.findall(str(information)))

Output: ['Tops']
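The idea of narrowing the search to one script can also be sketched with only the standard library. The example below collects the body of every script tag with html.parser and keeps those containing a marker string. The HTML is a hypothetical sample rather than the real page, the class name ScriptCollector is made up for illustration, and the marker window.lanebryantDLLite is taken from the other answer in this thread.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the text content of every <script> element."""

    def __init__(self):
        super().__init__()
        self._in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")   # start a new script body

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        if self._in_script:
            self.scripts[-1] += data  # data may arrive in several chunks


# Hypothetical page source with one unrelated and one relevant script.
sample_html = """
<html><head>
<script type="text/javascript">var unrelated = 1;</script>
<script type="text/javascript">window.lanebryantDLLite = {"page": {"pageName": "Clothing : Tops"}};</script>
</head><body><p>Product page</p></body></html>
"""

collector = ScriptCollector()
collector.feed(sample_html)

# Narrow down to the script that mentions the marker string.
matching = [s for s in collector.scripts if "window.lanebryantDLLite" in s]
print(len(collector.scripts), len(matching))
```

For a real page, sample_html would be replaced by the downloaded source, for example page.text from requests.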

Upvotes: 0

Arun Augustine

Reputation: 1766

Try something like this:

type = d.find_element_by_xpath('//script[@type="text/javascript"]').text

Also count the script tags in the page source.
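Counting the script tags could look roughly like the sketch below, which runs a case-insensitive regex over an HTML string. The sample source is hypothetical, the helper name count_script_tags is made up for illustration, and the count is only approximate, since a script body that itself contains the literal text "<script" would be counted too. Note also that Selenium's .text typically returns only rendered text, so it tends to be empty for a script element; that is likely why the call in the question prints nothing.

```python
import re


def count_script_tags(page_source):
    """Count opening <script> tags in an HTML string (case-insensitive)."""
    return len(re.findall(r"<script\b", page_source, flags=re.IGNORECASE))


# Hypothetical page source; closing tags like </SCRIPT> are not counted.
sample_source = (
    '<html><head><script type="text/javascript">var a = 1;</script>'
    '<SCRIPT src="x.js"></SCRIPT></head>'
    '<body><script>var b = 2;</script></body></html>'
)

script_count = count_script_tags(sample_source)
print(script_count)
```

With a Selenium driver, the string to pass in would be d.page_source.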

Upvotes: 0
