PyRar

Reputation: 549

scrapy - How to retrieve a variable value using regex

I want to retrieve the value of the JavaScript variable modelCode. I wrote a regex like this, but it doesn't work at all. I've posted the structure of the page below. Can somebody help me, please?

regex2 = re.compile(r'"var modelCode"\s*:\s*(.+?\})', re.DOTALL)
source_json3 = response.xpath("//script[contains(., 'if(pageTrackName == 'product detail' || pageTrackName == 'generic product details')')]/text()").re_first(regex2)
source_json3 = re.sub(r'//[^\n]+', "", source_json3)

Structure of the page:

var pageTrackName = digitalData.page.pageInfo.pageTrack;
if(pageTrackName == "product detail" || pageTrackName == "generic product details"){ 
   var modelCode = "GT-P5100TSABTU";
   var displayName = "Galaxy Tab 2 (10.1, 3G)".replace(/(<([^>]+)>)/gi, "");
   digitalData.product.model_code = modelCode;
   digitalData.product.displayName = displayName;
   pageName += ":" + modelCode;

}

Upvotes: 5

Views: 877

Answers (2)

Thiago Curvelo

Reputation: 3740

That code is inside a <script> tag, I suppose. In that case, you could use:

model_code = response.xpath('//script').re_first(r'modelCode.*?"(.*?)"')

Some tips:

  • You don't have to compile the regex to use .re_first()/.re().
  • If the pattern contains parentheses, only the text captured by that group is returned.
  • More info about parsel (Scrapy's library for extracting data from HTML and XML): https://parsel.readthedocs.io/en/latest/usage.html
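
The tip about parentheses can be checked without running a spider. This is a minimal sketch, assuming that .re_first() follows the grouping behaviour of Python's re module, which parsel builds on. The script_text sample is copied from the question, first_group is a hypothetical stand-in for .re_first(), and the pattern uses a non-greedy capture group so it stops at the first closing quote:

```python
import re

# Sample script body copied from the question (an assumption about what
# response.xpath('//script') would hand to the regex).
script_text = '''
var pageTrackName = digitalData.page.pageInfo.pageTrack;
if(pageTrackName == "product detail" || pageTrackName == "generic product details"){
   var modelCode = "GT-P5100TSABTU";
   var displayName = "Galaxy Tab 2 (10.1, 3G)".replace(/(<([^>]+)>)/gi, "");
   digitalData.product.model_code = modelCode;
}
'''


def first_group(pattern, text):
    """Hypothetical stand-in for .re_first(): first capture group, or None."""
    match = re.search(pattern, text)
    return match.group(1) if match else None


# Only the text inside the parentheses comes back, not the whole match.
model_code = first_group(r'modelCode.*?"(.*?)"', script_text)
print(model_code)
```

In a spider the same pattern would be passed directly to .re_first(); the helper exists only to make the behaviour visible in isolation.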

Upvotes: 3

Matt.G

Reputation: 3609

Try this regex: (?<=var modelCode = ")(.+)(?=";)

There is no need for the re.sub call, because the match itself is the value of modelCode.
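
To illustrate, here is a minimal sketch that uses Python's standard re module rather than a live Scrapy response. The script_text string is a shortened copy of the snippet in the question. The lookbehind and lookahead assert the surrounding text without consuming it, so the whole match is already the value:

```python
import re

# Shortened copy of the script from the question (assumed input).
script_text = '''
   var modelCode = "GT-P5100TSABTU";
   var displayName = "Galaxy Tab 2 (10.1, 3G)".replace(/(<([^>]+)>)/gi, "");
'''

# (?<=...) requires the prefix before the match and (?=...) requires the
# suffix after it; neither is included in the matched text.
pattern = re.compile(r'(?<=var modelCode = ")(.+)(?=";)')

match = pattern.search(script_text)
model_code = match.group(0) if match else None
print(model_code)
```

Note that the lookbehind expects exactly one space on each side of the equals sign, so a differently formatted script would need a looser pattern.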

Upvotes: 0
