Reputation: 267
I need a selector to scrape the value after the euro symbol (\u20ac).
<Selector xpath='//*[@class="col-sm-4"]/text()' data=u'\r\n\t\t \u20ac 30.000,00'>
I tried dozens of variations that I found here on Stack Overflow and elsewhere, but I can't get it to work.
Sites like https://regexr.com/ show me that something like this:
response.xpath('//*[@class="col-sm-4"]/text()').re('(\u20ac).\d*.\d*.\d*')
should work, but it doesn't.
EDIT: Here is an example link to the data that I would like to scrape: https://www.firmenabc.at/manfred-jungwirth-montagen_MoKY
Would appreciate help!
Michael
Upvotes: 1
Views: 369
Reputation: 1548
Try this:
response.xpath('//*[@class="col-sm-4"]/text()').re(u'\u20ac\s*(\d+[\d\.,]+)')
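A quick way to check this pattern outside Scrapy is Python's built-in re module. The sketch below assumes the text node looks like the one in the question; `sample` and `extract_amount` are illustrative names, not part of Scrapy. As far as I know, Scrapy's `.re()` returns the contents of the capture group when the pattern has one, which `group(1)` mimics here.

```python
import re

# Text node as shown in the question's Selector output (assumed to be typical).
sample = u'\r\n\t\t \u20ac 30.000,00'

# Same pattern as above, written as a raw string; Python 3's re module
# understands the \u20ac escape itself.
EURO_AMOUNT = re.compile(r'\u20ac\s*(\d+[\d\.,]+)')


def extract_amount(text):
    """Return the amount after the euro sign, or None if there is none."""
    match = EURO_AMOUNT.search(text)
    return match.group(1) if match else None


print(extract_amount(sample))
```

The result is still a string in Austrian number format, so it would need further cleanup before being used as a number.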
Upvotes: 0
Reputation: 5642
Here is the regex you are looking for. If you want to match \u20ac literally, you need to prefix it with a \. The following variant, \u20ac|\\u20ac, will match both € and \u20ac:
(\u20ac|\\u20ac)\s+.\d*.\d*.\d*
Your pattern was also missing a \s+. \s matches a single whitespace character, and \s+ matches one or more of them (notice there is whitespace between \u20ac and the value, 30.000,00).
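To see the difference between \s and \s+ in practice, here is a small sketch using Python's re module. The two input strings are made up for illustration; only the first resembles the text from the question.

```python
import re

one_space = '\u20ac 30.000,00'
many_spaces = '\u20ac \t  30.000,00'  # hypothetical input with extra whitespace

single = re.compile(r'\u20ac\s\d')        # exactly one whitespace character
one_or_more = re.compile(r'\u20ac\s+\d')  # one or more whitespace characters

print(bool(single.search(one_space)))
print(bool(single.search(many_spaces)))
print(bool(one_or_more.search(many_spaces)))
```

Only \s+ copes with both inputs, which makes it the safer choice when the page's indentation is not under your control.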
Notice, though, that this will capture only the € symbol, because a capture group is whatever sits between a pair of parentheses, i.e. (ANYTHING BETWEEN THESE WILL BE CAPTURED).
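So I believe what you want is:
(?:\u20ac|\\u20ac)\s+(\d*.*)
- Here, we're surrounding \d*.* with (), therefore capturing that value instead of the € symbol. The (?:...) around the alternation is a non-capturing group; without it, | would split the whole pattern into \u20ac on one side and \\u20ac\s+(\d*.*) on the other.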
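The capture-group point can be checked with Python's re module. This is a minimal sketch, assuming the text looks like the sample in the question. The (?:...) wrapper in the last pattern is an addition for illustration: it keeps | confined to the two spellings of the symbol without creating a second capture group.

```python
import re

text = '\u20ac 30.000,00'

# Parentheses around the symbol: group(1) is the euro sign, not the amount.
symbol_only = re.search(r'(\u20ac|\\u20ac)\s+.\d*.\d*.\d*', text)

# Bare alternation: the left branch matches the euro sign on its own,
# so the capture group in the right branch never takes part.
unwrapped = re.search(r'\u20ac|\\u20ac\s+(\d*.*)', text)

# Alternation wrapped in a non-capturing group, value in the capture group.
value = re.search(r'(?:\u20ac|\\u20ac)\s+(\d*.*)', text)

print(repr(symbol_only.group(1)))
print(repr(unwrapped.group(1)))
print(repr(value.group(1)))
```

Note that .* runs to the end of the line, so this only gives a clean value when nothing follows the amount in the same text node.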
Repeating .\d* is redundant: you already indicated you want to match every occurrence of it by writing \d and suffixing it with a *.
Lastly, I recommend you play around with regex using https://www.regex101.com - It's a great tool and will save you a lot of headache.
Upvotes: 1