MrOrangeMan

Reputation: 25

Is there any way to parse a value from this HTML code using XPath?

So, I was trying to get some data out of this HTML code:

<span class="info-icon" data-toggle="popover" data-trigger="hover" title="" data-content="
    Рейтинг: <b>4.55/5</b><br/>

      Относительно остальных произведений: <b>3.58/5</b><br/>

    Всего голосов: <b>62</b>
" data-original-title="Информация о рейтинге">
      <i class="fa fa-info-circle"></i>
    </span>

I tried to get the whole text using expressions like these:

//span[@class='info-icon']/@data-content
//span[@data-content='Рейтинг']
//span/@data-content

I want to have an output like this:

4.55/5
3.58/5
62

Or at least like this:

 Рейтинг: <b>4.55/5</b><br/>

 Относительно остальных произведений: <b>3.58/5</b><br/>

 Всего голосов: <b>62</b>

But I'm not getting anything.

P.S. The website URL can be any manga page on http://readmanga.me/, for example http://readmanga.me/tower_of_god

Upvotes: 1

Views: 151

Answers (2)

player0

Reputation: 1

You will need to scrape the page source directly, like this:

=ARRAYFORMULA(REGEXREPLACE(REGEXREPLACE(QUERY(ARRAY_CONSTRAIN(IMPORTDATA(
 "http://readmanga.me/tower_of_god"), 2000, 1), 
 "where Col1 matches 'Рейтинг:.*|.*остальных произведений:.*|Всего голосов:.*'", 0), 
 "[А-Яа-я<>br: ]", ), 
 "//$|/$", ))

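For readers not working in Google Sheets, the same line-filtering idea can be sketched in Python. This is only an illustration that mirrors the formula above: the function name ratings_from_source and the sample text are assumptions made for the example, and the page source is passed in as a string rather than downloaded.

```python
import re

# Lines of interest are recognised by the same three labels the formula matches.
LABELS = re.compile(r"Рейтинг:|остальных произведений:|Всего голосов:")


def ratings_from_source(page_source):
    """Return the rating values found in the labelled lines (hypothetical helper)."""
    values = []
    for line in page_source.splitlines():
        if not LABELS.search(line):
            continue
        # Same character class as the inner REGEXREPLACE: drop Cyrillic letters,
        # angle brackets, the letters b and r, colons, and spaces.
        stripped = re.sub(r"[А-Яа-я<>br: ]", "", line)
        # Same idea as the outer REGEXREPLACE: drop one or two trailing slashes
        # left over from </b> and <br/>.
        values.append(re.sub(r"//?$", "", stripped))
    return values


# Sample text copied from the data-content attribute in the question.
sample = """    Рейтинг: <b>4.55/5</b><br/>

      Относительно остальных произведений: <b>3.58/5</b><br/>

    Всего голосов: <b>62</b>
"""

print(ratings_from_source(sample))
```

Like the formula, this relies on each value sitting on its own line in the page source, so it would need adjusting if the site changes its markup.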
Upvotes: 1

Jack Fleeting

Reputation: 24940

The following XPath expressions should probably work (note that tokenize() requires XPath 2.0 or later):

tokenize(//span/@data-content,' ')[2]

selects

4.55/5

This one:

substring-before(tokenize(//span/@data-content,'<b>')[3],' ')

selects

3.58/5

and this one:

tokenize(//span/@data-content,'<b>')[4]

selects:

62
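If an XPath 2.0 engine is not available, a two-step approach may be easier: first read the data-content attribute of the span, then pull the text between the <b> tags out of that attribute value. Below is a minimal Python sketch of that idea using only the standard library. The class name DataContentCollector, the function extract_ratings, and the SAMPLE_HTML string (copied from the question) are assumptions made for this illustration, not part of any answer above.

```python
import re
from html.parser import HTMLParser

# The snippet from the question, used here as test input.
SAMPLE_HTML = '''<span class="info-icon" data-toggle="popover" data-trigger="hover" title="" data-content="
    Рейтинг: <b>4.55/5</b><br/>

      Относительно остальных произведений: <b>3.58/5</b><br/>

    Всего голосов: <b>62</b>
" data-original-title="Информация о рейтинге">
      <i class="fa fa-info-circle"></i>
    </span>'''


class DataContentCollector(HTMLParser):
    """Collects data-content values from <span class="info-icon"> tags."""

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # Rough equivalent of //span[@class='info-icon']/@data-content
        if tag == "span" and attr_map.get("class") == "info-icon":
            content = attr_map.get("data-content")
            if content is not None:
                self.values.append(content)


def extract_ratings(html_text):
    """Return the text of every <b>...</b> inside the collected attribute values."""
    collector = DataContentCollector()
    collector.feed(html_text)
    collector.close()
    ratings = []
    for value in collector.values:
        ratings.extend(re.findall(r"<b>(.*?)</b>", value))
    return ratings


print(extract_ratings(SAMPLE_HTML))
```

The key point is that the ratings are not child nodes of the span. They are plain text inside an attribute value, so a second parsing step on that string is needed whichever tool extracts the attribute.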

Upvotes: 1
