Reputation: 25
I am trying to extract some data from this HTML:
<span class="info-icon" data-toggle="popover" data-trigger="hover" title="" data-content="
Рейтинг: <b>4.55/5</b><br/>
Относительно остальных произведений: <b>3.58/5</b><br/>
Всего голосов: <b>62</b>
" data-original-title="Информация о рейтинге">
<i class="fa fa-info-circle"></i>
</span>
I tried to get the whole text using XPath expressions like these:
//span[@class='info-icon']/@data-content
//span[@data-content='Рейтинг']
//span/@data-content
I want to have an output like this:
4.55/5
3.58/5
62
Or at least like this:
Рейтинг: <b>4.55/5</b><br/>
Относительно остальных произведений: <b>3.58/5</b><br/>
Всего голосов: <b>62</b>
But none of these expressions returns anything.
P.S. The URL can be any manga page on http://readmanga.me/, for example http://readmanga.me/tower_of_god
Upvotes: 1
Views: 151
Reputation: 1
You will need to scrape the page source directly, for example:
=ARRAYFORMULA(REGEXREPLACE(REGEXREPLACE(QUERY(ARRAY_CONSTRAIN(IMPORTDATA(
"http://readmanga.me/tower_of_god"), 2000, 1),
"where Col1 matches 'Рейтинг:.*|.*остальных произведений:.*|Всего голосов:.*'", 0),
"[А-Яа-я<>br: ]", ),
"//$|/$", ))
Upvotes: 1
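For readers working outside a spreadsheet, the same idea as the formula above (keep only the source lines that carry a rating label, then strip everything except the value) can be sketched in Python. This is only an illustration under assumptions: the function name `extract_ratings` and the `sample` string are hypothetical, the labels are copied from the question's markup, and the sketch assumes each label and its `<b>` value sit on the same source line, as they do in the snippet shown in the question.

```python
import html
import re

# Labels copied from the markup in the question; the helper name is hypothetical.
LABELS = ("Рейтинг:", "остальных произведений:", "Всего голосов:")


def extract_ratings(page_source: str) -> list[str]:
    """Return the text inside <b>...</b> for each line that carries a rating label."""
    values = []
    for raw_line in page_source.splitlines():
        # The real page may store the tags as entities (&lt;b&gt;), so unescape first.
        line = html.unescape(raw_line)
        if any(label in line for label in LABELS):
            match = re.search(r"<b>(.*?)</b>", line)
            if match:
                values.append(match.group(1))
    return values


# Sample input taken from the question, not fetched from the site.
sample = '''<span class="info-icon" data-content="
Рейтинг: <b>4.55/5</b><br/>
Относительно остальных произведений: <b>3.58/5</b><br/>
Всего голосов: <b>62</b>
" data-original-title="Информация о рейтинге">'''

print(extract_ratings(sample))
```

Matching the `<b>` element directly avoids the character-class stripping used in the formula, which would also remove any `b` or `r` characters that happened to appear in a value.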
Reputation: 24940
The following XPath expressions should probably work. Note that tokenize() is an XPath 2.0 function, so they need a processor that supports XPath 2.0:
tokenize(//span/@data-content,' ')[2]
selects
4.55/5
This one:
substring-before(tokenize(//span/@data-content,'<b>')[3],' ')
selects
3.58/5
and this one:
tokenize(//span/@data-content,'<b>')[4]
selects:
62
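If an XPath 2.0 processor is not available, the same two steps (select the `data-content` attribute, then split out the bold values) can be approximated with Python's standard library. This is a minimal sketch, not a drop-in replacement for the expressions above: the class `DataContentParser`, the function `bold_values`, and the `sample` string are hypothetical names, and the code assumes the attribute is laid out as in the question's snippet.

```python
import re
from html.parser import HTMLParser


class DataContentParser(HTMLParser):
    """Collect the data-content attribute of every <span class="info-icon">."""

    def __init__(self):
        super().__init__()
        self.contents = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "span" and attr_map.get("class") == "info-icon":
            value = attr_map.get("data-content")
            if value is not None:
                self.contents.append(value)


def bold_values(html_text):
    """Return, for each matching span, the list of strings wrapped in <b>...</b>."""
    parser = DataContentParser()
    parser.feed(html_text)
    parser.close()
    return [re.findall(r"<b>(.*?)</b>", content) for content in parser.contents]


# Sample input taken from the question, not fetched from the site.
sample = '''<span class="info-icon" data-toggle="popover" data-trigger="hover" title="" data-content="
Рейтинг: <b>4.55/5</b><br/>
Относительно остальных произведений: <b>3.58/5</b><br/>
Всего голосов: <b>62</b>
" data-original-title="Информация о рейтинге">
<i class="fa fa-info-circle"></i>
</span>'''

print(bold_values(sample))
```

Extracting on the `<b>...</b>` pairs rather than on spaces keeps the result independent of how many words each label contains.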
Upvotes: 1