Reputation: 704
I'm trying to extract a table that contains an Arabic poem. The poem is at the URL in the code below.
I tried to parse the table like this:
library(XML)
URL <- "http://www.adab.com/modules.php?name=Sh3er&doWhat=shqas&qid=65546&r=&rc=1"
Data <- htmlTreeParse(URL, useInternalNodes = TRUE, encoding = "Windows-1256")
Poem <- xpathSApply(Data, "//p[@class='poem']", xmlValue)
Poem1 <- xpathSApply(Data, "//font[@class='poem']", xmlValue)
Encoding(Poem) <- "UTF-8"
Encoding(Poem1) <- "UTF-8"
But that's no good, because splitting the result into two vectors loses the order in which the poem was written.
So, is there a way to get this table with a single expression, so that the lines come out in the same order as on the page?
e.g. something like:
Poem <- xpathSApply(Data,"//p[@class='poem']&//font[@class='poem']",xmlValue)
Upvotes: 1
Views: 439
Reputation: 43334
The question is really about finding a selector that grabs multiple tags with a class of "poem". There are a few options. A simple one is to use the wildcard character * for the tag name in the XPath selector:
Poem <- xpathSApply(Data,"//*[@class='poem']",xmlValue)
If you only want p and font tags of class "poem", but not, say, a div tag of the same class, you can use the | (union, i.e. "or") operator to select multiple options. Translated to rvest, which I find a little easier to read (though the same selector works fine in xpathSApply as well):
library(rvest)
Poem <- URL %>% read_html() %>%
    html_nodes(xpath = '//p[@class="poem"] | //font[@class="poem"]') %>%
    html_text(trim = TRUE)
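To see the difference between the wildcard and the union selector without hitting the site, here is a minimal sketch using the XML package on a small inline document. The HTML string is a made-up stand-in for the poem page (an assumption, not the real markup); only the two XPath expressions come from the discussion above.

```r
library(XML)

# Hypothetical stand-in for the poem page: p, font and div all use class "poem"
html <- '<html><body><div>
  <p class="poem">line 1</p>
  <font class="poem">line 2</font>
  <div class="poem">not a verse</div>
  <p class="poem">line 3</p>
</div></body></html>'

doc <- htmlParse(html, asText = TRUE)

# Wildcard: any tag whose class attribute is exactly "poem", div included
all_poem <- xpathSApply(doc, "//*[@class='poem']", xmlValue)

# Union: only p and font tags whose class attribute is exactly "poem"
p_font <- xpathSApply(doc, "//p[@class='poem'] | //font[@class='poem']", xmlValue)

print(all_poem)
print(p_font)
```

As far as I know, both calls return the matches in document order, which is what the question needs. Note that @class='poem' is an exact string comparison, so a tag with class="poem big" would not match either expression.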
Another option if using rvest is to use CSS selectors instead of XPath ones. In CSS, a class is specified by a leading dot, so all you need for a wildcard version is ".poem"; to limit the match to only p or font tags, use "p.poem, font.poem". Here's a fun tutorial on CSS selectors, if you like.
Poem <- URL %>% read_html() %>%
    html_nodes(css = '.poem') %>%
    html_text(trim = TRUE)
head(Poem, 15) # I don't speak Arabic, so check that the results make sense
## [1] "أقداح و أحلام" "أنا لا أزال و في يدي قدحي" "ياليل أين تفرق الشرب"
## [4] "ما زلت أشربها و أشربها" "حتى ترنح أفقك الرحب" "الشرق عُفر بالضباب فما"
## [7] "يبدو فأين سناك يا غرب؟" "ما للنجوم غرقن ، من سأم" "في ضوئهن و كادت الشهب ؟"
## [10] "أنا لا أزال و في يدي قدحي" "ياليل أين تفرق الشرب ؟" "******"
## [13] "الحان بالشهوات مصطخب" "حتى يكاد بهن ينهار" "و كأن مصاحبيه من ضرج"
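For completeness, the restricted CSS selector can be tried the same way. This is only a sketch on an inline document: the HTML string is a hypothetical stand-in for the real page, and read_html() is given the string directly instead of a URL.

```r
library(rvest)

# Hypothetical stand-in for the poem page: p, font and div all use class "poem"
html <- '<html><body><div>
  <p class="poem">line 1</p>
  <font class="poem">line 2</font>
  <div class="poem">not a verse</div>
  <p class="poem">line 3</p>
</div></body></html>'

# "p.poem, font.poem" keeps p and font tags of class "poem" and skips the div
verses <- read_html(html) %>%
    html_nodes(css = "p.poem, font.poem") %>%
    html_text(trim = TRUE)

print(verses)
```

Unlike the XPath test @class='poem', a CSS class selector matches any tag that lists "poem" among its classes, so it is the more forgiving choice if the page mixes in extra class names.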
Upvotes: 2