HTML list to R dataframe

Question

I have the example HTML code below, and I would like to turn it into a data frame like the one that follows. Thank you very much for any ideas.

Ingredient

X1
a
b
c


X2
a
b


X3
c
b

column A    column B
   X1          a
   X1          b
   X1          c
   X2          a
   X2          b
   X3          c
   X3          b

Maurits Evers · Accepted Answer

I'm sure this can be optimised, but here is an rvest option that uses CSS selectors to extract the nested li elements from within each ul.

library(rvest)
library(tidyverse)

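# One character vector of item texts per nested ul (matched by "li > ul")
val <- read_html(ss) %>%
    html_nodes(css = "li > ul") %>%
    map(~html_nodes(.x, css = "li") %>% html_text())

# Group labels: the first "X<digit>" in each li's text; item li's give NA and are dropped
nms <- read_html(ss) %>%
    html_nodes(css = "li") %>%
    html_text() %>%
    str_extract("X\\d") %>%
    na.omit()

# Name each vector by its label and stack into long format
stack(setNames(val, nms))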
#  values ind
#1      a  X1
#2      b  X1
#3      c  X1
#4      a  X2
#5      b  X2
#6      c  X3
#7      b  X3
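A minimal self-contained run of the same approach is sketched below. The tags of the question's example HTML did not survive, so the markup in `ss` here is an assumed structure (a plain `<p>` heading followed by a two-level list), not the original. It loads only purrr and stringr, the tidyverse packages that provide `map()` and `str_extract()`, and it finishes by turning `stack()`'s `values`/`ind` output into the `column A`/`column B` layout shown in the question.

```r
library(rvest)
library(purrr)
library(stringr)

# Hypothetical markup: the question's original tags were lost, so this structure is assumed
ss <- '<p>Ingredient</p>
<ul>
  <li>X1
    <ul>
      <li>a</li>
      <li>b</li>
      <li>c</li>
    </ul>
  </li>
  <li>X2
    <ul>
      <li>a</li>
      <li>b</li>
    </ul>
  </li>
  <li>X3
    <ul>
      <li>c</li>
      <li>b</li>
    </ul>
  </li>
</ul>'

page <- read_html(ss)  # parse once and reuse

# One character vector of item texts per nested <ul>
val <- page %>%
    html_nodes(css = "li > ul") %>%
    map(~ html_nodes(.x, css = "li") %>% html_text())

# Group labels: first "X<digit>" in each <li>'s text; item <li>s give NA and are dropped
nms <- page %>%
    html_nodes(css = "li") %>%
    html_text() %>%
    str_extract("X\\d") %>%
    na.omit()

long <- stack(setNames(val, nms))

# Reorder and rename: group label first, item second, as in the question
df <- data.frame(
    `column A` = as.character(long$ind),
    `column B` = long$values,
    check.names = FALSE,
    stringsAsFactors = FALSE
)
df
```

`check.names = FALSE` keeps the space in the column names; drop it if syntactic names such as `column.A` are preferred.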

Sample data

ss <- 'Ingredient

X1
a
b
c


X2
a
b


X3
c
b




'
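Because `setNames(val, nms)` pairs the matched `ul`s and labels by position, the answer assumes that every `li > ul` is a leaf group. If the markup instead had "Ingredient" as an `li` wrapping the whole list, that outer `ul` would also match, and the label "X1" would be extracted twice (once from the Ingredient `li`, whose text contains it). A position-independent variant using xml2 XPath is sketched below; the markup is again hypothetical, and the XPath `//li[ul and not(ul/li/ul)]` is an illustrative choice of selector rather than anything from the answer.

```r
library(xml2)

# Hypothetical markup (assumed): here "Ingredient" is an <li> wrapping the whole list
ss <- '<ul>
  <li>Ingredient
    <ul>
      <li>X1
        <ul><li>a</li><li>b</li><li>c</li></ul>
      </li>
      <li>X2
        <ul><li>a</li><li>b</li></ul>
      </li>
      <li>X3
        <ul><li>c</li><li>b</li></ul>
      </li>
    </ul>
  </li>
</ul>'

page <- read_html(ss)

# Group <li>s: they have a child <ul> whose items have no sublists of their own.
# The outer "Ingredient" <li> is skipped because its items (X1, X2, X3) do have sublists.
groups <- xml_find_all(page, "//li[ul and not(ul/li/ul)]")

# Label = each group <li>'s first direct text node, trimmed (e.g. "X1\n   " -> "X1")
labels <- trimws(xml_text(xml_find_first(groups, "text()")))

# Items = the direct <li> children of that group's own <ul>
items <- lapply(groups, function(g) trimws(xml_text(xml_find_all(g, "./ul/li"))))

df <- data.frame(
    `column A` = rep(labels, lengths(items)),
    `column B` = unlist(items),
    check.names = FALSE,
    stringsAsFactors = FALSE
)
df
```

Each label and its items are read from the same `li`, so they cannot drift out of step, and the selector works the same whether or not "Ingredient" is itself a list item.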
