uspowpow

Reputation: 465

Read local HTML file into R

I have an HTML file on my desktop. (In Chrome, I right-clicked the web page, chose "Save as", and then "Webpage, HTML".) How can I read this local file into R? Once it is in R, I will need to write some regular expressions to parse the strings and extract certain values.

Upvotes: 16

Views: 31991

Answers (3)

bathyscapher

Reputation: 2319

Another possibility is htmltools's includeHTML():

library(htmltools)
rawHTML <- includeHTML('path/to/file.html')

class(rawHTML)
[1] "html"      "character"

Upvotes: 2

GGAnderson

Reputation: 2210

Today, a better (and faster) approach is to use xml2::read_html(), which is installed with the tidyverse and can read HTML content from either a local file or a URL.

library(xml2)
rawHTML <- read_html(x = "path/to/file.html")

Because the same call accepts either a local file path or a URL, a script can switch between input sources without other changes, and the parsed document can be passed on to the rvest library for HTML extraction.
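As a rough sketch of what extraction could look like once the file is parsed, the snippet below pulls values out with an XPath query rather than regular expressions. The sample file, the span elements, and the "price" class are all made up for illustration; substitute whatever structure your saved page actually has.

```r
library(xml2)

# Hypothetical sample file standing in for the saved web page
path <- tempfile(fileext = ".html")
writeLines(c(
  "<html><body>",
  "<span class=\"price\">12.50</span>",
  "<span class=\"price\">7.25</span>",
  "</body></html>"
), path)

# Parse the local file into an XML document
doc <- read_html(path)

# Select every span whose class attribute is exactly "price" (assumed markup)
nodes <- xml_find_all(doc, "//span[@class='price']")

# Pull the text out of each node and convert it to numeric
prices <- as.numeric(xml_text(nodes))
```

Querying the parsed tree this way tends to hold up better than regular expressions when the markup contains nesting or inconsistent whitespace.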

Upvotes: 1

Ricardo Saporta

Reputation: 55380

Use readLines as follows:

rawHTML <- paste(readLines("path/to/file.html"), collapse = "\n")
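Since the question mentions regular expressions, here is a minimal base R sketch of the follow-up step. The sample file, the span tags, and the "price" class are invented for illustration and are not from the question, so adapt the pattern to your own page.

```r
# Hypothetical sample file standing in for the saved web page
path <- tempfile(fileext = ".html")
writeLines(c(
  "<html><body>",
  "<span class=\"price\">12.50</span>",
  "<span class=\"price\">7.25</span>",
  "</body></html>"
), path)

# Read the whole file into a single string
rawHTML <- paste(readLines(path, warn = FALSE), collapse = "\n")

# Pattern with one capture group for the text inside each span (assumed markup)
pattern <- "<span class=\"price\">([^<]*)</span>"

# Find every full match, then keep only the captured group from each one
matches <- regmatches(rawHTML, gregexpr(pattern, rawHTML))[[1]]
prices <- as.numeric(sub(pattern, "\\1", matches))
```

gregexpr() locates all matches in the string, regmatches() extracts them, and sub() with the "\\1" backreference reduces each match to the captured value.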

Upvotes: 30
