Meredith

Reputation: 15

Scraping non-table information in R

I'm trying to scrape information from this webpage, https://www.ncleg.gov/Laws/GeneralStatuteSections/Chapter14 (the info under the "Chapter 14" tab), and put it into a data frame with two columns in R, but these skills are out of my wheelhouse and I need some help. More specifically, I want one column with the G.S. numbers ("G.S. 14-1", "G.S. 14-1.1", etc.) and one column with the names corresponding to these G.S. numbers ("14-1. Felonies and Misdemeanors Defined", "14-1.1: Repealed by Session Laws 1993, c. 538, s. 2.", etc.), stored as text rather than as links.

I've tried using SelectorGadget, but the tool is pretty new to me and I don't really understand how to apply the selectors it gives me in R.
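SelectorGadget produces a CSS selector, and rvest's `html_nodes()` accepts that string through its `css` argument (the answer below uses the equivalent `xpath` argument instead). A minimal offline sketch follows. The `sample_html` markup is a hypothetical stand-in that reuses the class names from the answer's XPath, and the link text such as "G.S. 14-1" is an assumption about how the live page labels each section.

```r
library(rvest)

# Hypothetical markup imitating two rows of the statute list (assumption,
# not copied from the live page); &#167; is the section sign.
sample_html <- '
<div class="row">
  <div class="col-12 col-md-3 col-lg-2 d-flex mobile-font-size-large"><a href="#">G.S. 14-1</a></div>
  <div class="col-12 col-md-9 col-lg-10">&#167; 14-1. Felonies and misdemeanors defined.</div>
</div>
<div class="row">
  <div class="col-12 col-md-3 col-lg-2 d-flex mobile-font-size-large"><a href="#">G.S. 14-1.1</a></div>
  <div class="col-12 col-md-9 col-lg-10">&#167; 14-1.1: Repealed by Session Laws 1993, c. 538, s. 2.</div>
</div>'

# read_html() treats a string containing markup as literal HTML
page <- read_html(sample_html)

# A selector string from SelectorGadget goes into the `css` argument;
# chaining class names with dots requires an element to carry all of them.
numbers <- html_text(html_nodes(page, css = ".col-md-3.col-lg-2 a"), trim = TRUE)
titles  <- html_text(html_nodes(page, css = ".col-md-9.col-lg-10"), trim = TRUE)

# html_text() returns plain text, so the links are dropped
result <- data.frame(number = numbers, title = titles, stringsAsFactors = FALSE)
print(result)
```

On the real page you would call `read_html()` on the Chapter 14 URL instead and paste in whatever selector SelectorGadget reports. If that selector also matches unrelated elements, the two vectors can end up with different lengths, so checking `length()` of each before building the data frame is a cheap sanity check.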

Any advice or tips on how to do this?

Upvotes: 1

Views: 37

Answers (1)

Allan Cameron

Reputation: 173888

Yes, this is fairly tricky. I would probably approach it with a combination of XPath and regular expressions:

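library(rvest)
#> Loading required package: xml2

# Download and parse the Chapter 14 page once
page <- read_html("https://www.ncleg.gov/Laws/GeneralStatuteSections/Chapter14")

# XPath for the left-hand column of section links (not used below)
x1 <- "//div[@class = 'col-12 col-md-3 col-lg-2 d-flex mobile-font-size-large']"
# XPath for the right-hand column, e.g. "§ 14-1. Felonies and misdemeanors defined."
x2 <- "//div[@class = 'col-12 col-md-9 col-lg-10']"

description <- html_nodes(page, xpath = x2) %>% html_text() %>% trimws()

# Title: strip the leading "§ <number>." or "§ <number>:" prefix.
# The lazy .*? (perl = TRUE) stops at the first section number, so digits
# later in a title are not mistaken for it.
col2 <- sub("^.*?.\\d[A-Z]?[.:] +", "", description, perl = TRUE)
# Section: keep only that prefix; [.:] matches a period or a colon
col1 <- sub("^(.*?.\\d[A-Z]?[.:]) +.*$", "\\1", description, perl = TRUE)
col1 <- gsub("\u00a7", "GS", col1)

df <- data.frame(section = col1, description = col2, stringsAsFactors = FALSE)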

For ease of printing, I'll show the resulting data frame as a tibble:

tibble::as_tibble(df)

#> # A tibble: 1,059 x 2
#>    section    description                                                       
#>    <chr>      <chr>                                                             
#>  1 GS 14-1.   Felonies and misdemeanors defined.                                
#>  2 GS 14-1.1: Repealed by Session Laws 1993, c.  538, s. 2.                     
#>  3 GS 14-2:   Repealed by Session Laws 1993, c.  538, s. 2.1.                   
#>  4 GS 14-2.1: Repealed by Session Laws 1993, c.  538, s. 3.                     
#>  5 GS 14-2.2: Repealed by Session Laws 2003-0378, s. 1, effective August 1, 200~
#>  6 GS 14-2.3. Forfeiture of gain acquired through criminal activity.            
#>  7 GS 14-2.4. Punishment for conspiracy to commit a felony.                     
#>  8 GS 14-2.5. Punishment for attempt to commit a felony or misdemeanor.         
#>  9 GS 14-2.6. Punishment for solicitation to commit a felony or misdemeanor.    
#> 10 GS 14-3.   Punishment of misdemeanors, infamous offenses, offenses committed~
#> # ... with 1,049 more rows

Created on 2020-09-30 by the reprex package (v0.3.0)
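To see what the two regular expressions keep without downloading the page, here is a small offline sketch on strings copied from the output above, written with the "§ " prefix that the answer's `gsub("\u00a7", "GS", col1)` step implies. It also shows one possible way to get the "G.S. 14-1" labels the question asked for; the `to_gs()` helper and the `sample_desc` strings are illustrative assumptions, not part of the original answer.

```r
# Sample descriptions in the "§ <number>. <title>" shape the answer assumes
sample_desc <- c(
  "\u00a7 14-1. Felonies and misdemeanors defined.",
  "\u00a7 14-1.1: Repealed by Session Laws 1993, c. 538, s. 2.",
  "\u00a7 14-2.5. Punishment for attempt to commit a felony or misdemeanor."
)

# Section: everything up to the first "<digit>[letter]" followed by "." or ":" and a space
sections <- sub("^(.*?.\\d[A-Z]?[.:]) +.*$", "\\1", sample_desc, perl = TRUE)

# Title: the same prefix removed
titles <- sub("^.*?.\\d[A-Z]?[.:] +", "", sample_desc, perl = TRUE)

# Hypothetical helper: "§ 14-1." -> "G.S. 14-1" (drops the trailing "." or ":")
to_gs <- function(x) {
  x <- sub("[.:]$", "", x)
  sub("^\u00a7\\s*", "G.S. ", x)
}

statutes <- data.frame(section = to_gs(sections), description = titles,
                       stringsAsFactors = FALSE)
print(statutes)
```

Applying `to_gs()` to the answer's `col1` before the `gsub("\u00a7", "GS", col1)` line would give the "G.S." form directly; whether to keep the trailing punctuation is a matter of taste.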

Upvotes: 1
