cephalopod
cephalopod

Reputation: 1906

how to separate a column with a hyphen

This should be a simple task but I've hit a road block.

How do I split the first column and get two columns with numeric values:

  require(xml2)
  require(rvest)
  require(janitor)
  require(tidyverse)
  require(janitor)
  
  raw_webpage <- read_html("https://www.ato.gov.au/Rates/Schedule-8---Statement-of-formulas-for-calculating-HELP,-SSL,-TSL-and-SFSS-components/")
  
  df1 <- html_table(raw_webpage, fill = TRUE)[[2]]
  
  df1 %>% clean_names() %>% 
    separate(weekly_earnings_x, into = c("one", "two"), sep = "\\-", convert = TRUE)

Upvotes: 2

Views: 1378

Answers (1)

Ronak Shah
Ronak Shah

Reputation: 389047

Those are not minus sign (-) except for the last one.

grepl("-", df1$`Weekly earnings (x)$`)
# [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#[14] FALSE FALSE FALSE FALSE FALSE  TRUE

Copy and paste the delimiter from the dataframe in sep argument.

df1 %>% 
  clean_names() %>% 
  separate(weekly_earnings_x, into = c("one", "two"), 
           sep = "\\s*(–|-)\\s*", convert = TRUE)

#   one      two      component_rate_a_percent
#   <chr>    <chr>    <chr>                   
# 1 0        553.99   –                       
# 2 554.00   692.99   1.0                     
# 3 693.00   755.99   2.0                     
# 4 756.00   821.99   2.5                     
# 5 822.00   892.99   3.0                     
# 6 893.00   966.99   3.5                     
# 7 967.00   1,045.99 4.0                     
# 8 1,046.00 1,129.99 4.5                     
# 9 1,130.00 1,218.99 5.0                     
#10 1,219.00 1,312.99 5.5                     
#11 1,313.00 1,412.99 6.0                     
#12 1,413.00 1,518.99 6.5                     
#13 1,519.00 1,630.99 7.0                     
#14 1,631.00 1,749.99 7.5                     
#15 1,750.00 1,873.99 8.0                     
#16 1,874.00 2,009.99 8.5                     
#17 2,010.00 2,150.99 9.0                     
#18 2,151.00 2,300.99 9.5                     
#19 2,301.00 over     10.0       

You may want to use readr::parse_number to turn the columns to numeric since they contain commas in them.

Upvotes: 2

Related Questions