Reputation: 1906
This should be a simple task but I've hit a road block.
How do I split the first column and get two columns with numeric values:
require(xml2)
require(rvest)
require(janitor)
require(tidyverse)
require(janitor)
raw_webpage <- read_html("https://www.ato.gov.au/Rates/Schedule-8---Statement-of-formulas-for-calculating-HELP,-SSL,-TSL-and-SFSS-components/")
df1 <- html_table(raw_webpage, fill = TRUE)[[2]]
df1 %>% clean_names() %>%
separate(weekly_earnings_x, into = c("one", "two"), sep = "\\-", convert = TRUE)
Upvotes: 2
Views: 1378
Reputation: 389047
Those are not minus sign (-
) except for the last one.
grepl("-", df1$`Weekly earnings (x)$`)
# [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
#[14] FALSE FALSE FALSE FALSE FALSE TRUE
Copy and paste the delimiter from the dataframe in sep
argument.
df1 %>%
clean_names() %>%
separate(weekly_earnings_x, into = c("one", "two"),
sep = "\\s*(–|-)\\s*", convert = TRUE)
# one two component_rate_a_percent
# <chr> <chr> <chr>
# 1 0 553.99 –
# 2 554.00 692.99 1.0
# 3 693.00 755.99 2.0
# 4 756.00 821.99 2.5
# 5 822.00 892.99 3.0
# 6 893.00 966.99 3.5
# 7 967.00 1,045.99 4.0
# 8 1,046.00 1,129.99 4.5
# 9 1,130.00 1,218.99 5.0
#10 1,219.00 1,312.99 5.5
#11 1,313.00 1,412.99 6.0
#12 1,413.00 1,518.99 6.5
#13 1,519.00 1,630.99 7.0
#14 1,631.00 1,749.99 7.5
#15 1,750.00 1,873.99 8.0
#16 1,874.00 2,009.99 8.5
#17 2,010.00 2,150.99 9.0
#18 2,151.00 2,300.99 9.5
#19 2,301.00 over 10.0
You may want to use readr::parse_number
to turn the columns to numeric since they contain commas in them.
Upvotes: 2