Reputation: 53896
Here I'm attempting to convert dataframe to tibble format and split the year , month column values into their own rows :
library(dpylr)
library(tidyr)
res <- data.frame("year.month" = c("2005M1","2005M2","2005M3","2005M4"), "national houses" = c(100,100,100,100), "dublin houses" = c(120,120,120,120))
res %>% separate(year.month , into=c("year" , "month") , sep=".")
returns :
year month national.houses dublin.houses
1 100 120
2 100 120
3 100 120
4 100 120
Warning message:
Too many values at 4 locations: 1, 2, 3, 4
year & month values are not appearing, im not utilizing separate correctly ?
Upvotes: 1
Views: 1982
Reputation: 1363
I'd guess that just separating year from month would get you to half-tidy. You still have two separate columns that both count houses. One row per observation, one column per variable would require something like this:
res %>%
tidyr::gather(key = where,
value = houses,
-year.month) %>%
mutate(where = gsub(where,
pattern = '\\.houses',
replacement = '')) %>%
separate(year.month,
into = c('year', 'month'),
sep = 'M')
Upvotes: 1
Reputation: 18701
.
is a wild card in regex, and the sep
argument in separate
takes regex, so your code tries to split year.month
by each character, hence the warning of too many values. The following separates your column correctly using positive lookbehind and lookahead:
library(dplyr)
library(tidyr)
res %>%
separate(year.month, into = c("year", "month"), sep = "(?<=\\d)(?=M)")
You can also use extract
from tidyr
to split by capture group:
res %>%
extract(year.month, into = c("year", "month"), regex = "(\\d{4})(M\\d)")
Result:
year month national.houses dublin.houses
1 2005 M1 100 120
2 2005 M2 100 120
3 2005 M3 100 120
4 2005 M4 100 120
Upvotes: 1