blue-sky
blue-sky

Reputation: 53896

Converting a dataframe to tidy format

Here I'm attempting to convert dataframe to tibble format and split the year , month column values into their own rows :

library(dpylr)
library(tidyr)

res <- data.frame("year.month" = c("2005M1","2005M2","2005M3","2005M4"), "national houses" = c(100,100,100,100), "dublin houses" = c(120,120,120,120))

res %>% separate(year.month , into=c("year" , "month") ,  sep=".")

returns :

  year month national.houses dublin.houses
1                        100           120
2                        100           120
3                        100           120
4                        100           120
Warning message:
Too many values at 4 locations: 1, 2, 3, 4 

year & month values are not appearing, im not utilizing separate correctly ?

Upvotes: 1

Views: 1982

Answers (2)

Gabi
Gabi

Reputation: 1363

I'd guess that just separating year from month would get you to half-tidy. You still have two separate columns that both count houses. One row per observation, one column per variable would require something like this:

res %>% 
  tidyr::gather(key = where, 
                value = houses, 
                -year.month) %>% 
  mutate(where = gsub(where, 
                      pattern = '\\.houses', 
                      replacement = '')) %>% 
  separate(year.month, 
           into = c('year', 'month'), 
           sep = 'M')

Upvotes: 1

acylam
acylam

Reputation: 18701

. is a wild card in regex, and the sep argument in separate takes regex, so your code tries to split year.month by each character, hence the warning of too many values. The following separates your column correctly using positive lookbehind and lookahead:

library(dplyr)
library(tidyr)

res %>% 
  separate(year.month, into = c("year", "month"), sep = "(?<=\\d)(?=M)")

You can also use extract from tidyr to split by capture group:

res %>% 
  extract(year.month, into = c("year", "month"), regex = "(\\d{4})(M\\d)")

Result:

  year month national.houses dublin.houses
1 2005    M1             100           120
2 2005    M2             100           120
3 2005    M3             100           120
4 2005    M4             100           120

Upvotes: 1

Related Questions