Reputation: 1321
My dataset looks like this:
dataset = data.frame(Comments=c('Wow... Loved this place. 1','Crust is not good. 0','Not tasty and the texture was just nasty. 0'))
I'm trying to split the dataset into two columns so that the first column contains only the text and the second column contains only the number at the end of each string.
Here's my attempt:
library(dplyr)
library(tidyr)
dataset = dataset %>%
separate(Comments, into = c("Comment", "Score"), sep = " (?=[^ ]+$)")
However, I'm not getting a clean separation. I've looked at other solutions online, but haven't had any luck yet.
Any help on this would be greatly appreciated.
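For reference, the lookahead pattern in this attempt is meant to match only the last space in each string (a space followed by non-space characters up to the end). A minimal base-R sketch of the same split, using strsplit() with perl = TRUE instead of tidyr::separate(), on the three example comments (an illustration, not the original code):

```r
# The three example comments from the question.
comments <- c("Wow... Loved this place. 1",
              "Crust is not good. 0",
              "Not tasty and the texture was just nasty. 0")

# Split on a space that is followed only by non-space characters up to the
# end of the string, i.e. the last space. perl = TRUE enables the lookahead.
parts <- strsplit(comments, " (?=[^ ]+$)", perl = TRUE)

# Each element is a length-2 character vector: c(text, score).
result <- data.frame(
  Comment = vapply(parts, `[`, character(1), 1),
  Score   = as.integer(vapply(parts, `[`, character(1), 2)),
  stringsAsFactors = FALSE
)
print(result)
```

Because the pattern consumes the space itself, neither piece keeps leading or trailing whitespace.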
Upvotes: 0
Views: 354
Reputation: 1868
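One solution is to use stringr functions, anchoring the pattern to the end of the string so only the trailing score is matched:
library(dplyr)
library(stringr)
dataset %>%
  # Pull the trailing digits into Score (computed first), then strip them from the text.
  mutate(Score = str_extract(Comments, pattern = "[:digit:]+$"),
         Comments = str_remove(Comments, pattern = "[:digit:]+$") %>% str_trim())
# Comments Score
#1 Wow... Loved this place. 1
#2 Crust is not good. 0
#3 Not tasty and the texture was just nasty. 0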
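If a comment can contain digits inside its text, the pattern needs to be anchored to the end of the string so that only the trailing score is matched. A base-R sketch of that idea with regexpr() and regmatches(); the second comment is hypothetical and contains a digit on purpose:

```r
# Hypothetical comments: the second one contains a digit inside the text.
comments <- c("Wow... Loved this place. 1",
              "Ate 2 slices and left. 0")

# Anchor to the end of the string so only the trailing score is matched.
# regexpr() finds the match per element; regmatches() extracts it.
score <- regmatches(comments, regexpr("[0-9]+$", comments))

# Remove the trailing score (and any whitespace before it) from the text.
text <- sub("\\s*[0-9]+$", "", comments)

print(data.frame(Comment = text, Score = as.integer(score),
                 stringsAsFactors = FALSE))
```

Note that regmatches() silently drops elements with no match, so if some comment has no trailing score, `score` will be shorter than `comments`.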
Upvotes: 0
Reputation: 1019
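Perhaps you could use substr and gsub:
library(dplyr)
dataset <- dataset %>%
  mutate(Comments = as.character(Comments)) %>%
  # Assumes the score is the single last character of each string.
  mutate(Score = substr(Comments, nchar(Comments), nchar(Comments))) %>%
  # Anchor with $ so only the trailing " <digit>" is removed, not digits inside the text.
  mutate(Comment = gsub("\\s\\d$", "", Comments))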
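Taking the last character with substr() only works while every score is a single digit. A base-R sketch that captures the final run of digits instead, assuming the score is separated from the text by whitespace (the two-digit score below is hypothetical):

```r
# Hypothetical comments, one with a two-digit score.
comments <- c("Great food. 10", "Crust is not good. 0")

# Greedy ^.* backtracks to the last whitespace that is followed only by
# digits; the capture group keeps those digits.
score <- as.integer(sub("^.*\\s(\\d+)$", "\\1", comments, perl = TRUE))

# Drop the trailing whitespace + digits to keep only the text.
comment <- sub("\\s+\\d+$", "", comments, perl = TRUE)
```

If a string has no trailing score, sub() returns it unchanged, so as.integer() yields NA (with a warning) for that row.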
Upvotes: 1