Varun

Reputation: 1321

R Separate column based on pattern

My dataset looks like this:

dataset = data.frame(Comments=c('Wow... Loved this place.   1','Crust is not good.  0','Not tasty and the texture was just nasty.   0'))

I'm trying to split the dataset into two columns, such that the first column contains only the text and the second column contains only the number at the end of each string.

Here's my attempt:

library(dplyr)
library(tidyr)

dataset = dataset %>%
  separate(Comments, into = c("Comment", "Score"), sep = " (?=[^ ]+$)")

However, I'm not getting a perfect separation. I've looked at other solutions online, but have had no luck yet.

Any help on this would be greatly appreciated.

Upvotes: 0

Views: 354

Answers (2)

OTStats

Reputation: 1868

One solution would be to take advantage of stringr functions:

library(dplyr)
library(stringr)

dataset %>%
  mutate(Score = str_extract(Comments, pattern = "\\d+$"),  # trailing number only
         Comments = str_remove(Comments, pattern = "\\s*\\d+$"))  # drop number and padding

#                                   Comments Score
#1                  Wow... Loved this place.     1
#2                        Crust is not good.     0
#3 Not tasty and the texture was just nasty.     0
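The separate() call from the question can probably be kept as well. A minimal sketch, assuming the only problem is that the original pattern splits on a single space and leaves the rest of the padding in the text column; the `result` name and the `convert = TRUE` choice are mine, not from the question:

```r
library(tidyr)

dataset <- data.frame(
  Comments = c("Wow... Loved this place.   1",
               "Crust is not good.  0",
               "Not tasty and the texture was just nasty.   0"),
  stringsAsFactors = FALSE
)

# Split on the whole run of whitespace that precedes the trailing number,
# so no padding is left behind in Comment.
# convert = TRUE turns the Score column into integers.
result <- separate(dataset, Comments, into = c("Comment", "Score"),
                   sep = "\\s+(?=\\d+$)", convert = TRUE)
```

The lookahead `(?=\\d+$)` only checks that digits follow up to the end of the string, so the number itself is not consumed by the separator.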

Upvotes: 0

bjorn2bewild

Reputation: 1019

Perhaps you could use substr and gsub:

dataset <- dataset %>%
  mutate(Comments = as.character(Comments)) %>%
  mutate(Score = substr(Comments, nchar(Comments), nchar(Comments))) %>%
  mutate(Comment = sub("\\s+\\d$", "", Comments))  # strip the trailing score and all whitespace before it
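
Note that the substr approach assumes the score is always a single character. If scores could have more than one digit, something like the following base R sketch might be safer. The fourth string is a hypothetical row I made up to show the multi-digit case; it is not in the question's data, and the `comments`, `score` and `comment` names are only for illustration:

```r
comments <- c("Wow... Loved this place.   1",
              "Crust is not good.  0",
              "Not tasty and the texture was just nasty.   0",
              "Great pizza.  10")  # hypothetical row, not from the question

# Capture the run of digits at the end of the string (must follow whitespace).
score <- as.integer(sub(".*\\s(\\d+)$", "\\1", comments))

# Remove the trailing number together with all the whitespace before it.
comment <- sub("\\s+\\d+$", "", comments)

result <- data.frame(Comment = comment, Score = score, stringsAsFactors = FALSE)
```

Both patterns are anchored with `$`, so digits that appear earlier in a comment are left alone.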

Upvotes: 1
