Reputation: 65
I have a data frame like this:
id <-c("1","2","3")
col <- c("hello, my 5 year old son is joe, 76","hello world, 55","can't say I didn't, 3")
df <- data.frame(id,col)
I want to split col into exactly two columns: one holding only the number after the last comma (and no other number), and one holding the rest of the response. My desired output is:
id text number
1 hello, my 5 year old son is joe 76
2 hello world 55
3 can't say I didn't 3
I've tried:
separate(col, into=c("text","number"), ",(?=[^_]+$)")
but it also splits at the earlier comma, so the text gets cut.
Any suggestions?
Upvotes: 0
Views: 518
Reputation: 887511
We can use separate with a regex lookaround to match the comma (,) followed by zero or more spaces (\\s*), where the lookaround requires one or more digits at the end ($) of the string.
library(dplyr)
library(tidyr)
df %>%
separate(col, into = c('text', 'number'), ',\\s*(?=[0-9]+$)', convert = TRUE)
-output
id text number
1 1 hello, my 5 year old son is joe 76
2 2 hello world 55
3 3 can't say I didn't 3
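For comparison, the same idea can be sketched in base R without tidyr. This is only an illustration under the assumption that every row ends in a comma, optional spaces, and digits; the vector names text and number are chosen here to mirror the desired output.

```r
# Same sample strings as in the question
col <- c("hello, my 5 year old son is joe, 76",
         "hello world, 55",
         "can't say I didn't, 3")

# Drop the trailing ", <digits>" to keep only the text part
text <- sub(",\\s*[0-9]+$", "", col)

# Capture the trailing digits and convert them to integer
number <- as.integer(sub("^.*,\\s*([0-9]+)$", "\\1", col))

result <- data.frame(id = c("1", "2", "3"), text, number)
```

Anchoring the pattern with $ is what keeps the earlier comma in the first row from being treated as a split point.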
Upvotes: 1
Reputation: 11594
Using extract:
df %>% extract(col = 'col', into=c("text","number"), regex = '(.*),\\s(\\d+$)')
id text number
1 1 hello, my 5 year old son is joe 76
2 2 hello world 55
3 3 can't say I didn't 3
Upvotes: 2