Reputation: 9018
I have a data frame and I would like to parse the "text" column to create a new column containing the number that starts at the 4th character and ends before the first underscore. The number will be either 1 or 2 digits. Here is an example:
d = data.frame(group = c("A","b","C"),text =c("DDD10_sdfdsdsfads_","ggg8_dsfsd_","hhh1_dsfdsaf_dsafdafd"))
d
See the new column below that I'd like to create:
  group                  text NEW COLUMN ??
1     A    DDD10_sdfdsdsfads_            10
2     b           ggg8_dsfsd_             8
3     C hhh1_dsfdsaf_dsafdafd             1
Thank you.
Upvotes: 0
Views: 87
Reputation: 887058
As the start position is known and the number is at most two digits, we can extract characters 4 and 5 with substr and then remove the _ if one was picked up:
d$newColumn <- as.numeric(sub("_", "", substr(d$text, 4, 5)))
d$newColumn
#[1] 10 8 1
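To see why this handles both one- and two-digit numbers, it may help to look at the intermediate values. This is a minimal sketch on the question's data; the names `chunk` and `digits` are only illustrative and not part of the original answer.

```r
d <- data.frame(group = c("A", "b", "C"),
                text = c("DDD10_sdfdsdsfads_", "ggg8_dsfsd_", "hhh1_dsfdsaf_dsafdafd"))

# Characters 4 and 5: either two digits, or one digit plus the underscore
chunk <- substr(d$text, 4, 5)
chunk
#[1] "10" "8_" "1_"

# Drop the underscore where present, then convert to numeric
digits <- sub("_", "", chunk, fixed = TRUE)
d$newColumn <- as.numeric(digits)
d$newColumn
#[1] 10  8  1
```

Note that this relies on the number never being longer than two digits, as stated in the question.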
Or with sub alone:
as.numeric(sub("^.{3}(.{1,2})_.*", "\\1", d$text))
#[1] 10 8 1
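The pattern can be read piece by piece. The sketch below uses a slightly stricter variant that captures digits only (`[0-9]{1,2}` instead of `.{1,2}`); that change is an assumption about the data, not something the original answer requires.

```r
text <- c("DDD10_sdfdsdsfads_", "ggg8_dsfsd_", "hhh1_dsfdsaf_dsafdafd")

# ^.{3}        skip the first three characters
# ([0-9]{1,2}) capture one or two digits (stricter than .{1,2})
# _.*          the underscore right after the digits, and everything after it
pattern <- "^.{3}([0-9]{1,2})_.*"

# Replace the whole string with the captured group, then convert
res <- as.numeric(sub(pattern, "\\1", text))
res
#[1] 10  8  1
```

When a string does not match, sub returns it unchanged, so as.numeric would then yield NA with a warning rather than a wrong number.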
Upvotes: 0
Reputation: 1363
Well, here's what I did. I'm not sure it's the best way, but I referenced "Extracting unique numbers from string in R" and worked this up:
d = data.frame(group = c("A","b","C"),text =c("DDD10_sdfdsdsfads_","ggg8_dsfsd_","hhh1_dsfdsaf_dsafdafd"))
d$newColumn <- gsub('[^0-9]', '', d$text)
> d
  group                  text newColumn
1     A    DDD10_sdfdsdsfads_        10
2     b           ggg8_dsfsd_         8
3     C hhh1_dsfdsaf_dsafdafd         1
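One caveat worth checking: removing every non-digit works on the sample data because the rest of each string contains no digits, and the result is a character column. The sketch below shows what happens otherwise and one possible alternative that keeps only the first run of digits; the second input string is a made-up example, not from the question.

```r
x <- c("DDD10_sdfdsdsfads_", "DDD10_abc2_")  # second string is hypothetical

# Stripping all non-digits concatenates every digit in the string
all_digits <- gsub("[^0-9]", "", x)
all_digits
#[1] "10"  "102"

# Keep only the first run of digits and convert to numeric
first_num <- as.numeric(sub("^[^0-9]*([0-9]+).*", "\\1", x))
first_num
#[1] 10 10
```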
Upvotes: 2