user3022875

Reputation: 9018

Create a new column of parsed text

I have a data frame and would like to parse the "text" column to create a new column containing the number that starts at the 4th character and ends before the first underscore. The number is always 1 or 2 digits long. Here is an example:

d = data.frame(group = c("A","b","C"),text =c("DDD10_sdfdsdsfads_","ggg8_dsfsd_","hhh1_dsfdsaf_dsafdafd"))
d

This is the new column I'd like to create:

  group                  text     NEW COLUMN ??
1     A    DDD10_sdfdsdsfads_          10
2     b           ggg8_dsfsd_           8
3     C hhh1_dsfdsaf_dsafdafd           1

Thank you.

Upvotes: 0

Views: 87

Answers (2)

akrun

Reputation: 887058

Since the start and end positions are known, we can extract characters 4 to 5 with substr and then remove any trailing _ with sub:

 d$newColumn <- as.numeric(sub("_", "", substr(d$text, 4, 5)))
 d$newColumn
 #[1] 10  8  1

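Or with sub alone, capturing the one or two characters that sit between the first three characters and the underscore: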

as.numeric(sub("^.{3}(.{1,2})_.*", "\\1", d$text))
#[1] 10  8  1
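
If some rows might not follow the expected layout, a slightly stricter variant of the sub approach may help. The sketch below is only an illustration under that assumption: it requires digits (rather than any character) in the capture group and leaves rows that do not match as NA. The names pattern and matched are introduced here for illustration and are not part of the original answer.

```r
d <- data.frame(group = c("A", "b", "C"),
                text = c("DDD10_sdfdsdsfads_", "ggg8_dsfsd_", "hhh1_dsfdsaf_dsafdafd"),
                stringsAsFactors = FALSE)

# Skip the first three characters, capture one or two digits,
# and require an underscore directly after them
pattern <- "^.{3}([0-9]{1,2})_.*$"

# Only convert rows that match; everything else stays NA
matched <- grepl(pattern, d$text)
d$newColumn <- NA_real_
d$newColumn[matched] <- as.numeric(sub(pattern, "\\1", d$text[matched]))

d$newColumn
#[1] 10  8  1
```

Without the grepl check, sub returns a non-matching string unchanged, and as.numeric would then produce NA with a coercion warning.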

Upvotes: 0

Sam

Reputation: 1363

Here's what I did. I'm not sure it's the best way, but I adapted it from the question "Extracting unique numbers from string in R". It removes every character that is not a digit:

d = data.frame(group = c("A","b","C"),text =c("DDD10_sdfdsdsfads_","ggg8_dsfsd_","hhh1_dsfdsaf_dsafdafd"))

d$newColumn <- gsub('[^0-9]', '', d$text)

> d
  group                  text newColumn
1     A    DDD10_sdfdsdsfads_        10
2     b           ggg8_dsfsd_         8
3     C hhh1_dsfdsaf_dsafdafd         1
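
One caveat worth sketching: gsub('[^0-9]', '', ...) keeps every digit in the string and returns character values, so it only gives the intended result when the rest of the text contains no digits. The example below uses a hypothetical second input ("ggg8_abc2_") that is not in the question's data, and the names all_digits and first_run are made up for illustration.

```r
# The second string is a hypothetical input with an extra digit later on
x <- c("DDD10_sdfdsdsfads_", "ggg8_abc2_")

# Stripping all non-digits concatenates the 8 and the 2
all_digits <- gsub("[^0-9]", "", x)
all_digits
#[1] "10" "82"

# Taking only the first run of digits avoids that, and
# as.numeric converts the result from character to numeric
first_run <- as.numeric(regmatches(x, regexpr("[0-9]+", x)))
first_run
#[1] 10  8
```

Note that regmatches with regexpr silently drops elements that contain no digits at all, so this variant assumes every string has at least one.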

Upvotes: 2
