Kalenji

Reputation: 407

Text extract and adding to new column

Hi, I am trying to kill two birds with one stone.

First, if column b is populated, copy its value to "new" (no issue here). Second, if column b is blank, extract part of the string in column a, namely everything after TASK and before the next space, and put it in "new".

a <- c("11-010 Bla", "TASK 21 MMM", "TASK 03-11-11 Hah")
b <- c("11-010","","")
new <- c("","","")

df <- data.frame(a,b,new)


a                 b          new
11-010 Bla        11-010    
TASK 21 MMM
TASK 03-11-11 Hah

Desired output:

a                 b          new
11-010 Bla        11-010     11-010   
TASK 21 MMM                  21       
TASK 03-11-11 Hah            03-11-11

I tried to get the task number using the code below, but I am unable to match the space after it. The task number is always followed by a space.

gsub("^[^_]*TASK|\\.[^.]*\\s$", "", df$a)
sub(".*?TASK=(.*?)' '.*", "\\1", df$a)

Upvotes: 0

Views: 90

Answers (2)

Onyambu

Reputation: 79218

sub("(.*\\s)?(\\d.*?\\s).*","\\2",a)
[1] "11-010 "   "21 "       "03-11-11 "

regmatches(a,regexpr("\\d.*?\\s",a))
[1] "11-010 "   "21 "       "03-11-11 "
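
Both results above keep the trailing space, because the pattern ends in \\s. A minimal sketch of removing it with base R's trimws(), assuming every element of a contains a match (regmatches() silently drops elements without one, so the lengths would otherwise differ); the name hits is only for illustration:

```r
a <- c("11-010 Bla", "TASK 21 MMM", "TASK 03-11-11 Hah")

# First run that starts at a digit and ends at the next whitespace
hits <- regmatches(a, regexpr("\\d.*?\\s", a))

# Drop the trailing space that the pattern includes
new <- trimws(hits)
new
```

For the sample data the first element happens to equal the value in b, so this covers both cases only as long as that holds.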

Upvotes: 0

Cath

Reputation: 24074

When b is an empty string, you can capture everything between "TASK " and the next space with the following regex:

sub(".*TASK ([^ ]+) .+", "\\1", df$a[df$b==""])
# [1] "21"       "03-11-11"

\\1 refers back to what was captured between the parentheses in the regex, which in this case is [^ ]+: anything but a space, one or more times.
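
To look at the captured group on its own, without replacing anything, here is a small sketch using base R's regexec() and regmatches(). The names x, m and task are only for illustration:

```r
x <- c("TASK 21 MMM", "TASK 03-11-11 Hah")

# regexec() records the full match and each parenthesised group
m <- regmatches(x, regexec("TASK ([^ ]+) ", x))

# Element 1 of each result is the full match, element 2 is the first group
task <- sapply(m, `[`, 2)
task
```

This gives the same values that \\1 inserts in the sub() call.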

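You can put that directly in df (assuming new already holds the value of b for the populated rows, and that the columns are character rather than factor) with: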

df$new[df$b==""] <- sub(".*TASK ([^ ]+) .+", "\\1", df$a[df$b==""])
#                  a      b      new
#1        11-010 Bla 11-010   11-010
#2       TASK 21 MMM              21
#3 TASK 03-11-11 Hah        03-11-11
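
If you want both steps in one go, a possible sketch uses ifelse(). It assumes the columns are character vectors, hence stringsAsFactors = FALSE (before R 4.0.0 data.frame() converted strings to factors by default, and assigning new values into a factor column gives NA):

```r
a <- c("11-010 Bla", "TASK 21 MMM", "TASK 03-11-11 Hah")
b <- c("11-010", "", "")

# Keep the columns as character, not factor
df <- data.frame(a, b, stringsAsFactors = FALSE)

# Where b is populated take b, otherwise take the token after "TASK "
df$new <- ifelse(df$b != "",
                 df$b,
                 sub(".*TASK ([^ ]+) .+", "\\1", df$a))
df$new
```

sub() returns the string unchanged when the pattern does not match, but those rows take their value from b, so it does not matter here.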

Upvotes: 2
