Text extract and adding to new column

Question

Hi, I am trying to kill two birds with one stone.

First, if column b is populated, copy its value to new (no issue here). Second, if column b is blank, extract part of the string in a (everything after "TASK" and before the next space) and put it in new.

a <- c("11-010 Bla", "TASK 21 MMM", "TASK 03-11-11 Hah")
b <- c("11-010","","")
new <- c("","","")

df <- data.frame(a,b,new)


a                 b          new
11-010 Bla        11-010    
TASK 21 MMM
TASK 03-11-11 Hah

Desired output:

a                 b          new
11-010 Bla        11-010     11-010   
TASK 21 MMM                  21       
TASK 03-11-11 Hah            03-11-11

I tried to get the task number using the code below, but I cannot work out how to include the space in the pattern. The task number is always followed by a space.

gsub("^[^_]*TASK|\\.[^.]*\\s$", "", df$a)
sub(".*?TASK=(.*?)' '.*", "\\1", df$a)

Cath · Accepted Answer

If b is an empty string, you can capture everything between "TASK " and the next space with the following regex:

sub(".*TASK ([^ ]+) .+", "\\1", df$a[df$b==""])
# [1] "21"       "03-11-11"

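\\1 refers back to whatever the parentheses in the regex captured, which in this case is [^ ]+: one or more characters that are not a space. Because the pattern matches the whole string, sub() replaces the whole string with just that captured part.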
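To see what the replacement does on each kind of input, here is a small sketch using the vector from the question. The names pattern, extracted and matched are only illustrative and are not part of the original code.

```r
# Sample values taken from the question
a <- c("11-010 Bla", "TASK 21 MMM", "TASK 03-11-11 Hah")

# "TASK ", then a captured run of non-space characters, then a space and the rest
pattern <- ".*TASK ([^ ]+) .+"

# Where the pattern matches, the whole string is replaced by group 1
extracted <- sub(pattern, "\\1", a)
# [1] "11-010 Bla" "21"         "03-11-11"

# Where it does not match, sub() returns the input unchanged,
# so it helps to know which elements actually matched
matched <- grepl(pattern, a)
# [1] FALSE  TRUE  TRUE
```

This is why the answer applies the substitution only to the rows where b is empty: a row without "TASK " would otherwise keep its full original string.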

You can put that directly in df with:

df$new[df$b==""] <- sub(".*TASK ([^ ]+) .+", "\\1", df$a[df$b==""])
#                  a      b      new
#1        11-010 Bla 11-010   11-010
#2       TASK 21 MMM              21
#3 TASK 03-11-11 Hah        03-11-11
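The assignment above only fills the rows where b is blank. As a minimal end-to-end sketch covering both cases from the question, and assuming the columns should be plain character vectors (hence stringsAsFactors = FALSE, which matters on R versions before 4.0.0), one possibility is to start new as a copy of b. The helper name blank is hypothetical.

```r
a <- c("11-010 Bla", "TASK 21 MMM", "TASK 03-11-11 Hah")
b <- c("11-010", "", "")

# Keep the columns as character so values can be assigned freely
# (assumption: factors are not wanted here)
df <- data.frame(a, b, stringsAsFactors = FALSE)

# Case 1: where b is populated, new is simply b
df$new <- df$b

# Case 2: where b is blank, take the token between "TASK " and the next space
blank <- df$b == ""
df$new[blank] <- sub(".*TASK ([^ ]+) .+", "\\1", df$a[blank])

df$new
# [1] "11-010"   "21"       "03-11-11"
```

Note that a blank-b row whose a does not contain "TASK " followed by a token and a space would receive the full string from a, so real data may need an extra check.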
