gsub extracting a specific number from a string regex optional comma

Question

I need to extract a specific number from strings in a vector that look like this:

V1    V2    info
XX    YY    AB=414312;CD=0.5555;EF=1234;GH=2346;IJ=551;AA_CD=0.4633
VV    ZZ    AB=1093;CD=0.4444,0.78463;EF=1654;GH=6546;IJ=1241;AA_CD=0.4366

I only want to extract the number(s) from "CD=XXX" (notice there is also an "AA_CD=XXXX" in every row, which should be ignored).

I currently have:

df$info <- as.numeric(gsub("^.*;CD=[0-9, ],?|;.*$", "", df$info))

This grabs the number after "CD=" only when there is a single number in that spot.

I need it to also handle rows where several numbers are separated by commas, so that the extracted values look like this:

0.5555
0.4444,0.78463
0.0123
0.34,0.54,0.765

I know it is probably a silly mistake I am making...Thanks in advance!!!

missuse · Accepted Answer

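Here is an approach, where vec is the character vector of info strings (for example df$info). The capture group ([0-9.,]+) matches digits, dots and commas, so comma-separated values stay together, and the backreference has to be written "\\1" because R string literals need the backslash doubled:

lapply(strsplit(gsub("^.*;CD=([0-9.,]+);.*$", "\\1", vec), ","), as.numeric)

gsub("^.*;CD=([0-9.,]+);.*$", "\\1", vec) # extracts the numbers
#output
[1] "0.5555"         "0.4444,0.78463"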

These are then split at "," with strsplit, producing a list.

Then lapply applies as.numeric to each list element, converting the strings to numbers.
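To make the three steps easier to follow, here is a small reproducible sketch. The vector vec is built from the two sample rows in the question; the names extracted, pieces and nums are only illustrative, and the regex assumes that "CD=" is always preceded by a ";" as in the sample data.

```r
# Sample strings taken from the question's "info" column
vec <- c(
  "AB=414312;CD=0.5555;EF=1234;GH=2346;IJ=551;AA_CD=0.4633",
  "AB=1093;CD=0.4444,0.78463;EF=1654;GH=6546;IJ=1241;AA_CD=0.4366"
)

# Step 1: keep only what follows ";CD=" up to the next ";"
# (";AA_CD=" is not matched because "CD=" is preceded by "_" there)
extracted <- gsub("^.*;CD=([0-9.,]+);.*$", "\\1", vec)

# Step 2: split each string at the commas, giving a list of character vectors
pieces <- strsplit(extracted, ",")

# Step 3: convert every list element to numeric
nums <- lapply(pieces, as.numeric)

print(extracted)
str(nums)
```

Each element of nums corresponds to one row of the original data, so the second element holds both values from the second row.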

If there is no need to keep track of which vector member had which numbers:

as.numeric(unlist(strsplit(gsub("^.*;CD=([0-9.,]+);.*$", "\\1", vec), ",")))
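Flattening with unlist drops the link between each value and the row it came from. If a flat result is wanted but the row of origin still matters, one possible sketch (not part of the original answer) is a long-format data frame with a row index. The names long, row and CD are hypothetical, and vec again holds the two sample strings from the question.

```r
# Sample strings taken from the question's "info" column
vec <- c(
  "AB=414312;CD=0.5555;EF=1234;GH=2346;IJ=551;AA_CD=0.4633",
  "AB=1093;CD=0.4444,0.78463;EF=1654;GH=6546;IJ=1241;AA_CD=0.4366"
)

# List of numeric vectors, one element per input string
nums <- lapply(
  strsplit(gsub("^.*;CD=([0-9.,]+);.*$", "\\1", vec), ","),
  as.numeric
)

# One line per extracted value; "row" repeats the input index
# as many times as that input had values
long <- data.frame(
  row = rep(seq_along(nums), lengths(nums)),
  CD  = unlist(nums)
)

print(long)
```

With the sample data this gives three lines: one value for the first input string and two for the second.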
