Michael Lopez

Reputation: 79

Extract part of string in R dataframe column

I am trying to extract an ID that is part of a string within a column in R. I would like to write an expression that extracts the part starting with IAB and ending in a number. How would I do this?

sample strings:

[31] "{\"\"element\"\":\"\"IAB1_4\"\"}"  
[32] "{\"\"element\"\":\"\"IAB19_3\"\"}" 
[33] "{\"\"element\"\":\"\"IAB19_16\"\"}"
[34] "{\"\"element\"\":\"\"IAB9_11\"\"}" 
[35] "{\"\"element\"\":\"\"IAB19_5\"\"}" 
[36] "{\"\"element\"\":\"\"IAB18_1\"\"}"

I need to extract just the part that starts with IAB and ends in a number, e.g. IAB1_4 from the first string.

Upvotes: 1

Views: 7682

Answers (1)

akrun

Reputation: 887971

We can use str_extract to match the literal string 'IAB', then one or more digits (\\d+), then an underscore (_), then one or more digits (\\d+)

library(stringr)
str_extract(v1, 'IAB\\d+_\\d+')
#[1] "IAB1_4"   "IAB19_3"  "IAB19_16" "IAB9_11"  "IAB19_5"  "IAB18_1" 

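Or with regmatches and regexpr from base R. Note that, unlike str_extract, which returns NA for an element without a match, this drops such elements, so the result can be shorter than the input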

regmatches(v1, regexpr('IAB\\d+_\\d+', v1))
#[1] "IAB1_4"   "IAB19_3"  "IAB19_16" "IAB9_11"  "IAB19_5"  "IAB18_1" 

data

v1 <- c("{\"\"element\"\":\"\"IAB1_4\"\"}", "{\"\"element\"\":\"\"IAB19_3\"\"}", 
"{\"\"element\"\":\"\"IAB19_16\"\"}", "{\"\"element\"\":\"\"IAB9_11\"\"}", 
"{\"\"element\"\":\"\"IAB19_5\"\"}", "{\"\"element\"\":\"\"IAB18_1\"\"}"
)
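
Since the question is about a data frame column, here is a minimal sketch of applying the same pattern to a column using base R only. The data frame `df` and the column names `raw` and `id` are hypothetical (they are not from the question), and the third row is an invented non-matching string added only to show what happens when there is no ID. The result is kept the same length as the column by filling NA where nothing matches.

```r
# Hypothetical data frame; `raw` holds strings like those in the question,
# plus one invented row that contains no ID
df <- data.frame(
  raw = c("{\"\"element\"\":\"\"IAB1_4\"\"}",
          "{\"\"element\"\":\"\"IAB19_16\"\"}",
          "no id here"),
  stringsAsFactors = FALSE
)

pattern <- "IAB\\d+_\\d+"

# regexpr() returns -1 for elements without a match
m <- regexpr(pattern, df$raw)

# Start with NA everywhere, then fill in only the rows that matched
df$id <- NA_character_
df$id[m != -1] <- regmatches(df$raw, m)

df$id
```

If stringr is available, `df$id <- str_extract(df$raw, pattern)` should give the same column in one step, since str_extract already returns NA for non-matching elements.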

Upvotes: 3
