Martin M.

Reputation: 1

How to extract characters after a match from a string in R?

I have a variable in a data frame that contains raw JSON text. Some observations contain a 14-digit number that I want to extract and some don't. When an observation has the number, it appears in this format:

{"blur": "10010010010010"

I want to extract the 14 digits after {"blur": " if there is a match for this left-hand part of the string. I tried str_extract, but my regex syntax is not the best. Any suggestions?

Upvotes: 0

Views: 971

Answers (1)

G. Grothendieck

Reputation: 269481

If it's fully formed JSON, you could use a JSON parser, but assuming

  • it's just fragments as shown in the question, or it is fully formed and you prefer to use regular expressions anyway
  • each input has 0 or 1 occurrences of the digit string
  • inputs with 0 occurrences should give NA

then try this.

The second argument to strapply is the regular expression. strapply returns the portion matched by the capture group, i.e. the part of the regular expression within parentheses. The empty = NA argument tells it what to return if no occurrences are found.

library(gsubfn)
s <- c('{"blur": "10010010010010"', 'abc') # test input

strapply(s, '{"blur": "(\\d+)"', empty = NA, simplify = TRUE)
## [1] "10010010010010" NA 
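
For comparison, here is a minimal base R sketch of the same idea without gsubfn, under the same assumptions as above (at most one occurrence per input, NA otherwise). The helper name extract_blur is made up for illustration and is not part of any package, and the pattern additionally assumes the number is exactly 14 digits, as stated in the question.

```r
# Hypothetical helper (not from any package): return the 14 digits that
# follow {"blur": " in each element of s, or NA when there is no match.
extract_blur <- function(s) {
  # Escape the literal brace; capture exactly 14 digits (assumption from the question).
  pattern <- '\\{"blur": "([0-9]{14})"'
  # regexec() records the full match plus the capture group; regmatches()
  # gives character(0) for elements that did not match.
  m <- regmatches(s, regexec(pattern, s))
  vapply(m, function(x) if (length(x) >= 2) x[2] else NA_character_,
         character(1), USE.NAMES = FALSE)
}

s <- c('{"blur": "10010010010010"', 'abc') # same test input as above
digits <- extract_blur(s)
digits
## [1] "10010010010010" NA
```

On a data frame this could be applied to a whole column, e.g. df$blur <- extract_blur(df$raw_json), where df and raw_json are placeholder names for your own data.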

Upvotes: 1
