Reputation: 23
I'm trying to extract values from a form in a Word document so that I can tabulate them. I used the antiword package to convert the .doc into a character string, and now I'd like to pull out values based on markers within the document.
For example:
example<- 'CONTACT INFORMATION\r\n\r\nName: John Smith\r\n\r\nphone: XXX-XXX-XXXX\r\n\r\n'
Name<- grep('\nName:', example, value = TRUE)
Name
This code returns the whole string when I'd like it to just return 'John Smith'.
Is there a way to add an end marker to grep()? I've also tried str_extract(), but I'm having trouble writing my pattern as a regex.
Upvotes: 2
Views: 64
Reputation: 13319
We can also use:
strsplit(stringr::str_extract_all(example, "\\nName:.*", simplify = TRUE), ": ")[[1]][2]
#[1] "John Smith"
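If you would rather skip the split step, a lookbehind can return only the text after the marker. This is a minimal base R sketch, assuming the label is always followed by exactly one space and the value ends at the line break; the variable name `name` is only for illustration.

```r
example <- 'CONTACT INFORMATION\r\n\r\nName: John Smith\r\n\r\nphone: XXX-XXX-XXXX\r\n\r\n'

# (?<=Name: ) requires "Name: " immediately before the match without including it.
# [^\r\n]+ then takes everything up to the next carriage return or newline.
# Lookbehind is a Perl-style feature, hence perl = TRUE.
name <- regmatches(example, regexpr("(?<=Name: )[^\r\n]+", example, perl = TRUE))
name
#[1] "John Smith"
```

Note that regmatches() returns character(0) rather than NA when the marker is missing, so check the length before using the result.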
Upvotes: 1
Reputation: 887078
We can use gsub to remove everything up to and including Name: (plus the whitespace that follows it), as well as everything from the next \r onward, by matching both patterns and replacing them with an empty string ("")
gsub(".*Name:\\s+|\r.*", "", example)
#[1] "John Smith"
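Since the goal is to tabulate several fields, the same idea can be wrapped in a small function with a capture group. This is only a sketch: `extract_field` is a hypothetical helper, not part of any package, and it assumes each value sits on the same line as its label and that the label contains no regex metacharacters.

```r
example <- 'CONTACT INFORMATION\r\n\r\nName: John Smith\r\n\r\nphone: XXX-XXX-XXXX\r\n\r\n'

# Hypothetical helper: return the text following "<label>:" up to the end of that line,
# or NA when the label does not occur in the text.
extract_field <- function(text, label) {
  # The parenthesised group captures the value; [^\r\n]* stops at the line break.
  pattern <- paste0(label, ":[ \t]*([^\r\n]*)")
  matches <- regmatches(text, regexec(pattern, text))
  # Each element holds the full match first and the captured group second.
  vapply(matches,
         function(m) if (length(m) >= 2) m[2] else NA_character_,
         character(1))
}

# One row per document; pass a vector of strings to get one row for each.
fields <- data.frame(
  name  = extract_field(example, "Name"),
  phone = extract_field(example, "phone"),
  stringsAsFactors = FALSE
)
fields
```

Because the pattern is not anchored to the start of a line, a label such as Name would also match inside a longer label ending in Name, so tighten the pattern if the form contains similar labels.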
Upvotes: 3