amisos55

Reputation: 1979

Extract specific value from a string variable in R

I have a character variable and need to extract the value of the title="" attribute from it, that is, everything between the double quotes immediately after title=.

Here is the example dataset:

df <- data.frame(
  id = c(1,2,3),
  character = c('mrow&gt;&lt;mn&gt;2&lt;mn&gt;&lt;mi&gt;h&lt;mi&gt;&lt;m title="h+r=2"&gt;&lt;mstyle',
        'mrow&gt;&lt;mn&gt;2&lt;mn&gt;&lt;mi&gt;h&lt;mi&gt;&lt;m title="r+2h=h"&gt;&lt;mstyle&',
        'mrow&gt;&lt;mn&gt;2&lt;mn&gt;&lt;mi&gt;h&lt;mi&gt;&lt;m title="h∙rleft(frac{2h}{2}right)"&gt;&lt;mstyle&gt'))

> df
  id                                                                                                  character
1  1                        mrow&gt;&lt;mn&gt;2&lt;mn&gt;&lt;mi&gt;h&lt;mi&gt;&lt;m title="h+r=2"&gt;&lt;mstyle
2  2                      mrow&gt;&lt;mn&gt;2&lt;mn&gt;&lt;mi&gt;h&lt;mi&gt;&lt;m title="r+2h=h"&gt;&lt;mstyle&
3  3 mrow&gt;&lt;mn&gt;2&lt;mn&gt;&lt;mi&gt;h&lt;mi&gt;&lt;m title="h·rleft(frac{2h}{2}right)"&gt;&lt;mstyle&gt

My desired output would be:

> df
  id                 character
1  1                     h+r=2
2  2                    r+2h=h
3  3 h·rleft(frac{2h}{2}right)

Upvotes: 1

Views: 837

Answers (2)

c0bra

Reputation: 1080

You can use regex101 to build and test a suitable regular expression:

https://regex101.com/r/OFJhnQ/1

Then you can use str_extract from the stringr package to obtain the value.
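As a rough sketch of the str_extract route: the pattern below is an assumption chosen for illustration and is not necessarily the one saved at the regex101 link. The vector x holds shortened, made-up variants of the question's strings, with one extra entry that has no title attribute.

```r
library(stringr)

# Shortened sample strings (assumed for illustration only);
# the last one deliberately has no title attribute
x <- c('&lt;m title="h+r=2"&gt;&lt;mstyle',
       '&lt;m title="r+2h=h"&gt;&lt;mstyle&',
       '&lt;mi&gt;h&lt;mi&gt;')

# (?<=title=") is a lookbehind, so the match starts right after title="
# [^"]* then consumes everything up to, but not including, the next quote
titles <- str_extract(x, '(?<=title=")[^"]*')
print(titles)
```

With the question's data frame this would presumably be applied as df$character <- str_extract(df$character, '(?<=title=")[^"]*'). Strings without a title attribute come back as NA rather than being left unchanged.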

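Or you can use the extract function from tidyr, which replaces the character column with a new title column:

library(magrittr)  # provides the %>% pipe
df %>% tidyr::extract(character, "title", regex = "title=\"(.+)\"")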

Upvotes: 1

ekoam

Reputation: 8844

Try this, which replaces each string with the text captured between the quotes after title=:

library(dplyr)
df %>% mutate(character = sub(".+title=\"(.+)\".+", "\\1", character))
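One caveat with the sub() approach is that a string without a match is returned unchanged, and the greedy (.+) can run past the closing quote if more quotes follow later in the string. A possible base R variant is sketched below. The helper name extract_title and the sample vector are hypothetical and not part of the original answer.

```r
# Hypothetical helper (not from the original answer): returns the text inside
# title="..." for each element, or NA when there is no title attribute.
extract_title <- function(x) {
  # [^"]* stops at the first closing quote instead of matching greedily
  hits <- regmatches(x, regexec('title="([^"]*)"', x))
  # Each hit is c(full match, capture group), or character(0) on no match
  unname(vapply(hits,
                function(h) if (length(h) >= 2) h[2] else NA_character_,
                character(1)))
}

# Made-up sample input: one title, two quoted attributes, and no title at all
x <- c('&lt;m title="h+r=2"&gt;&lt;mstyle',
       '&lt;m title="r+2h=h" id="eq2"&gt;',
       '&lt;mi&gt;h&lt;mi&gt;')
print(extract_title(x))
```

Inside the pipeline above it could be used as df %>% mutate(character = extract_title(character)), assuming dplyr is loaded.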

Upvotes: 1
