Extract specific value from a string variable in R

Question

I have a character variable and need to extract the value of the title="" attribute, that is, everything inside the double quotes right after title=.

Here is the example dataset:

df <- data.frame(
  id = c(1,2,3),
  character = c('mrow><mn>2<mn><mi>h<mi><m title="h+r=2"><mstyle',
        'mrow><mn>2<mn><mi>h<mi><m title="r+2h=h"><mstyle&',
        'mrow><mn>2<mn><mi>h<mi><m title="h∙rleft(frac{2h}{2}right)"><mstyle>'))

> df
  id                                                                                                  character
1  1                        mrow><mn>2<mn><mi>h<mi><m title="h+r=2"><mstyle
2  2                      mrow><mn>2<mn><mi>h<mi><m title="r+2h=h"><mstyle&
3  3 mrow><mn>2<mn><mi>h<mi><m title="h·rleft(frac{2h}{2}right)"><mstyle>

My desired output would be:

> df
  id                 character
1  1                     h+r=2
2  2                    r+2h=h
3  3 h·rleft(frac{2h}{2}right)

ekoam · Accepted Answer

Try this

library(dplyr)
df %>% mutate(character = sub(".+title=\"(.+)\".+", "\\1", character))
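
For reference, the same capture-group idea can be sketched in base R without dplyr. This is only an illustration, not part of the original answer: the new column name `title` is an assumption, and the pattern uses `[^"]*` so the capture stops at the first closing quote, which may be safer if a string ever contains more than one quoted attribute.

```r
# Sample data from the question
df <- data.frame(
  id = c(1, 2, 3),
  character = c('mrow><mn>2<mn><mi>h<mi><m title="h+r=2"><mstyle',
                'mrow><mn>2<mn><mi>h<mi><m title="r+2h=h"><mstyle&',
                'mrow><mn>2<mn><mi>h<mi><m title="h∙rleft(frac{2h}{2}right)"><mstyle>'),
  stringsAsFactors = FALSE
)

# Capture everything between title=" and the next double quote.
# "\\1" is the backreference to the captured group; the backslash
# must be doubled inside an R string literal.
# `title` is a hypothetical column name chosen for this sketch.
df$title <- sub('.*title="([^"]*)".*', "\\1", df$character)

print(df[, c("id", "title")])
```

Note that sub() returns the input unchanged when the pattern does not match, so rows without a title="..." attribute would keep their full original string.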
