user459

Reputation: 111

Extracting some content between two words in R

Suppose this is the test string:

alt="mass |36 grams\nserving volume | 63 mL (milliliters)\nserving density | 0.57 g\/cm^3 (grams per cubic centimeter)" title="mass | 36 grams.

\btitle="mass| \b.*+\s*+\K.*(?=serving volume\b) 

This is my regex, but it does not return what I need. How can I extract "36 grams" from this text?

It would be great if someone could share a link from where I can learn regex.

Upvotes: 0

Views: 60

Answers (3)

Pierre L

Reputation: 28461

gsub('mass \\|([0-9]* [A-Za-z]*).*', '\\1', alt)
[1] "36 grams"

To exclude the unit:

gsub('mass \\|([0-9]*).*', '\\1', alt)
[1] "36"

Be careful with the space after the digits: if it sits inside the capture group, it is captured too, which is not what you want:

gsub('mass \\|([0-9]* ).*', '\\1', alt)
[1] "36 "
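
If you would rather pull the match out directly than replace everything around it, here is a minimal sketch using `regexpr()` and `regmatches()` with `perl = TRUE`. It is not part of the answer above; it assumes `alt` holds only the string from the question, and the variable name `m` is just for illustration.

```r
# Assumed input: the test string from the question, stored in `alt`.
alt <- "mass |36 grams\nserving volume | 63 mL (milliliters)\nserving density | 0.57 g/cm^3 (grams per cubic centimeter)"

# \K (PCRE only, hence perl = TRUE) discards what was matched so far,
# so "mass |" and any following spaces are not part of the result.
# [0-9]+ [A-Za-z]+ then matches the number and its unit.
m <- regexpr("mass \\|\\s*\\K[0-9]+ [A-Za-z]+", alt, perl = TRUE)

# regmatches() returns the matched substring.
regmatches(alt, m)
# [1] "36 grams"
```

This also copes with an optional space after the `|`, as in `title="mass | 36 grams"`.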

Upvotes: 2

Mindastic

Reputation: 4131

Did you try this pattern?

/mass \|([a-zA-Z-0-9\s]+)\sserving volume/
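
The pattern above is written with `/.../` delimiters, which R does not use. A rough sketch of how it might be applied in R follows; the variable names `pattern`, `hit`, and `captured` are illustrative, the stray hyphen between `Z` and `0` is dropped for clarity, and `perl = TRUE` in `regexec()` assumes R 3.3.0 or later.

```r
# Assumed input: the test string from the question, stored in `alt`.
alt <- "mass |36 grams\nserving volume | 63 mL (milliliters)\nserving density | 0.57 g/cm^3 (grams per cubic centimeter)"

# Same idea as the pattern above, without the / delimiters and with
# backslashes doubled for an R string literal.
# perl = TRUE is used so that \s is understood inside the character class.
pattern <- "mass \\|([a-zA-Z0-9\\s]+)\\sserving volume"

# regexec() reports the full match followed by each capture group.
hit <- regmatches(alt, regexec(pattern, alt, perl = TRUE))[[1]]

# Element 1 is the whole match; element 2 is the first capture group.
captured <- hit[2]
print(captured)
# [1] "36 grams"
```

The newline between "36 grams" and "serving volume" is consumed by the `\s` outside the group, so it does not end up in the captured text.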

Upvotes: 1

Andrelrms

Reputation: 819

For the example you gave this will work, but depending on what you want to do you might need something more general:

alt<-"mass |36 grams\nserving volume | 63 mL (milliliters)\nserving density | 0.57 g/cm^3 (grams per cubic centimeter)"
gsub(".*\\|([0-9]+ gram).*","\\1",alt)
[1] "36 gram"
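
As one possible "more general" variant, here is a sketch that splits the string into lines and reads each one as a `label | value` pair. It assumes every line contains a `|` and that the labels are unique; the names `entries`, `labels`, and `values` are made up for illustration.

```r
# Assumed input: the test string from the question, stored in `alt`.
alt <- "mass |36 grams\nserving volume | 63 mL (milliliters)\nserving density | 0.57 g/cm^3 (grams per cubic centimeter)"

# One entry per line of the string.
entries <- strsplit(alt, "\n", fixed = TRUE)[[1]]

# Label: everything before the first "|"; value: everything after it.
# trimws() removes the surrounding spaces.
labels <- trimws(sub("\\|.*$", "", entries))
values <- trimws(sub("^[^|]*\\|", "", entries))

# Named character vector, so each value can be looked up by its label.
values <- setNames(values, labels)

values[["mass"]]
# [1] "36 grams"
values[["serving volume"]]
# [1] "63 mL (milliliters)"
```

With this layout the other fields come for free, and the unit is kept as written ("36 grams" rather than "36 gram").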

Upvotes: 1
