RegEx: Grabbing values with or without quotation marks

My Issue:
I am trying to grab Facebook meta values (the Open Graph og: tags) from different sites, but some sites (e.g. usatoday.com) do not put quotation marks around their attribute values, as you can see in Data Samples 1 and 2. How can I modify my regex so that it gets the property and content values whether or not they are quoted?

What I've done:
With the if statement below, I am partly working around the quotation mark issue, but it is not flexible enough, and I guess there must be a better way (I am really bad at regex).

Secondly, my regex is not able to capture the content value (the URL) in Data Sample 2 for usatoday.com. I guess the missing quotation marks around the URL mess up my regex.

Really need some help here, big thanks!

if (
    // for normal sites
    preg_match('/<meta(.*?)property="og:title"(.*?)content="(.+?)"(.*?)(\/)?>/', $raw_html, $matching)
    // property value without quotes
    || preg_match('/<meta(.*?)property=og:title(.*?)content="(.+?)"(.*?)(\/)?>/', $raw_html, $matching)
    // no quotes at all
    || preg_match('/<meta(.*?)property=og:title(.*?)content=(.+?)(.*?)(\/)?>/', $raw_html, $matching)
) {
    // $matching[3] holds the captured content value
}
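For reference, the sketch below runs the third pattern against the usatoday.com tag from Data Sample 2 (with og:title swapped for og:url) to show why the URL is not captured. The lazy `(.+?)` is satisfied by a single character, and the lazy `(.*?)` plus the optional `(\/)?>` then absorb the rest of the tag, so group 3 holds only `h`. Replacing it with a character class that stops at whitespace, a quote or `>` is one possible fix for unquoted values; `$html`, `$lazy` and `$fixed` are illustrative names, not from the original code.

```php
<?php
// Data Sample 2 (usatoday.com): neither attribute value is quoted.
$html = '<meta property=og:url content=https://www.usatoday.com/story/sports/nba/2023/02/08/lakers-jazz-timberwolves-trade-russell-westbrook-mike-conley-dangelo-russell/11214855002/ />';

// Third pattern from the question (og:title swapped for og:url).
// The lazy (.+?) stops after one character, because (.*?) and (\/)?>
// can then absorb the rest of the tag.
preg_match('/<meta(.*?)property=og:url(.*?)content=(.+?)(.*?)(\/)?>/', $html, $lazy);
echo $lazy[3], "\n"; // h

// Possible fix for the unquoted case: take everything up to the first
// whitespace, quote or '>', none of which can appear in an unquoted value.
preg_match('/<meta\s[^>]*?property=og:url\s[^>]*?content=([^\s"\'>]+)/', $html, $fixed);
echo $fixed[1], "\n";
```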

Data Sample 1 - no quotation marks around the og:title property value

# usatoday.com
<meta property=og:title content="Lakers trading Russell Westbrook in massive three-team deal with Jazz and Timberwolves"/>

# normal sites
<meta property="og:title" content="Lakers trading Russell Westbrook in massive three-team deal with Jazz and Timberwolves"/>
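A single pattern can cover both variants of Data Sample 1 (and the unquoted URL in Data Sample 2) if each value is written as an alternation: double-quoted, single-quoted, or unquoted. The sketch below uses PCRE's branch-reset group `(?|...)` so that whichever alternative matches, the content value lands in the same capture group. `get_og_value()` is a hypothetical helper name, and the sketch assumes `property` comes before `content` inside the tag and that no quoted attribute before `content` contains a `>`.

```php
<?php
// Hypothetical helper: returns the content of the first <meta> tag whose
// property equals $property (quoted or not), or null if none matches.
// Assumption: the property attribute appears before content in the tag.
function get_og_value(string $html, string $property): ?string
{
    // Property value, optionally wrapped in " or ' (group 1 holds the quote).
    // The lookahead stops "og:url" from also matching e.g. "og:url_extra".
    $prop = '(["\']?)' . preg_quote($property, '/') . '\1(?=[\s\/>])';

    // Content value: double-quoted, single-quoted or unquoted.
    // (?|...) is a branch-reset group, so every branch fills group 2.
    $content = '(?|"([^"]*)"|\'([^\']*)\'|([^\s"\'>]+))';

    $pattern = '/<meta\b[^>]*?\sproperty\s*=\s*' . $prop
             . '[^>]*?\scontent\s*=\s*' . $content . '/i';

    if (preg_match($pattern, $html, $m) !== 1) {
        return null;
    }
    // Decode entities such as &amp; that may appear inside the value.
    return html_entity_decode($m[2], ENT_QUOTES | ENT_HTML5);
}

$html = '<meta property=og:title content="Lakers trading Russell Westbrook in massive three-team deal with Jazz and Timberwolves"/>';
echo get_og_value($html, 'og:title'), "\n";
```

Tags that list `content` before `property` would need a second pattern with the two attributes swapped, or an HTML parser instead of a regex.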

Data Sample 2 - no quotation marks around either the og:url property value or the content (URL) value

# usatoday.com
<meta property=og:url content=https://www.usatoday.com/story/sports/nba/2023/02/08/lakers-jazz-timberwolves-trade-russell-westbrook-mike-conley-dangelo-russell/11214855002/ />

# normal sites
<meta property="og:url" content="https://www.usatoday.com/story/sports/nba/2023/02/08/lakers-jazz-timberwolves-trade-russell-westbrook-mike-conley-dangelo-russell/11214855002/" />
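Unquoted attribute values like the usatoday.com ones are permitted in HTML, so an HTML parser is another option that avoids the quoting problem entirely. The sketch below uses PHP's DOM extension (`DOMDocument::loadHTML()` plus `DOMXPath`) and assumes that extension is enabled; `get_og_value_dom()` is a hypothetical name. Note that `loadHTML()` treats input as ISO-8859-1 unless the document declares its charset, so non-ASCII titles from a full page may depend on that declaration.

```php
<?php
// Hypothetical helper: reads an Open Graph value with the DOM extension
// instead of a regex; libxml's HTML parser accepts unquoted values.
function get_og_value_dom(string $html, string $property): ?string
{
    $doc = new DOMDocument();
    // Collect libxml warnings (e.g. for HTML5 tags) instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Assumes $property contains no double quotes.
    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query(sprintf('//meta[@property="%s"]', $property));
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return $nodes->item(0)->getAttribute('content');
}

$html = '<meta property=og:url content=https://www.usatoday.com/story/sports/nba/2023/02/08/lakers-jazz-timberwolves-trade-russell-westbrook-mike-conley-dangelo-russell/11214855002/ />';
echo get_og_value_dom($html, 'og:url'), "\n";
```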
