Reputation: 348
My Issue:
I am trying to grab Facebook (Open Graph) meta values from different sites, but some websites (e.g. usatoday.com) do not put quotation marks around their HTML attribute values, as Data Samples 1 and 2 show. How can I modify my regex so that it captures the property and content values whether or not they are quoted?
What I've done:
With the if statement below I have partly worked around the quotation mark issue, but it is not dynamic enough and I suspect there is a better way (I am really bad at regex).
Secondly, my regex is not able to capture the content value (the URL) in Data Sample 2 for usatoday.com. I guess the "" in the URL messes up my regex.
I really need some help here, big thanks!
if (
    preg_match( '/<meta(.*?)property="og:title"(.*?)content="(.+?)"(.*?)(\/)?>/', $raw_html, $matching )
    // for normal sites
    or
    preg_match( '/<meta(.*?)property=og:title(.*?)content="(.+?)"(.*?)(\/)?>/', $raw_html, $matching )
    // property no quote at all
    or
    preg_match( '/<meta(.*?)property=og:title(.*?)content=(.+?)(.*?)(\/)?>/', $raw_html, $matching )
    // no quote at all
)
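A small check of the third pattern against Data Sample 2 may help narrow the problem down. This is only a diagnostic sketch: `og:title` is swapped for `og:url` (my substitution, since the sample is a URL tag), and everything else is the pattern as posted. It suggests the pattern does match, but the lazy `(.+?)` gives up after a single character and the following `(.*?)` takes the rest.

```php
<?php
// Data Sample 2 (usatoday.com), with unquoted attribute values.
$raw_html = '<meta property=og:url content=https://www.usatoday.com/story/sports/nba/2023/02/08/lakers-jazz-timberwolves-trade-russell-westbrook-mike-conley-dangelo-russell/11214855002/ />';

// Third pattern from the question, with og:title replaced by og:url
// (an assumption made only so the pattern applies to this sample).
$matched = preg_match(
    '/<meta(.*?)property=og:url(.*?)content=(.+?)(.*?)(\/)?>/',
    $raw_html,
    $matching
);

var_dump($matched);      // int(1): the pattern does match
var_dump($matching[3]);  // string(1) "h": the lazy (.+?) stops after one character
var_dump($matching[4]);  // the remainder of the URL lands in the next group
```

So the content group needs an explicit stopping condition (a closing quote, or whitespace / `>` for unquoted values) rather than a bare lazy quantifier.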
Data Sample 1 - no quotation mark on meta text attribute
# usatoday.com
<meta property=og:title content="Lakers trading Russell Westbrook in massive three-team deal with Jazz and Timberwolves"/>
# normal sites
<meta property="og:title" content="Lakers trading Russell Westbrook in massive three-team deal with Jazz and Timberwolves"/>
Data Sample 2 - no quotation mark on meta URL attribute
# usatoday.com
<meta property=og:url content=https://www.usatoday.com/story/sports/nba/2023/02/08/lakers-jazz-timberwolves-trade-russell-westbrook-mike-conley-dangelo-russell/11214855002/ />
# normal sites
<meta property="og:url" content="https://www.usatoday.com/story/sports/nba/2023/02/08/lakers-jazz-timberwolves-trade-russell-westbrook-mike-conley-dangelo-russell/11214855002/" />
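One possible direction, sketched under assumptions rather than offered as a definitive fix: a single pattern that accepts double-quoted, single-quoted, or unquoted values for both attributes. The helper name `og_meta_content` is hypothetical. The sketch assumes `property` appears before `content` inside the tag and that no attribute value contains `>`; it uses a PCRE branch-reset group `(?|...)` so the captured value is always in group 2.

```php
<?php
/**
 * Hypothetical helper: return the content value of an Open Graph meta tag,
 * or null if no matching tag is found.
 * Assumes property comes before content and no attribute value contains ">".
 */
function og_meta_content(string $html, string $property): ?string
{
    $prop = preg_quote($property, '~');

    $pattern = '~<meta\b[^>]*?'
        // property="og:x", property=\'og:x\' or property=og:x;
        // \1 repeats whichever opening quote (possibly none) was used.
        . '\bproperty\s*=\s*(["\']?)' . $prop . '\1(?=[\s/>])'
        . '[^>]*?'
        // content value: quoted runs to the closing quote,
        // unquoted runs to the next whitespace or ">".
        . '\bcontent\s*=\s*(?|"([^"]*)"|\'([^\']*)\'|([^\s>]+))'
        . '~i';

    if (!preg_match($pattern, $html, $m)) {
        return null;
    }

    // Decode entities such as &amp; that may appear in attribute values.
    return html_entity_decode($m[2], ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

// Usage with the two usatoday.com samples from above.
$title_html = '<meta property=og:title content="Lakers trading Russell Westbrook in massive three-team deal with Jazz and Timberwolves"/>';
$url_html   = '<meta property=og:url content=https://www.usatoday.com/story/sports/nba/2023/02/08/lakers-jazz-timberwolves-trade-russell-westbrook-mike-conley-dangelo-russell/11214855002/ />';

echo og_meta_content($title_html, 'og:title'), "\n";
echo og_meta_content($url_html, 'og:url'), "\n";
```

If the attribute order can vary or the markup gets messier than this, an HTML parser is probably a safer choice than any single regex.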
Upvotes: 0
Views: 35