TheBigDoubleA

Reputation: 452

How to parse a non-XML RSS feed in PHP?

I've been parsing tons of RSS feeds using PHP's simplexml_load_file and it works like a charm. Now I'm trying to do the same for the RSS feed of the Financial Times. When I do...

$rss = simplexml_load_file("http://www.ft.com/rss/world");

... I get:

Warning: simplexml_load_file(): http://www.ft.com/rss/world:11: parser error : Opening and ending tag mismatch: link line 8 and head in rss.php on line 6

Warning: simplexml_load_file(): oat:left;margin-right:20px;margin-top:3px;width:35px;height:31px;}</style></head in rss.php on line 6

Warning: simplexml_load_file(): ^ in rss.php on line 6

Warning: simplexml_load_file(): http://www.ft.com/rss/world:37: parser error : Opening and ending tag mismatch: input line 37 and li in rss.php on line 6

Warning: simplexml_load_file(): ^ in rss.php on line 6

and many, many more warnings (around 100).

I've searched Stack Overflow for answers, but I can't find anything that seems to apply to this case. What am I missing here?

Upvotes: 1

Views: 258

Answers (2)

hakre

Reputation: 197554

Some websites only respond as expected when the HTTP request carries a user agent. PHP's default user agent might be empty (which seems a sane setting privacy-wise), so you need to set it for the request:

ini_set('user_agent', "Godzilla/42.4 (Gabba Gandalf Client 7.3; C128; Z80) Lord of the RSS Weed Edition (KHTML, like Gold Dust Day Gecko) Chrome/97.0.43043.0 Safari/1337.42");

$rss = simplexml_load_file("http://www.ft.com/rss/world");
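If changing the global user_agent setting is not desirable, the same idea can be sketched with a per-request stream context. This is only an illustration: the function names buildFeedContext and loadFeed, the 10-second timeout, and the agent string in the usage note are made up for this example, and I have not confirmed which agent strings the FT server accepts.

```php
<?php
// Build a stream context that sends a user agent for this request only,
// leaving the global user_agent ini setting untouched.
// (buildFeedContext is a hypothetical helper name.)
function buildFeedContext(string $userAgent)
{
    return stream_context_create([
        'http' => [
            'user_agent' => $userAgent,
            'timeout'    => 10, // seconds; an arbitrary choice for this sketch
        ],
    ]);
}

// Fetch the feed body with that context, then hand the string to SimpleXML.
// Returns a SimpleXMLElement on success, or false if the fetch or parse fails.
// (loadFeed is a hypothetical helper name.)
function loadFeed(string $url, string $userAgent)
{
    $body = @file_get_contents($url, false, buildFeedContext($userAgent));
    if ($body === false) {
        return false;
    }
    return simplexml_load_string($body);
}
```

Usage would then look something like $rss = loadFeed("http://www.ft.com/rss/world", "ExampleFeedReader/1.0"); where the agent string is a placeholder.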

Upvotes: 1

Othi

Reputation: 356

Your code works for me here. Try omitting LIBXML_NOWARNING and LIBXML_NOERROR (which suppress any errors you might be getting) to see where it goes wrong.
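One way to see what the parser is complaining about, without a page full of PHP warnings, is to have libxml collect the errors so they can be inspected. A minimal sketch, assuming the response body is already in a string; parseXmlWithErrors is a hypothetical helper name, not part of PHP:

```php
<?php
// Parse an XML string and return [document-or-false, list of error messages].
// (parseXmlWithErrors is a hypothetical helper name.)
function parseXmlWithErrors(string $xml): array
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $doc = simplexml_load_string($xml);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }

    // Restore the previous error-handling mode for the rest of the script.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return [$doc, $messages];
}
```

If the messages mention tags such as head, link or li, as in the question, that suggests the server returned an HTML page rather than the feed itself.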

Upvotes: 0
