chulian

Reputation: 4117

Twitter RSS with Yahoo Pipes is not working

When I put a Twitter feed (https://api.twitter.com/1/statuses/user_timeline.rss?screen_name=chulian1819) into Yahoo Pipes, I get an error 400, and when I use the YQL console it says "Redirected to a robots.txt restricted URL: https://api.twitter.com/1/statuses/user_timeline.rss?screen_name=chulian1819"

http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20html%20where%20url%3D%22https%3A%2F%2Fapi.twitter.com%2F1%2Fstatuses%2Fuser_timeline.rss%3Fscreen_name%3Dchulian1819%22&diagnostics=true

How can I get a user's Twitter feed into Yahoo Pipes?

Thanks!

PS: My Twitter posts are not protected; I can see the RSS feed in my browser even when I am not logged into Twitter.

Upvotes: 0

Views: 1438

Answers (2)

Pieter

Reputation: 31

Hi there, I was able to make a Twitter feed mix using Yahoo! Pipes. I tried a lot of other "programs", but Yahoo! Pipes just rules this one ;)

I used Fetch Feed, Sort and Regex to do my thing.

The following details may be interesting for other people.

The URLs you can fetch from:

http://api.twitter.com/1/statuses/user_timeline.rss?screen_name=REPLACEWITHNAME

http://api.twitter.com/1/statuses/user_timeline.rss?screen_name=REPLACEWITHOTHERNAME ...

Sort by item.pubDate to get a mix of feeds by date.

I use Regex to remove URLs from the text (https?://([-\w.]+)+(:\d+)?(/([\w/_.]*(\?\S+)?)?)?)
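For anyone who wants to see the same three steps outside of Pipes, here is a minimal Python sketch. It assumes each feed has already been fetched and parsed into a list of dicts with `title` and `pubDate` keys; that shape, the function names `strip_urls` and `mix_feeds`, and the newest-first ordering are my own assumptions for illustration, not anything Pipes exposes. The regex is the one quoted above.

```python
import re
from email.utils import parsedate_to_datetime

# The URL-matching regex from the answer above, unchanged.
URL_PATTERN = re.compile(r"https?://([-\w.]+)+(:\d+)?(/([\w/_.]*(\?\S+)?)?)?")


def strip_urls(text):
    """Remove URLs from text and collapse the whitespace they leave behind."""
    without_urls = URL_PATTERN.sub("", text)
    return " ".join(without_urls.split())


def mix_feeds(*feeds):
    """Merge several feeds, sort by pubDate, and strip URLs from titles.

    Assumption: each feed is a list of dicts with "title" and "pubDate"
    keys, where pubDate is an RFC 822 date string as used in RSS.
    Assumption: newest items come first.
    """
    merged = [item for feed in feeds for item in feed]
    merged.sort(
        key=lambda item: parsedate_to_datetime(item["pubDate"]),
        reverse=True,
    )
    return [{**item, "title": strip_urls(item["title"])} for item in merged]
```

Calling `mix_feeds(feed_a, feed_b)` returns one combined list, roughly what the Fetch Feed, Sort and Regex modules produce together.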

There are probably pre-made public Yahoo Pipes that you can simply clone and adapt, but I haven't looked into that, so maybe someone else can post about it.

Anyway, hope it helps.

Upvotes: 3

Skizz

Reputation: 646

When Yahoo Pipes retrieves content from an RSS feed or a web page, it identifies itself with the User-Agent string in the request header. This string is fixed by Yahoo and cannot be changed. So if the site being scraped has blocked Yahoo Pipes, you are out of luck and it cannot be done.

The only workaround is to switch to cURL, which can send a web browser's User-Agent string and does not check the robots.txt file. However, this means using a PHP-enabled web server or Google App Engine to grab the feed.
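As a rough illustration of that workaround, here is a minimal sketch that uses Python's standard `urllib.request` in place of cURL/PHP. The function names `build_feed_request` and `fetch_feed` and the browser User-Agent string are assumptions made up for this example, and whether the endpoint from the question answers at all is not something the sketch can guarantee.

```python
import urllib.request
from urllib.parse import quote

# Assumption: any browser-like string would do; this one is only an example.
BROWSER_USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)

# Feed URL pattern taken from the question.
FEED_URL = "https://api.twitter.com/1/statuses/user_timeline.rss?screen_name="


def build_feed_request(screen_name, user_agent=BROWSER_USER_AGENT):
    """Build a request for the user's feed with a custom User-Agent header."""
    return urllib.request.Request(
        FEED_URL + quote(screen_name),
        headers={"User-Agent": user_agent},
    )


def fetch_feed(screen_name, timeout=10):
    """Fetch the raw feed bytes. Raises urllib.error.URLError on failure."""
    request = build_feed_request(screen_name)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

A plain HTTP client like this only consults robots.txt if you program it to, which is why the restriction reported by YQL does not apply here. The fetched feed could then be re-served from your own server for Pipes to read.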

Upvotes: 2
