Reputation: 827
After reading through all of the Twitter streaming API and Phirehose PHP documentation, I've come across something I have yet to do: collect and process data separately.
The logic behind it, if I understand correctly, is to prevent a logjam at the processing phase that would back up the collecting process. I've seen examples before, but they basically write straight to a MySQL database right after collection, which seems to go against what Twitter recommends you do.
What I'd like some advice/help on is the best way to handle this. It seems that people recommend writing all the data directly to a text file and then parsing/processing it with a separate function. But with this method, I'd assume it could become a memory hog.
Here's the catch: it's all going to be running as a daemon/background process. So does anyone have any experience with solving a problem like this, or more specifically, with the Phirehose library? Thanks!
Some notes:
* The connection will be through a socket, so my guess is that the file will be constantly appended to? Not sure if anyone has any feedback on that.
Upvotes: 0
Views: 2351
Reputation: 211
The Phirehose library comes with an example of how to do this. See:
This uses a flat file, which is very scalable and fast, i.e., your average hard disk can write sequentially at 40 MB/s+ and scales linearly (i.e., unlike a database, it doesn't slow down as it gets bigger).
You don't need any database functionality to consume a stream (i.e., you just want the next tweet; there's no "querying" involved).
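For the collection side, a minimal sketch of that pattern (not the bundled example itself) could look roughly like this. It assumes the Phirehose base class with its enqueueStatus()/consume() API, the classic username/password constructor (newer versions use OauthPhirehose with OAuth credentials instead), and a hypothetical queue file path:

```php
<?php
// Minimal collector sketch, NOT the bundled Phirehose example.
// Assumes the classic Phirehose constructor (username/password); newer
// versions use OauthPhirehose with OAuth credentials instead.
require_once 'Phirehose.php';

class FlatFileCollector extends Phirehose
{
    // Hypothetical path for the current queue file.
    protected $queueFile = '/tmp/tweets.current.queue';

    /**
     * Phirehose calls this for every raw status received from the stream.
     * Only append the raw JSON to a flat file: no parsing, no database,
     * so the collector never falls behind the stream.
     */
    public function enqueueStatus($status)
    {
        file_put_contents($this->queueFile, $status . "\n", FILE_APPEND | LOCK_EX);
    }
}

// Placeholder credentials; METHOD_SAMPLE is the sample stream.
$collector = new FlatFileCollector('username', 'password', Phirehose::METHOD_SAMPLE);
$collector->consume();
```

Run that as your daemon; because it does nothing but append, it stays fast regardless of how slow the downstream processing is.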
If you rotate the file fairly often, you will get near-realtime performance (if desired).
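The processing side can then run as a separate cron job or daemon: rotate the current file out of the way, read it line by line (which keeps memory flat, so no memory-hog problem), and do the slow MySQL/analysis work there. A hedged sketch, using the same hypothetical queue path as above:

```php
<?php
// Minimal consumer sketch, run separately (e.g. from cron every minute).
// Uses the same hypothetical queue path as the collector sketch above.
$queueFile   = '/tmp/tweets.current.queue';
$processFile = '/tmp/tweets.' . date('YmdHis') . '.processing';

// Rotate: move the current queue aside so the collector starts a fresh
// file, then work through the rotated copy one line (tweet) at a time.
if (file_exists($queueFile) && rename($queueFile, $processFile)) {
    $fp = fopen($processFile, 'r');
    while (($line = fgets($fp)) !== false) {
        $tweet = json_decode(trim($line), true);
        if ($tweet === null) {
            continue; // skip blank or partially written lines
        }
        // Slow work goes here (MySQL inserts, analysis, etc.) without
        // ever blocking the collector.
    }
    fclose($fp);
    unlink($processFile);
}
```

The more often you run the rotation, the closer to realtime the processed data will be.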
Upvotes: 1