Reputation: 5151
I would like to hear from the community about a good pattern for the following problem.
I had a "do-everything" server that acted as web server, MySQL server, and crawler server. For the last two or three weeks, my monitoring tools have shown that whenever my crawlers run, the load average goes above 5 (on a 4-core server, anything up to 4.00 would be acceptable). So I got another server and I want to move my crawlers there. My question is: once the data is crawled on the crawler server, I have to insert it into my database. I would rather not open a remote connection and insert it into the database directly, since I prefer to use the Rails framework (I'm using Rails) to make it easier to create all the relationships, etc.
Problem to be solved:
The crawler server has the crawled data (a bunch of CSV files), and I want to move it to the remote server and insert it into my DB using Rails.
Restriction: I don't want to run MySQL replication (master + slave), since it would require a deeper analysis to figure out where most of the write operations happen.
Ideas:
Move the CSVs from the crawler server to the remote server (using ssh or rsync) and import them during the day (a minimal import sketch follows after this list)
Write an API on the crawler server that my remote server can poll (many times a day) to pull and import the data
Any other ideas or good patterns around this theme?
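For the first idea, here is a minimal sketch of the import side, assuming the CSVs are rsynced into a `tmp/crawled/` directory on the Rails server and that a `CrawledItem` model with `url`, `title`, and `body` columns exists (both the directory and the model are assumptions; adjust to your schema):

```ruby
# lib/tasks/import_crawled_data.rake
# Hypothetical rake task: imports CSV files that rsync dropped into a local
# directory, using ActiveRecord so associations and validations still apply.
# The drop directory, CrawledItem model, and column names are assumptions.
require "csv"

namespace :crawler do
  desc "Import crawled CSV files synced from the crawler server"
  task import: :environment do
    Dir.glob(Rails.root.join("tmp", "crawled", "*.csv").to_s).each do |path|
      CSV.foreach(path, headers: true) do |row|
        # Upsert by URL so re-running the task is idempotent (assumed unique key)
        item = CrawledItem.find_or_initialize_by(url: row["url"])
        item.update!(title: row["title"], body: row["body"])
      end
      File.delete(path) # remove the file once it has been imported
    end
  end
end
```

A cron entry could then run the rsync followed by `bin/rails crawler:import` as often as needed.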
Upvotes: 0
Views: 119
Reputation: 3866
With a slight variation on the second pattern you noted, you could expose an API on your web-app/DB server, which the crawler uses to report its data. It could do this in batches, in real time, or only within a specific window of time (day/night, etc.).
This pattern lets the crawler decide when to report the data, rather than having the web app do the 'polling' for it.
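A minimal sketch of what such a push-style endpoint could look like on the Rails side; the route, the shared-token check, and the `CrawledItem` model with `url`/`title`/`body` attributes are all assumptions:

```ruby
# app/controllers/api/crawled_items_controller.rb
# Hypothetical endpoint the crawler POSTs batches to, e.g.
# POST /api/crawled_items with JSON: { "items": [{ "url": "...", "title": "..." }, ...] }
module Api
  class CrawledItemsController < ActionController::API
    before_action :authenticate!

    def create
      params.require(:items).each do |attrs|
        # Creating through the model keeps validations and associations intact
        CrawledItem.create!(attrs.permit(:url, :title, :body))
      end
      head :created
    end

    private

    def authenticate!
      # Simple shared-secret check; replace with whatever auth you already use
      head :unauthorized unless request.headers["X-Api-Token"] == ENV["CRAWLER_API_TOKEN"]
    end
  end
end
```

The crawler can then POST its batches with any HTTP client (curl, Net::HTTP, etc.) on whatever schedule suits it, and all the usual ActiveRecord validations and relationships still apply on the web-app side.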
Upvotes: 1