user1968094

Reputation: 41

PostgreSQL blocking on too many inserts

I am working on a research platform that reads relevant Twitter feeds via the Twitter API and stores them in a PostgreSQL database for future analysis. The middleware is Perl, and the server is an HP ML310 with 8GB RAM running Debian Linux.

The problem is that the Twitter feed can be quite large (many entries per second), and I can't afford to wait for each insert to finish before going back for the next tweet. So what I've done is use fork() so that each tweet gets a new process to insert it into the database, while the listener returns quickly to grab the next tweet. However, because each of these processes effectively opens a new connection to the PostgreSQL backend, the system never catches up with its Twitter feed.
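
In rough outline the listener looks like this (a simplified sketch rather than my exact code; get_next_tweet() and the table layout are placeholders):

    use DBI;

    $SIG{CHLD} = 'IGNORE';   # let the kernel reap finished children

    while (my $tweet = get_next_tweet()) {     # placeholder for the stream reader
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            # Child: open its own connection, insert one row, exit.
            my $dbh = DBI->connect("dbi:Pg:dbname=tweets", "user", "pass",
                                   { RaiseError => 1, AutoCommit => 1 });
            my $sth = $dbh->prepare("INSERT INTO tweet (id, body) VALUES (?, ?)");
            $sth->execute($tweet->{id}, $tweet->{text});
            $dbh->disconnect;
            exit 0;
        }
        # Parent: falls through and immediately waits for the next tweet.
    }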

I am open to connection pooling suggestions and/or to upgrading hardware if necessary to make this work, but would appreciate any advice. Is this likely RAM-bound, or are there configuration or software approaches I can try to make the system sufficiently speedy?

Upvotes: 2

Views: 861

Answers (2)

alex

Reputation: 1304

Use the COPY command. One script reads the Twitter feed and appends rows to a CSV file on disk. Another script watches for that CSV file, renames it, and runs a COPY command to load it into the database.
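
A rough sketch of the loader side in Perl with DBD::Pg (the file name, table and columns are just placeholders):

    use DBI;

    # Rename the file the collector is appending to, so new tweets go to a fresh file.
    rename 'tweets.csv', 'tweets.loading.csv' or die "rename failed: $!";

    my $dbh = DBI->connect("dbi:Pg:dbname=tweets", "user", "pass",
                           { RaiseError => 1, AutoCommit => 1 });

    # Stream the renamed file into the table via COPY ... FROM STDIN.
    $dbh->do("COPY tweet (id, created_at, body) FROM STDIN WITH (FORMAT csv)");
    open my $fh, '<', 'tweets.loading.csv' or die "open failed: $!";
    $dbh->pg_putcopydata($_) while <$fh>;
    $dbh->pg_putcopyend();
    close $fh;
    unlink 'tweets.loading.csv';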

Upvotes: 0

user330315

Reputation:

If you open and close a new connection for each insert, that is going to hurt big time. You should use a connection pooler instead. Creating a new database connection is not a lightweight thing to do.
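
For example, with a pooler such as PgBouncer in front of the database, the Perl side only changes where it connects (a sketch; 6432 is PgBouncer's default port and the credentials are placeholders):

    use DBI;

    # Connect to PgBouncer instead of PostgreSQL directly; the pooler hands out
    # already-established server connections, so each connect() becomes cheap.
    my $dbh = DBI->connect(
        "dbi:Pg:dbname=tweets;host=127.0.0.1;port=6432",
        "user", "pass",
        { RaiseError => 1, AutoCommit => 1 },
    );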

Doing a fork() for each insert is probably not such a good idea either. Can't you create one process that simply takes care of the inserts and listens on a socket, or scans a directory or something like that, and another process that signals the insert process (a classic producer/consumer pattern)? Or use some kind of message queue (I don't know Perl, so I can't say what kind of tools are available there).
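
One way to wire that up (I don't know Perl, so treat this as a hedged sketch; get_next_tweet() and the table layout are placeholders, and real code would need proper escaping of the tweet text):

    use DBI;
    use IO::Handle;

    pipe(my $reader, my $writer) or die "pipe failed: $!";

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Consumer: one long-lived process, one database connection.
        close $writer;
        my $dbh = DBI->connect("dbi:Pg:dbname=tweets", "user", "pass",
                               { RaiseError => 1, AutoCommit => 1 });
        my $sth = $dbh->prepare("INSERT INTO tweet (id, body) VALUES (?, ?)");
        while (my $line = <$reader>) {
            chomp $line;
            my ($id, $body) = split /\t/, $line, 2;
            $sth->execute($id, $body);
        }
        exit 0;
    }

    # Producer: read the Twitter stream and hand each tweet to the consumer.
    close $reader;
    $writer->autoflush(1);
    while (my $tweet = get_next_tweet()) {     # placeholder for the stream reader
        print {$writer} "$tweet->{id}\t$tweet->{text}\n";
    }
    close $writer;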

When doing bulk inserts, do them in a single transaction and send the commit at the end. Do not commit each insert. Another option is to write the rows into a text file and then use COPY to insert them into the database (it doesn't get faster than that).
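
With Perl/DBI the single-transaction batch could look roughly like this (a sketch; the table, columns and the @batch variable are placeholders):

    use DBI;

    my $dbh = DBI->connect("dbi:Pg:dbname=tweets", "user", "pass",
                           { RaiseError => 1, AutoCommit => 0 });   # AutoCommit off

    my @batch = ();   # buffered tweets, filled by the listener (placeholder)

    my $sth = $dbh->prepare("INSERT INTO tweet (id, body) VALUES (?, ?)");
    $sth->execute($_->{id}, $_->{text}) for @batch;   # many inserts ...
    $dbh->commit;                                     # ... but only one commit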

You can also tune the PostgreSQL server a bit. If you can afford to lose some transactions in case of a system crash, you might want to turn synchronous_commit off.
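
That can also be set per session, e.g. (a sketch, assuming $dbh is an open connection):

    # Commits return before the WAL hits disk; a crash may lose the last few
    # transactions, but the database itself stays consistent.
    $dbh->do("SET synchronous_commit = off");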

If you can rebuild the table from scratch at any time (e.g. by re-inserting the tweets), you might also want to make that table an "unlogged" table. It is faster to write to than a regular table, but if Postgres is not shut down cleanly, you lose all the data in the table.
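
For example (a sketch; the column layout is only an illustration):

    # Assuming $dbh is an open DBD::Pg handle.
    $dbh->do(q{
        CREATE UNLOGGED TABLE tweet (
            id         bigint PRIMARY KEY,
            created_at timestamptz,
            body       text
        )
    });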

Upvotes: 6
