Todd

Reputation: 746

MultiProcess Perl program Timing out connection to MongoDB

I'm writing a migration program in Perl to transform the data in one MongoDB collection into a collection in another database. Millions of documents need to be transformed, and performance is very poor (at the current rate it would take weeks to complete, which is not acceptable). So I used Parallel::TaskManager to spawn multiple processes and run the transformation in parallel. Performance starts out fine, then rapidly tails off, and then I start getting the following errors:

update error: MongoDB::NetworkTimeout: Timed out while waiting for socket to become ready for reading
 at /usr/local/share/perl/5.18.2/Meerkat/Collection.pm line 322.
update error: MongoDB::NetworkTimeout: Timed out while waiting for socket to become ready for reading
 at /usr/local/share/perl/5.18.2/Meerkat/Collection.pm line 322.

So my suspicion is that the spawned processes are not releasing their sockets quickly enough. I'm not sure how to fix that, though, or even whether this is in fact the problem.

What I've tried:

  1. I reduced tcp_keepalive_time via sudo sysctl -w net.ipv4.tcp_keepalive_time=120 and restarted my mongod
  2. I reduced the max_time_ms (this made matters worse)
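For reference, here is a minimal sketch (core Perl only, no MongoDB calls) of the forking pattern I'm describing. The names `split_ranges`, `run_workers`, and the worker callback are illustrative, not from my actual program. The one detail that matters for this question is marked in the comments: each child must open its own MongoDB connection after the fork, since the Perl driver's sockets cannot safely be shared across processes.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(ceil);

# Partition N documents into per-worker [lo, hi) index ranges.
sub split_ranges {
    my ($total, $nworkers) = @_;
    my $chunk = ceil($total / $nworkers);
    my @ranges;
    for (my $lo = 0; $lo < $total; $lo += $chunk) {
        my $hi = $lo + $chunk;
        $hi = $total if $hi > $total;
        push @ranges, [$lo, $hi];
    }
    return @ranges;
}

# Fork one worker per range and wait for them all to finish.
sub run_workers {
    my ($total, $nworkers, $work) = @_;
    my @pids;
    for my $range (split_ranges($total, $nworkers)) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            # Child process: open a NEW MongoDB connection here,
            # e.g. my $client = MongoDB->connect($uri);
            # never reuse a handle created before the fork.
            $work->(@$range);
            exit 0;
        }
        push @pids, $pid;
    }
    waitpid($_, 0) for @pids;
}
```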

Here are the details of my setup.

I'm not sure how to interpret this, but here is a segment of mongostat output from the period when the errors were occurring:

insert query update delete getmore command % dirty % used flushes vsize   res qr|qw ar|aw netIn netOut conn     time
    *0    *0     *0     *0       0     1|0     0.0    0.3       0 20.4G  9.4G   0|0  1|35   79b    15k   39 11:10:37
    *0     3      8     *0       0    11|0     0.0    0.3       0 20.4G  9.4G   0|0  2|35    5k    18k   39 11:10:38
    *0     3      1     *0       1     5|0     0.1    0.3       0 20.4G  9.4G   0|0  1|35    2k    15m   39 11:10:39
    *0    12      4     *0       1    13|0     0.1    0.3       0 20.4G  9.4G   0|0  2|35    9k   577k   43 11:10:40
    *0     3      1     *0       3     5|0     0.1    0.3       0 20.4G  9.4G   0|0  1|34    2k    10m   43 11:10:41
    *0     3      8     *0       1    10|0     0.1    0.3       0 20.4G  9.4G   0|0  2|34    5k     2m   43 11:10:42
    *0     9     24     *0       0    29|0     0.1    0.3       0 20.4G  9.4G   0|0  5|34   13k    24k   43 11:10:43
    *0     3      8     *0       0    10|0     0.1    0.3       0 20.4G  9.4G   0|0  5|35    4k    12m   43 11:10:44
    *0     3      8     *0       0    11|0     0.1    0.3       0 20.4G  9.4G   0|0  5|35    5k    12m   42 11:10:45
    *0    *0     *0     *0       0     2|0     0.1    0.3       0 20.4G  9.3G   0|0  4|35  211b    12m   42 11:10:46

Please let me know if you would like to see any additional information to help me diagnose this problem.

Dropping the number of parallel processes from 8 (or more) down to 3 seems to reduce the number of timeout errors, but at the cost of throughput.

Upvotes: 1

Views: 482

Answers (1)

Todd

Reputation: 746

None of the tuning suggestions helped, nor did switching to bulk inserts.

I continued to investigate, and the root of the problem was that my process was doing many "$addToSet" operations, which become slow as the target arrays grow large. So I was consuming all available sockets with slow updates. I restructured my documents so that no array could grow unboundedly, and the insert rate returned to an acceptable level.
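To illustrate the restructuring (collection and field names here are made up, not my real schema): instead of `$addToSet`-ing every member into one ever-growing array on a single document, I store one small document per member and let a unique index enforce the set semantics. This snippet assumes a live MongoDB connection, so it is a sketch rather than something runnable as-is:

```perl
# Before: one document per group, array grows without bound and
# every $addToSet must scan it server-side.
$groups->update_one(
    { _id       => $group_id },
    { '$addToSet' => { members => $member_id } },
);

# After: one document per (group, member) pair; a unique compound
# index on { group: 1, member: 1 } makes inserts behave like set
# membership, and each write touches only a small document.
$group_members->insert_one(
    { group => $group_id, member => $member_id },
);
```

With the second shape, individual writes stay cheap no matter how many members a group accumulates, so updates no longer monopolize the connection pool.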

Upvotes: 1

Related Questions