Christopher Markieta

Reputation: 5913

Twisted SSL socket connection slowdown

How do I scale my Twisted server to handle tens of thousands of concurrent SSL socket connections?

The first few hundred clients connect relatively quickly, but as the count approaches 3000, the rate slows to a crawl of about 2 new connections per second.

I am load testing using the loop below:

import socket
import ssl

clients = []

# `connections` is the total number of clients to open (tens of thousands in my case)
for i in xrange(connections):
    print i
    clients.append(
        ssl.wrap_socket(
            socket.socket(socket.AF_INET, socket.SOCK_STREAM),
            ca_certs="server.crt",
            cert_reqs=ssl.CERT_REQUIRED
        )
    )

    clients[i].connect(('localhost', 9999))
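
For reference, a variant of the same loop that times each connection (just the loop body above plus a time.time() pair) makes it easier to see at which count the handshake time starts to climb:

import time

for i in xrange(connections):
    start = time.time()
    s = ssl.wrap_socket(socket.socket(socket.AF_INET, socket.SOCK_STREAM),
                        ca_certs="server.crt", cert_reqs=ssl.CERT_REQUIRED)
    s.connect(('localhost', 9999))
    clients.append(s)
    # Print how long each connect + handshake took
    print i, "%.3fs" % (time.time() - start)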

cProfile:

         296644049 function calls (296407530 primitive calls) in 3070.656 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.001    0.001 3070.656 3070.656 server.py:7(<module>)
        1    0.000    0.000 3070.408 3070.408 server.py:148(main)
        1    0.000    0.000 3070.406 3070.406 server.py:106(run)
        1    0.000    0.000 3070.405 3070.405 base.py:1190(run)
        1    0.047    0.047 3070.404 3070.404 base.py:1195(mainLoop)
    34383    0.090    0.000 3070.263    0.089 epollreactor.py:367(doPoll)
    38696    0.064    0.000 3066.883    0.079 log.py:75(callWithLogger)
    38696    0.077    0.000 3066.797    0.079 log.py:70(callWithContext)
    38696    0.035    0.000 3066.598    0.079 context.py:117(callWithContext)
    38696    0.056    0.000 3066.556    0.079 context.py:61(callWithContext)
    38695    0.093    0.000 3066.486    0.079 posixbase.py:572(_doReadOrWrite)
     8599 1249.585    0.145 3019.333    0.351 protocol.py:114(getClientsDict)
 37582010 1681.445    0.000 1681.445    0.000 {method 'items' of 'dict' objects}
    21496    0.114    0.000 1535.798    0.071 tls.py:346(_flushReceiveBIO)
    21496    0.026    0.000 1535.793    0.071 tcp.py:199(doRead)
    21496    0.017    0.000 1535.718    0.071 tcp.py:218(_dataReceived)
    17197    0.033    0.000 1535.701    0.089 tls.py:400(dataReceived)
     8597    0.009    0.000 1531.480    0.178 policies.py:119(dataReceived)
     8597    0.078    0.000 1531.471    0.178 protocol.py:65(dataReceived)
     4300    0.029    0.000 1525.117    0.355 posixbase.py:242(_disconnectSelectable)
     4300    0.030    0.000 1524.922    0.355 tcp.py:283(connectionLost)
     4300    0.024    0.000 1524.659    0.355 tls.py:463(connectionLost)
     4300    0.010    0.000 1524.492    0.355 policies.py:123(connectionLost)
     4300    0.119    0.000 1524.471    0.355 protocol.py:50(connectionLost)
     4299    0.027    0.000 1523.698    0.354 tcp.py:270(readConnectionLost)
     4299    0.135    0.000 1520.228    0.354 protocol.py:88(handleInitialState)
 74840519   31.487    0.000   44.916    0.000 __init__.py:348(__getattr__)

Reactor run code:

def run(self):
    contextFactory = ssl.DefaultOpenSSLContextFactory(self._key, self._cert)
    reactor.listenSSL(self._port, BrakersFactory(), contextFactory)
    reactor.run()
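
For reference, a profile like the one above can be produced with python -m cProfile -s cumulative server.py, or from inside the script by wrapping the entry point:

import cProfile
import pstats

# Profile main() (server.py's entry point) and dump the hottest calls by cumulative time.
cProfile.run('main()', 'server.prof')
pstats.Stats('server.prof').sort_stats('cumulative').print_stats(30)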

Upvotes: 0

Views: 1036

Answers (2)

Christopher Markieta
Christopher Markieta

Reputation: 5913

I managed to determine the cause of the slowdown in my protocol.

As you can see from the cProfile output above, the majority of tottime was spent in the getClientsDict() method:

         296644049 function calls (296407530 primitive calls) in 3070.656 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     8599 1249.585    0.145 3019.333    0.351 protocol.py:114(getClientsDict)
 37582010 1681.445    0.000 1681.445    0.000 {method 'items' of 'dict' objects}

The following code was causing this issue:

def getClientsDict(self):
    rc = {1: {}, 2: {}}

    for r in self.factory._clients[1]:
        rc[1] = dict(rc[1].items() +
                     {r.getDict[1]['id']: r.getDict[1]['address']}.items())
    for m in self.factory._clients[2]:
        rc[2] = dict(rc[2].items() +
                     {m.getDict[2]['id']: m.getDict[2]['address']}.items())
    return rc
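
Rebuilding rc[1] and rc[2] from scratch with .items() concatenation on every iteration makes the method quadratic in the number of clients (hence the 37 million items() calls in the profile). Inserting directly into the result dicts keeps it linear; a sketch of the rewrite, keeping the same getDict access pattern:

def getClientsDict(self):
    rc = {1: {}, 2: {}}

    # Insert each client's id/address pair directly instead of concatenating
    # and rebuilding the whole dict on every iteration.
    for r in self.factory._clients[1]:
        rc[1][r.getDict[1]['id']] = r.getDict[1]['address']
    for m in self.factory._clients[2]:
        rc[2][m.getDict[2]['id']] = m.getDict[2]['address']
    return rc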

Upvotes: 1

Mike Lutz

Reputation: 1832

Given the lack of code in the question, I've tossed some together to see if I could reproduce the effect you're talking about. Based on that experiment, the first thing I would say is: check what is happening with memory utilization on your machine while your script runs.

I spun up a standard Google Cloud compute instance (1 vCPU, 3.8 GB RAM; Debian Wheezy backports, apt-get update; apt-get install python-twisted) and ran the following (awful hack) code:

(Note: to run this I needed to do a ulimit -n 4096 in both the client and server shells, or I would start getting "Too many open files" errors, i.e. Socket accept - "Too many open files".)
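
(If shell tweaking is a hassle, the soft limit can also be raised from inside each script with the stdlib resource module, provided the hard limit is already high enough; this is an alternative I didn't use, just a sketch:)

import resource

# Raise the soft limit on open file descriptors to 4096, leaving the hard limit alone
# (this only succeeds if the hard limit is already at least 4096).
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (4096, hard))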

serv.py

#!/usr/bin/python

from twisted.internet import ssl, reactor
from twisted.internet.protocol import ServerFactory, Protocol

class Echo(Protocol):
    def connectionMade(self):
        self.factory.clients.append(self)
        print "Currently %d open connections.\n" % len(self.factory.clients)

    def connectionLost(self, reason):
        self.factory.clients.remove(self)
        print "Lost connection"

    def dataReceived(self, data):
        """As soon as any data is received, write it back."""
        self.transport.write(data)

class MyServerFactory(ServerFactory):
    protocol = Echo

    def __init__(self):
        self.clients = []



if __name__ == '__main__':
    factory = MyServerFactory()
    reactor.listenSSL(8000, factory,
                      ssl.DefaultOpenSSLContextFactory(
            'keys/server.key', 'keys/server.crt'))
    reactor.run()

cli.py

#!/usr/bin/python

from twisted.internet import ssl, reactor
from twisted.internet.protocol import ClientFactory, Protocol

class EchoClient(Protocol):
    def connectionMade(self):
        print "hello, world"
        # The following delay is there because as soon as the write
        # happens the server will close the connection
        reactor.callLater(60, self.transport.write, "hello, world!")

    def dataReceived(self, data):
        print "Server said:", data
        self.transport.loseConnection()

class EchoClientFactory(ClientFactory):
    protocol = EchoClient

    def __init__(self):
        self.stopping = False

    def clientConnectionFailed(self, connector, reason):
        print "Connection failed - reason ",  reason
        if not self.stopping:
              self.stopping = True
              reactor.callLater(10,reactor.stop)

    def clientConnectionLost(self, connector, reason):
        print "Connection lost - goodbye!"
        if not self.stopping:
              self.stopping = True
              reactor.callLater(10,reactor.stop)

if __name__ == '__main__':
    connections = 4000
    factory = EchoClientFactory()
    for i in xrange(connections):
          # the following could certainly be done more elegantly, but I believe
          # it's a legitimate use, and given the list is finite, it shouldn't be
          # too resource-intensive... ?
          reactor.callLater(i/float(400), reactor.connectSSL,'xx.xx.xx.xx', 8000, factory, ssl.ClientContextFactory())
    reactor.run()
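
For what it's worth, the callLater scheduling at the end of cli.py could also be done with twisted.internet.task.LoopingCall, which paces the connection attempts without pre-scheduling thousands of delayed calls up front. This is just a sketch of an alternative __main__ block, not what I actually ran:

from twisted.internet import ssl, reactor, task

if __name__ == '__main__':
    connections = 4000
    per_second = 400
    factory = EchoClientFactory()
    counter = [0]

    def connect_batch():
        # Open up to per_second new connections each time the loop fires.
        for _ in xrange(per_second):
            if counter[0] >= connections:
                loop.stop()
                return
            reactor.connectSSL('xx.xx.xx.xx', 8000, factory,
                               ssl.ClientContextFactory())
            counter[0] += 1

    loop = task.LoopingCall(connect_batch)
    loop.start(1.0)
    reactor.run()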

Upon running and crossing roughly 2544 connections, my machine seriously jammed up, enough so that it was hard to collect data from, but given that new ssh sessions were coming back with '/bin/bash: Cannot allocate memory', and that when I did get on, serv.py had 2 GB resident and the client had 1.4 GB, I think it's safe to say I blew through the RAM.

Given the above code was just a fast hack, I likely have bugs of my own that caused the memory problem - but I thought I would offer the idea, because making your machine swap is certainly a good way to make your app crawl (and perhaps you have the same bugs as me).
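
If you want to watch the numbers rather than wait for the box to jam up, a small stdlib-only helper (my addition here, not part of the scripts above) can print the server's peak resident size as connections accumulate:

import resource

def log_memory(prefix):
    # On Linux, ru_maxrss is reported in kilobytes.
    rss_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print "%s: peak RSS %.1f MB" % (prefix, rss_kb / 1024.0)

# e.g. inside Echo.connectionMade, every 100 connections:
#     if len(self.factory.clients) % 100 == 0:
#         log_memory("%d connections" % len(self.factory.clients))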

(BTW, for the smarter Twisted people out there, I welcome comments on what I'm doing wrong that's burning so much RAM.)

Upvotes: 2
