Reputation: 87
I'm running a haskell websocket server using Wai:
application :: MVar ServerState -> Wai.Application
application state = WaiWS.websocketsOr WS.defaultConnectionOptions wsApp staticApp
where
wsApp :: WS.ServerApp
wsApp pendingConn = do
conn <- WS.acceptRequest pendingConn
talk conn state
To allow a single client to send asynchronous messages, talk is defined as follows:
talk :: WS.Connection -> MVar ServerState -> IO ()
talk conn state = forever $ do
msg <- WS.receiveMessage conn
putStrLn "received message"
successLock <- newEmptyMVar
tid <- timeoutAsync successLock $ processMessage c state msg
putStrLn "forked thread"
modifyMVar_ state $ \curState ->
return $ curState & threads %~ (M.insert mid tid) -- thread bookkeeping
putStrLn "modified state"
putMVar successLock ()
putStrLn "unlocked success"
where
mid = serverMessageId msg
timeoutAsync lock f = forkIO $ do
timeout S.process_message_timeout onTimeout (onSuccess lock) f
onSuccess lock = do
-- block until the first modifyMVar_ above finishes.
takeMVar lock
modifyMVar_ state $ \curState ->
return $ curState & threads %~ (M.delete mid) -- thread cleanup
onTimeout = ...
Here's the thing: when I bombard this server with many messages (from a single client) that are CPU-heavy, the the main thread occasionally hangs at "forked thread".
This is surprising because all work on messages are (in theory) being done in separate threads, and so the main thread (forever
) should never block.
What's going on here?
[EDIT]
A minimum verifiable example is pretty hard to provide in this case (the work is done in processMessage
, but comprises a lot of moving parts, any of which might be the problem). Instead, I'm looking for high-level pointers to things I could investigate.
Here is data from an example run (send the server an expensive request, then a bunch of smaller less-expensive ones):
gc productivity 36%: http://puu.sh/nSxnj/d8bb5995ae.png
event log (using +RTS -ls
and -eventlog
): http://puu.sh/nSxDy/efe457bee2.eventlog
CPU usage ~300% (for 4 caps) -- made me think GC might be competing with OS resources; I decreased the num capabilities to n-1
, and this seemed to improve responsiveness
Also, the app has the following properties, which I think are potential causes of the problem:
ratio of GC'd to live data is high; processMessage
basically constructs a giant list which is aeson'd
and sent back to the user, but not kept in state
many foreign calls are made (due to ZMQ, which iirc makes unsafe foreign calls) on a single request
ThreadScope tells me that lots of heapoverflows occur, causing GC requests
Upvotes: 4
Views: 207