Reputation: 195
I'm using the Application Insights Python API to publish a custom metric for my application every 30 s. This works fine for a while (up to several days), but then my Python script just hangs while trying to flush the data to Azure.
The Python code itself is fairly simple; it is just this infinite loop:
    while True:
        count = get_connection_count()
        if count is not None:
            tc.track_metric("ConnectionCount", count, type=DataPointType.measurement, count=1)
            tc.flush()
        time.sleep(10)
A stack trace (below) shows the process is stuck on tc.flush(), waiting for an answer from the server.
If I look at the TCP connections for the process, I can see it still has an open TCP connection to Azure; it's just not getting any reply. Has anyone encountered a similar issue? What would cause Azure Application Insights to stop responding like this?
Alternatively, can a timeout be defined for the tc.flush() call, so I can at least recover from an unresponsive endpoint?
Here's the stack trace I was able to extract:
  File "/var/lib/app-monitor/connectionMonitor.py", line 52, in <module>
    tc.flush()
  File "/usr/local/lib/python2.7/dist-packages/applicationinsights/TelemetryClient.py", line 55, in flush
    self._channel.flush()
  File "/usr/local/lib/python2.7/dist-packages/applicationinsights/channel/TelemetryChannel.py", line 71, in flush
    self._queue.flush()
  File "/usr/local/lib/python2.7/dist-packages/applicationinsights/channel/SynchronousQueue.py", line 39, in flush
    local_sender.send(data)
  File "/usr/local/lib/python2.7/dist-packages/applicationinsights/channel/SenderBase.py", line 118, in send
    response = HTTPClient.urlopen(request)
  File "/usr/lib/python2.7/urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 404, in open
    response = self._open(req, data)
  File "/usr/lib/python2.7/urllib2.py", line 422, in _open
    '_open', req)
  File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 1222, in https_open
    return self.do_open(httplib.HTTPSConnection, req)
  File "/usr/lib/python2.7/urllib2.py", line 1187, in do_open
    r = h.getresponse(buffering=True)
  File "/usr/lib/python2.7/httplib.py", line 1089, in getresponse
    response.begin()
  File "/usr/lib/python2.7/httplib.py", line 444, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python2.7/httplib.py", line 400, in _read_status
    line = self.fp.readline(_MAXLINE + 1)
  File "/usr/lib/python2.7/socket.py", line 476, in readline
    data = self._sock.recv(self._rbufsize)
  File "/usr/lib/python2.7/ssl.py", line 341, in recv
    return self.read(buflen)
  File "/usr/lib/python2.7/ssl.py", line 260, in read
    return self._sslobj.read(len)
Upvotes: 0
Views: 1156
Reputation: 25126
After some discussion internally, there's a workaround, though not really a fix: make sure that sockets have some kind of default timeout value to prevent them from hanging forever:
import socket
socket.setdefaulttimeout(30)
Note that this applies to every socket the script creates afterwards without an explicit timeout, not just the Application Insights calls, so it isn't necessarily ideal, but it does prevent the script from hanging indefinitely.
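To actually recover from an unresponsive endpoint, the loop also needs to survive the timeout once it fires. The following is only a rough sketch of that idea, not something taken from the SDK: safe_flush and on_error are hypothetical names, and it assumes the SDK version in use lets the socket error propagate out of flush() (some versions may swallow it internally, in which case the wrapper is harmless but unnecessary).

```python
import socket

# Workaround from above: sockets created after this call time out after 30 s.
socket.setdefaulttimeout(30)


def safe_flush(flush, on_error=None):
    """Call flush() and return True on success, False on a socket-level failure.

    `flush` is any zero-argument callable, e.g. tc.flush.
    `on_error` is an optional hypothetical callback that receives the exception.
    """
    try:
        flush()
        return True
    except socket.error as exc:  # socket.timeout is a subclass of socket.error
        if on_error is not None:
            on_error(exc)
        return False
```

In the loop from the question, tc.flush() would then become safe_flush(tc.flush), so a timed-out flush is skipped and the next iteration tries again.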
Upvotes: 0
Reputation: 24138
In my experience, there are two possible causes for this issue.
1. Your application may have exceeded a limit on the number of metrics and events. Please refer to the official documentation, and capture the response status code with Wireshark or Fiddler on Linux to check. The error codes for this case include 402 (Payment required), 429 (Too many requests), and 503 (Service unavailable).
2. The service itself may be affected by planned maintenance or an ongoing incident. You can check the health and status of Application Insights at http://aka.ms/aistatus.
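If you manage to capture the status code, a small lookup like the one below can make the result easier to log. This is just an illustrative sketch: explain_status is a hypothetical helper, not part of the Application Insights SDK, and the only codes it knows are the three mentioned above.

```python
# Status codes listed above as signs of an exceeded limit or an unavailable service.
LIMIT_STATUSES = {
    402: "Payment required",
    429: "Too many requests",
    503: "Service unavailable",
}


def explain_status(status_code):
    """Return a short label for a captured HTTP status code."""
    if 200 <= status_code < 300:
        return "ok"
    if status_code in LIMIT_STATUSES:
        return "limit or outage: " + LIMIT_STATUSES[status_code]
    return "other"
```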
Hope it helps.
Upvotes: 0