Reputation: 48
I have a custom Python plugin that I am using to pull data into Telegraf. It prints line protocol output, as expected.
In my Ubuntu 18.04 environment, when this plugin is run I see a single line in my logs:
2020-12-28T21:55:00Z E! [inputs.exec] Error in plugin: exec: exit status 1 for command '/my_company/plugins-enabled/plugin-mysystem/poll_mysystem.py': Traceback (most recent call last):...
That is it. I can't figure out how to get the actual traceback.
If I run sudo -u telegraf /usr/bin/telegraf -config /etc/telegraf/telegraf.conf, the plugin works as expected. It polls and loads data exactly as it should.
I'm not sure how to move forward with troubleshooting this error when Telegraf executes the plugin on its own.
I have restarted the telegraf service. I have verified permissions (and I think that the execution above shows that it should work).
A few additional details based on the comments and answers received:
The plugin file is owned by telegraf:telegraf. The error does not seem to indicate that Telegraf can't see the file being executed, but rather that something within the file fails when Telegraf executes the plugin.
Plugin code (/my_company/plugins-enabled/plugin-mysystem/poll_mysystem.py):
from google.auth.transport.requests import Request
from google.oauth2 import id_token
import requests
import os

RUNTIME_URL = INTERNAL_URL
MEASUREMENT = "MY_MEASUREMENT"
CREDENTIALS = "GOOGLE_SERVICE_FILE.json"
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = CREDENTIALS  # ENV VAR REQUIRED BY GOOGLE CODE BELOW
CLIENT_ID = VALUE_FROM_GOOGLE
exclude_fields = ["name", "version"]  # Don't try to put these into influxdb from json response


def make_iap_request(url, client_id, method="GET", **kwargs):
    # Code provided by Google docs
    # Set the default timeout, if missing
    if "timeout" not in kwargs:
        kwargs["timeout"] = 90
    # Obtain an OpenID Connect (OIDC) token from metadata server or using service
    # account.
    open_id_connect_token = id_token.fetch_id_token(Request(), client_id)
    # Fetch the Identity-Aware Proxy-protected URL, including an
    # Authorization header containing "Bearer " followed by a
    # Google-issued OpenID Connect token for the service account.
    resp = requests.request(method, url, headers={"Authorization": "Bearer {}".format(open_id_connect_token)}, **kwargs)
    if resp.status_code == 403:
        raise Exception("Service account does not have permission to " "access the IAP-protected application.")
    elif resp.status_code != 200:
        raise Exception(
            "Bad response from application: {!r} / {!r} / {!r}".format(resp.status_code, resp.headers, resp.text)
        )
    else:
        return resp.json()


def print_results(results):
    """
    Take the results of a Dolores call and print influx line protocol results
    """
    for item in results["workflow"]:
        line_protocol_line_base = f"{MEASUREMENT},name={item['name']}"
        values = ""
        for key, value in item.items():
            if key not in exclude_fields:
                values = values + f",{key}={value}"
        values = values[1:]
        line_protocol_line = f"{line_protocol_line_base} {values}"
        print(line_protocol_line)


def main():
    current_runtime = make_iap_request(URL, CLIENT_ID, timeout=30)
    print_results(current_runtime)


if __name__ == "__main__":
    main()
Relevant portion of the telegraf.conf file:
[[inputs.exec]]
## Commands array
commands = [
"/my_company/plugins-enabled/plugin-*/poll_*.py",
]
Agent section of the config file:
[agent]
interval = "60s"
round_interval = true
metric_batch_size = 1000
metric_buffer_limit = 10000
collection_jitter = "0s"
flush_interval = "10s"
flush_jitter = "0s"
precision = ""
debug = false
quiet = false
logfile = "/var/log/telegraf/telegraf.log"
hostname = ""
omit_hostname = true
What do I do next?
Upvotes: 4
Views: 8267
Reputation: 577
The exec plugin is truncating your Exception message at the newline. If you wrap your call to make_iap_request in a try/except block and print(e, file=sys.stderr) rather than letting the Exception bubble all the way up, that should tell you more.
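import sys


def main():
    """
    Query URL and print line protocol
    """
    try:
        current_runtime = make_iap_request(URL, CLIENT_ID, timeout=30)
        print_results(current_runtime)
    except Exception as e:
        # Only the first line of stderr survives the truncation.
        print(e, file=sys.stderr)
        # Exit non-zero: the exec input only reports stderr for a failed command.
        sys.exit(1)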
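Because only the first line of stderr reaches the Telegraf log, another option is to flatten the whole traceback onto one line before printing it. This is a minimal sketch, not something Telegraf provides: `format_exception_one_line` and `run_and_report` are hypothetical helper names, and the ` | ` separator is an arbitrary choice.

```python
import sys
import traceback


def format_exception_one_line(exc):
    """Return the traceback of `exc` with newlines replaced by ' | '."""
    lines = traceback.format_exception(type(exc), exc, exc.__traceback__)
    # Each entry may itself span several lines, so split again before joining.
    parts = [part.strip() for line in lines for part in line.splitlines()]
    return " | ".join(part for part in parts if part)


def run_and_report(func):
    """Call func(); on failure write a one-line traceback to stderr and return 1."""
    try:
        func()
        return 0
    except Exception as exc:
        print(format_exception_one_line(exc), file=sys.stderr)
        return 1
```

With this in place, the last line of the script could become `sys.exit(run_and_report(main))`, so the exit status stays non-zero on failure and the log line carries the exception type and the failing call.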
Alternatively, your script could log error messages to its own log file rather than passing them back to Telegraf. This would give you more control over what's logged.
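A separate log file could be set up with the standard `logging` module. The sketch below is only an illustration: `build_file_logger` and the logger name are hypothetical, and the log path you pass in is assumed to be writable by the telegraf user.

```python
import logging


def build_file_logger(log_path, name="poll_mysystem"):
    """Return a logger that appends timestamped records to log_path."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid adding a second handler on repeated calls
        handler = logging.FileHandler(log_path)
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger
```

Inside an `except` block, `logger.exception("poll failed")` writes the message together with the full multi-line traceback to the file, while stdout stays reserved for line protocol.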
I suspect you're running into an environment issue, where there's something different about how you're running it. If not permissions, it could be environment variable differences.
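To compare the two ways of running the script, it may help to record what the process actually sees. The following is a rough sketch under assumptions: `describe_environment` is a hypothetical helper, the list of variables is only a guess at what might matter here, and `os.getuid()` is available on Unix-like systems such as the Ubuntu host in the question.

```python
import os
import sys


def describe_environment(env_keys=("PATH", "HOME", "GOOGLE_APPLICATION_CREDENTIALS")):
    """Return a one-line summary of the user id, working directory and selected variables."""
    fields = {
        "uid": os.getuid(),
        "cwd": os.getcwd(),
        "python": sys.executable,
    }
    for key in env_keys:
        fields[key] = os.environ.get(key, "<unset>")
    # repr() escapes any embedded newlines, so the summary stays on one line.
    return " ".join(f"{name}={value!r}" for name, value in fields.items())
```

Writing this summary to a log file or to stderr from both a manual run and a service run would show whether the working directory, interpreter, or variables differ. A relative path such as the one assigned to CREDENTIALS is resolved against the working directory, so a different `cwd` is one thing worth checking.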
Upvotes: 1
Reputation: 385
Please check the permissions.
This looks like a permission error. Running sudo -u telegraf works because the telegraf user has the necessary permissions, but the user you are running Telegraf as doesn't have the permissions needed to access the files in /my_company/plugins-enabled/.
I recommend reviewing the permissions and either granting read and write access to other users or changing ownership to the user you run Telegraf as.
To fix this, first change to the directory:
cd /my_company/plugins-enabled/
Then change ownership to yourself only:
sudo chown -R $(whoami) .
Then give the owner write permission on all files and folders:
sudo chmod -R u+w .
Or, if you want every user on the system to be able to read and write those files and folders, grant all permissions to everyone:
sudo chmod -R 777 .
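Before changing anything, the current ownership and access can be inspected from the account in question. This is a small sketch using only the standard library; `describe_access` is a hypothetical helper, and it reports access for whichever user runs it.

```python
import os
import stat


def describe_access(path):
    """Return owner uid, group gid, mode string and access flags for path."""
    info = os.stat(path)
    return {
        "path": path,
        "uid": info.st_uid,
        "gid": info.st_gid,
        "mode": stat.filemode(info.st_mode),  # e.g. '-rwxr-xr-x'
        "readable": os.access(path, os.R_OK),
        "executable": os.access(path, os.X_OK),
    }
```

Saved to a file and run with sudo -u telegraf python3, a call such as `describe_access("/my_company/plugins-enabled/plugin-mysystem/poll_mysystem.py")` would show whether the telegraf user can read and execute the plugin.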
Upvotes: 0