Ryan Stack

Reputation: 1331

Pub/Sub to Splunk Dataflow Template - The requested URL was not found on this server

I am using the Cloud_PubSub_to_Splunk Dataflow template (I've tried both the latest version and 2020-11-02-00_RC00), which streams data from a Pub/Sub topic to Splunk. I have followed all the steps in the documentation.

My job arguments were:

JOB_NAME=pubsub-to-splunk-$USER-`date +"%Y%m%d-%H%M%S%z"`
gcloud dataflow jobs run $JOB_NAME \
    --subnetwork=https://www.googleapis.com/compute/v1/projects/<PROJECT>/regions/us-central1/subnetworks/<NAME> \
    --gcs-location gs://dataflow-templates/2020-11-02-00_RC00/Cloud_PubSub_to_Splunk \
    --max-workers 2 \
    --parameters=inputSubscription="projects/<PROJECT>/subscriptions/logs-export-subscription",token="<TOKEN>",url="https://<URL>:8088/services/collector/event",outputDeadletterTopic="projects/<PROJECT>/topics/splunk-pubsub-deadletter",batchCount="10",parallelism="8",disableCertificateValidation=true

The Dataflow job starts successfully and streaming begins: I can see the unacked message count on my logs-export-subscription going down. However, the job fails when writing to Splunk with the following error:

Error writing to Splunk. StatusCode: 404, content: {"text":"The requested URL was not found on this server.","code":404}, StatusMessage: Not Found

When troubleshooting, I can successfully send a request to the Splunk endpoint from the same subnetwork that the Dataflow workers are running in.

curl -k https://<URL>:8088/services/collector/event -H "Authorization: Splunk <HEC TOKEN>" -d '{"event": {"field1": "hello", "field2": "world"}}'

{"text":"Success","code":0}

So I don't think it is a connection or URL issue, as the error message suggests.

I can reproduce the failure with curl when I remove the -d option and its value:

curl -k https://<IP>:8088/services/collector/event -H "Authorization: Splunk <TOKEN>" 

{"text":"The requested URL was not found on this server.","code":404}

Any idea what may be causing this issue?

Upvotes: 0

Views: 986

Answers (2)

Roy Arsan

Reputation: 21

Thanks for reporting this. We've updated the docs with an example to clarify that parameter. Specifically, the format of the Splunk HEC url template parameter is as follows:

<protocol>://<host>:<port>

For example: https://splunk-hec.example.com:8088. The host is the FQDN (or IP) of either the Splunk instance running HEC (for a single HEC instance) or the HTTP(S) load balancer in front of the HEC tier (for a distributed HEC setup).

Do not specify the full HEC endpoint path. The Splunk Dataflow template currently supports only the HEC JSON object endpoint (i.e., services/collector/event), and it appends that path automatically to outgoing HTTP requests.
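To illustrate why a full endpoint in the url parameter ends in a 404, here is a minimal sketch of the appending behavior described above. It assumes the template simply joins the configured url and the HEC path with a slash; the function name effective_hec_url is hypothetical and is not part of the template.

```shell
# Hypothetical illustration: assumes the template joins the configured url
# and the HEC path with a single "/" (the exact joining logic is an assumption).
effective_hec_url() {
  printf '%s/%s\n' "$1" "services/collector/event"
}

# Base URL only: the request goes to the intended endpoint.
effective_hec_url "https://splunk-hec.example.com:8088"

# Full endpoint supplied: the path is duplicated, which HEC does not serve.
effective_hec_url "https://splunk-hec.example.com:8088/services/collector/event"
```

Under that assumption, the second call yields a URL with the path repeated, which is consistent with the "requested URL was not found" response.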

Also, for a deeper dive, be sure to check out these new resources:

Upvotes: 0

Ryan Stack

Reputation: 1331

The Splunk HEC URL supplied to the template should be only https://[IP]:8088, NOT the full path https://[IP]:8088/services/collector/event, because the path is appended by the Google library.
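If you already have the full endpoint in a variable or config file, a small helper can reduce it to the base form before passing it as the url parameter. This is only a sketch: the function name hec_base_url is hypothetical, and it assumes the input contains a scheme followed by "://".

```shell
# Hypothetical helper: reduce a full HEC endpoint to <protocol>://<host>:<port>.
# Assumes the input starts with a scheme followed by "://".
hec_base_url() {
  local url="$1"
  local scheme="${url%%://*}"   # text before "://", e.g. "https"
  local rest="${url#*://}"      # text after "://", e.g. "host:8088/services/..."
  local hostport="${rest%%/*}"  # drop everything from the first "/" onward
  printf '%s://%s\n' "$scheme" "$hostport"
}

hec_base_url "https://splunk-hec.example.com:8088/services/collector/event"
```

The result can then be used in the job arguments, for example url="$(hec_base_url "$FULL_HEC_URL")", where FULL_HEC_URL is a placeholder variable name.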

Upvotes: 1
