Reputation: 21
I'm trying to run an Apache Beam job on Google Cloud Dataflow (job ID: 2020-06-08_23_39_43-14062032727466654144) using the flags
--experiment=beam_fn_api
--worker_harness_container_image=gcr.io/PROJECT_NAME/apachebeamp3.7_imageconversion:latest
Unfortunately, the job is stuck in the starting state. A job with the exact same configuration ran at the beginning of this year (February?), and I'm wondering what has changed since then and what changes are needed on my side to get it running again.
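For completeness, the launch looks roughly like this (a minimal sketch: the region, bucket, and the trivial transform are placeholders, not the actual image-conversion pipeline):

# Minimal launch sketch; region, bucket, and transforms are placeholders.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions([
    "--runner=DataflowRunner",
    "--project=PROJECT_NAME",
    "--region=us-central1",                # placeholder region
    "--temp_location=gs://MY_BUCKET/tmp",  # placeholder bucket
    "--experiment=beam_fn_api",
    "--worker_harness_container_image="
    "gcr.io/PROJECT_NAME/apachebeamp3.7_imageconversion:latest",
])

with beam.Pipeline(options=options) as p:
    (p
     | beam.Create(["gs://MY_BUCKET/images/example.png"])  # placeholder input
     | beam.Map(lambda path: path))                        # real conversion omitted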
If I run the job locally with
--runner=PortableRunner \
--job_endpoint=embed \
--environment_config=PROJECT_NAME/apachebeamp3.7_imageconversion:latest
it runs perfectly.
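Expressed as in-code pipeline options, the working local run is essentially this (again a sketch; only the runner, job endpoint, and environment config match my real setup):

# Sketch of the local configuration from above, as in-code options.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

local_options = PipelineOptions([
    "--runner=PortableRunner",
    "--job_endpoint=embed",
    "--environment_config="
    "PROJECT_NAME/apachebeamp3.7_imageconversion:latest",
])

with beam.Pipeline(options=local_options) as p:
    p | beam.Create([1, 2, 3]) | beam.Map(print)  # placeholder transforms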
In the Dataflow logs, I see the following error messages:
getPodContainerStatuses for pod "dataflow-beamapp-sam-0609063936-65-06082339-h464-harness-zzpb_default(a65b24a783afd25920bf29ff27d7baf8)" failed: rpc error: code = Unknown desc = Error: No such container: 586554fec1cf2942c7d2f45589db02b217c90c2ea96982041fc3f12b4b6595ff"
and
ContainerStatus "1647b951d266b4b1d318317b1836002eb4731a510dffa38ba6b58b45a7710784" from runtime service failed: rpc error: code = Unknown desc = Error: No such container: 1647b951d266b4b1d318317b1836002eb4731a510dffa38ba6b58b45a7710784
I'm a bit puzzled by the container ID, since gcr.io/PROJECT_NAME/apachebeamp3.7_imageconversion:latest currently has the ID 8bdf43f9cdcd20d4c258a7810c81cb5214ecc984e534117ef8ba1a4cab2a3dae.
Questions:
Edit: Additional information based on the question below:
Thanks for the pointers. I have looked at the dataflow.googleapis.com/kubelet logs. The only errors I see there are:
while getting AWS credentials NoCredentialProviders: no valid providers in chain. Deprecated.
ContainerStatus "55271a8a1af2a90d6162eda03bd8924aad502fd32f09ca50bf35af58e428cf59" from runtime service failed: rpc error: code = Unknown desc = Error: No such container: 55271a8a1af2a90d6162eda03bd8924aad502fd32f09ca50bf35af58e428cf59
Error syncing pod a65b24a783afd25920bf29ff27d7baf8 ("dataflow-beamapp-sam-0609063936-65-06082339-h464-harness-7056_default(a65b24a783afd25920bf29ff27d7baf8)"), skipping: [failed to "StartContainer" for "sdk0" with CrashLoopBackOff: "Back-off 10s restarting failed container=sdk0 pod=dataflow-beamapp-sam-0609063936-65-06082339-h464-harness-7056_default(a65b24a783afd25920bf29ff27d7baf8)"
Strangely, I do not see a worker-startup category in the log viewer. What do I need to do to see those log entries and to be able to take the next step on this debugging journey :-)?
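For reference, this is how I would query those entries outside the Logs Viewer (a sketch using the google-cloud-logging client; the exact worker-startup log name is my assumption, formed by analogy with the dataflow.googleapis.com/kubelet log name above):

# Sketch: query the (assumed) worker-startup log for this job via the
# Cloud Logging API instead of the Logs Viewer UI.
from google.cloud import logging as cloud_logging

client = cloud_logging.Client(project="PROJECT_NAME")
log_filter = (
    'resource.type="dataflow_step" '
    'AND resource.labels.job_id="2020-06-08_23_39_43-14062032727466654144" '
    'AND logName="projects/PROJECT_NAME/logs/'
    'dataflow.googleapis.com%2Fworker-startup"'
)
for entry in client.list_entries(filter_=log_filter,
                                 order_by=cloud_logging.DESCENDING):
    print(entry.timestamp, entry.payload)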
Upvotes: 1
Views: 1905
Reputation: 101
For me, the problem was fixed when I removed the option --experiments=use_runner_v2 when running the pipeline.
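In other words, the options end up looking roughly like this (a sketch with placeholder values; removing the experiment flag is the only actual change):

# Sketch: the same Dataflow options as before, just without
# --experiments=use_runner_v2 (all other values are placeholders).
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions([
    "--runner=DataflowRunner",
    "--project=PROJECT_NAME",
    "--region=us-central1",
    "--temp_location=gs://MY_BUCKET/tmp",
    # "--experiments=use_runner_v2",  # removing this line fixed it for me
])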
Upvotes: 0
Reputation: 46
I was having a similar issue, getting ContainerStatus xxxxx from runtime service failed and Error syncing pod errors.
I was trying to read data from a file and process it in a streaming application. Once I removed options.setStreaming(true), it worked properly.
Streaming is for unbounded data, such as reading from Pub/Sub or Kafka; batch is for bounded data, such as reading from a database or a file.
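The options.setStreaming(true) call above is from the Java SDK; a rough Python-SDK sketch of the same fix (the file path and transforms are placeholders) would be:

# Sketch: read a bounded file in batch mode, i.e. with streaming disabled.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions

options = PipelineOptions()
options.view_as(StandardOptions).streaming = False  # batch mode for bounded input

with beam.Pipeline(options=options) as p:
    (p
     | beam.io.ReadFromText("gs://MY_BUCKET/input.txt")  # placeholder path
     | beam.Map(print))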
Upvotes: 0
Reputation: 21
Turns out I made multiple mistakes. In particular, the base image in my Dockerfile had to be changed from
FROM apachebeam/python3.7_sdk:latest
to
FROM apache/beam_python3.7_sdk:latest
According to https://hub.docker.com/r/apachebeam/python3.7_sdk, the images moved to the new name from version 2.20.0 onwards.
Upvotes: 1