Reputation: 21
I'm trying to run an Apache Beam job on Google Cloud Dataflow (job ID: 2020-06-08_23_39_43-14062032727466654144) using the flags
--experiment=beam_fn_api
--worker_harness_container_image=gcr.io/PROJECT_NAME/apachebeamp3.7_imageconversion:latest
Unfortunately, the job is stuck in the starting state. A job with the exact same configuration ran at the beginning of this year (February?), and I'm wondering what has changed since then and what changes are needed on my side to get it running again.
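For completeness, the launch looks roughly like this (a minimal sketch: the region, bucket, and the trivial transform are placeholders, not the actual image-conversion pipeline):

# Minimal launch sketch; region, bucket, and transforms are placeholders.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions([
    "--runner=DataflowRunner",
    "--project=PROJECT_NAME",
    "--region=us-central1",                # placeholder region
    "--temp_location=gs://MY_BUCKET/tmp",  # placeholder bucket
    "--experiment=beam_fn_api",
    "--worker_harness_container_image="
    "gcr.io/PROJECT_NAME/apachebeamp3.7_imageconversion:latest",
])

with beam.Pipeline(options=options) as p:
    (p
     | beam.Create(["gs://MY_BUCKET/images/example.png"])  # placeholder input
     | beam.Map(lambda path: path))                        # real conversion omitted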
If I run the job locally with
--runner=PortableRunner \
--job_endpoint=embed \
--environment_config=PROJECT_NAME/apachebeamp3.7_imageconversion:latest
it runs perfectly.
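Expressed as in-code pipeline options, the working local run is essentially this (again a sketch; only the runner, job endpoint, and environment config match my real setup):

# Sketch of the local configuration from above, as in-code options.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

local_options = PipelineOptions([
    "--runner=PortableRunner",
    "--job_endpoint=embed",
    "--environment_config="
    "PROJECT_NAME/apachebeamp3.7_imageconversion:latest",
])

with beam.Pipeline(options=local_options) as p:
    p | beam.Create([1, 2, 3]) | beam.Map(print)  # placeholder transforms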
In the Dataflow logs, I see the following error messages:
getPodContainerStatuses for pod "dataflow-beamapp-sam-0609063936-65-06082339-h464-harness-zzpb_default(a65b24a783afd25920bf29ff27d7baf8)" failed: rpc error: code = Unknown desc = Error: No such container: 586554fec1cf2942c7d2f45589db02b217c90c2ea96982041fc3f12b4b6595ff"
and
ContainerStatus "1647b951d266b4b1d318317b1836002eb4731a510dffa38ba6b58b45a7710784" from runtime service failed: rpc error: code = Unknown desc = Error: No such container: 1647b951d266b4b1d318317b1836002eb4731a510dffa38ba6b58b45a7710784
I'm a bit puzzled by the container ID, since gcr.io/PROJECT_NAME/apachebeamp3.7_imageconversion:latest currently has the ID 8bdf43f9cdcd20d4c258a7810c81cb5214ecc984e534117ef8ba1a4cab2a3dae.
Questions:
Edit: Additional information based on the question below:
Thanks for the pointers. I have looked at the dataflow.googleapis.com/kubelet logs. The only errors I see there are:
while getting AWS credentials NoCredentialProviders: no valid providers in chain. Deprecated.
ContainerStatus "55271a8a1af2a90d6162eda03bd8924aad502fd32f09ca50bf35af58e428cf59" from runtime service failed: rpc error: code = Unknown desc = Error: No such container: 55271a8a1af2a90d6162eda03bd8924aad502fd32f09ca50bf35af58e428cf59
Error syncing pod a65b24a783afd25920bf29ff27d7baf8 ("dataflow-beamapp-sam-0609063936-65-06082339-h464-harness-7056_default(a65b24a783afd25920bf29ff27d7baf8)"), skipping: [failed to "StartContainer" for "sdk0" with CrashLoopBackOff: "Back-off 10s restarting failed container=sdk0 pod=dataflow-beamapp-sam-0609063936-65-06082339-h464-harness-7056_default(a65b24a783afd25920bf29ff27d7baf8)"
Strangely, I do not see a worker-startup category in the log viewer. What do I need to do to see those log entries and to be able to take the next step on this debugging journey :-)?
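For reference, this is how I would query those entries outside the Logs Viewer (a sketch using the google-cloud-logging client; the exact worker-startup log name is my assumption, formed by analogy with the dataflow.googleapis.com/kubelet log name above):

# Sketch: query the (assumed) worker-startup log for this job via the
# Cloud Logging API instead of the Logs Viewer UI.
from google.cloud import logging as cloud_logging

client = cloud_logging.Client(project="PROJECT_NAME")
log_filter = (
    'resource.type="dataflow_step" '
    'AND resource.labels.job_id="2020-06-08_23_39_43-14062032727466654144" '
    'AND logName="projects/PROJECT_NAME/logs/'
    'dataflow.googleapis.com%2Fworker-startup"'
)
for entry in client.list_entries(filter_=log_filter,
                                 order_by=cloud_logging.DESCENDING):
    print(entry.timestamp, entry.payload)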
Upvotes: 1
Views: 1905
Reputation: 101
For me, the problem was fixed when I removed the option --experiments=use_runner_v2 when running the pipeline.
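In other words, the options end up looking roughly like this (a sketch with placeholder values; removing the experiment flag is the only actual change):

# Sketch: the same Dataflow options as before, just without
# --experiments=use_runner_v2 (all other values are placeholders).
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions([
    "--runner=DataflowRunner",
    "--project=PROJECT_NAME",
    "--region=us-central1",
    "--temp_location=gs://MY_BUCKET/tmp",
    # "--experiments=use_runner_v2",  # removing this line fixed it for me
])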
Upvotes: 0
Reputation: 46
I was having a similar issue, getting ContainerStatus xxxxx from runtime service failed and Error syncing pod errors.
I was trying to read data from a file and process it in a streaming application. Once I removed options.setStreaming(true), it worked properly.
Streaming is for unbounded data, such as reading from Pub/Sub or Kafka; batch is for bounded data, such as reading from a database or a file.
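The options.setStreaming(true) call above is from the Java SDK; a rough Python-SDK sketch of the same fix (the file path and transforms are placeholders) would be:

# Sketch: read a bounded file in batch mode, i.e. with streaming disabled.
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions

options = PipelineOptions()
options.view_as(StandardOptions).streaming = False  # batch mode for bounded input

with beam.Pipeline(options=options) as p:
    (p
     | beam.io.ReadFromText("gs://MY_BUCKET/input.txt")  # placeholder path
     | beam.Map(print))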
Upvotes: 0
Reputation: 21
Turns out I made multiple mistakes. In particular, the base image in my Dockerfile had to be changed from
FROM apachebeam/python3.7_sdk:latest
to
FROM apache/beam_python3.7_sdk:latest
According to https://hub.docker.com/r/apachebeam/python3.7_sdk, the images moved to the new name from version 2.20.0 onwards.
Upvotes: 1