aviral sanjay

Reputation: 983

Pachyderm pipeline does not start a job and launches an empty repo

I have a JSON configuration for my pipeline in Pachyderm:

{
    "pipeline": {
        "name": "mopng-beneficiary-v2"
    },
    "input": {
        "pfs": {
            "repo": "mopng_beneficiary_v2",
            "glob": "/*"
        }
    },
    "transform": {
        "cmd": ["python3", "/pclean_phlc9h6grzqdhm6sc0zrxjne_UdOgg.py /pfs/mopng_beneficiary_v2/euoEQHIwIQTe1wXtg46fFYok.csv /pfs/mopng_beneficiary_v2//Users/aviralsrivastava/Downloads/5Feb18_master_ujjwala_latlong_dist_dno_so_v7.csv /pfs/mopng_beneficiary_v2//Users/aviralsrivastava/Downloads/ppac_master_v3_mmi_enriched_with_sanity_check.csv /pfs/mopng_beneficiary_v2/Qc.csv"],
        "image": "mopng-beneficiary-v2-image"
    }
}

And my Dockerfile is as follows:

FROM ubuntu:14.04

# Install opencv and matplotlib.
RUN apt-get update \
    && apt-get upgrade -y \
    && apt-get install -y unzip wget build-essential \
        cmake git pkg-config libswscale-dev \
        python3-dev python3-numpy python3-tk \
        libtbb2 libtbb-dev libjpeg-dev \
        libpng-dev libtiff-dev libjasper-dev \
        bpython python3-pip libfreetype6-dev \
    && apt-get clean \
    && rm -rf /var/lib/apt

RUN sudo pip3 install matplotlib
RUN sudo pip3 install pandas

# Add our own code.
ADD pclean.py /pclean.py

However, when I run the command to create the pipeline, no job starts and the output repo stays empty:

pachctl create-pipeline -f https://raw.githubusercontent.com/avisrivastava254084/learning-pachyderm/master/pipeline.json

The files exist in PFS; I added them with:

pachctl put-file mopng_beneficiary_v2 master -f /Users/aviralsrivastava/Downloads/pclean_phlc9h6grzqdhm6sc0zrxjne_UdOgg.py
pachctl put-file mopng_beneficiary_v2 master -f /Users/aviralsrivastava/Downloads/5Feb18_master_ujjwala_latlong_dist_dno_so_v7.csv
pachctl put-file mopng_beneficiary_v2 master -f /Users/aviralsrivastava/Downloads/ppac_master_v3_mmi_enriched_with_sanity_check.csv
pachctl put-file mopng_beneficiary_v2 master -f /Users/aviralsrivastava/Downloads/euoEQHIwIQTe1wXtg46fFYok.csv

It is worth noting that the logs command (pachctl get-logs --pipeline=mopng-beneficiary-v2) returns this:

container "user" in pod "pipeline-mopng-beneficiary-v2-v1-lnbjh" is waiting to start: trying and failing to pull image

Upvotes: 0

Views: 278

Answers (1)

maths

Reputation: 362

As Matthew L Daniel commented, the image name looks odd because it has no prefix. By default, Pachyderm pulls Docker images from Docker Hub, and Docker Hub prefixes images with the name of the user who owns them (e.g. maths/mopng-beneficiary-v2-image).
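As a rough sketch of what that would look like in the pipeline spec, only the image field in the transform section changes. Here maths is just the placeholder Docker Hub username from the example above; substitute the account the image was actually pushed to. The cmd field is omitted for brevity and would stay as it is in your spec.

```json
{
    "transform": {
        "image": "maths/mopng-beneficiary-v2-image"
    }
}
```

This assumes the image has already been built and pushed to Docker Hub under that name, so that the cluster can pull it.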

Also, I think you might need to rename your input repo so that it is more distinct from the pipeline name. Pachyderm canonicalizes repo names to meet Kubernetes naming requirements, and mopng-beneficiary-v2 and mopng_beneficiary_v2 might canonicalize to the same repo name (you might be getting an error like "repo already exists"). Try renaming the input repo to mopng_beneficiary_input or something similar.
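If you go with that rename, the input section of the spec would look roughly like this. The repo name mopng_beneficiary_input is only the suggestion from this answer, not something that exists yet, so the repo would have to be created and the files added to it first.

```json
{
    "input": {
        "pfs": {
            "repo": "mopng_beneficiary_input",
            "glob": "/*"
        }
    }
}
```

Since Pachyderm mounts each input under /pfs/ followed by the repo name, the paths in your cmd would need to point at /pfs/mopng_beneficiary_input/ instead of /pfs/mopng_beneficiary_v2/.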

Upvotes: 0
