Andrew

Reputation: 3

Docker: No output file generated in container

The Docker image that I built runs a Python script that scrapes data from a website and writes it to a JSON file in the same directory.

When I run the Python script by itself, the JSON file is written successfully.

However, I run into a problem when I run the Docker image with the command below:

docker run -v ${pwd}:/data python-zaws

The script runs successfully, but the JSON file is not written to the host directory.

I have also tried looking for the output inside the container itself via:

docker run -it -d -v ${pwd}:/data python-zaws bash
docker attach <container-name>
ls

But the JSON file is not found in the container either.

I have searched all over Google and have not been able to find a solution.

Any help would really be appreciated.

The code is provided below:

My Dockerfile:

FROM python:3.6.8

WORKDIR /mydata

COPY . /mydata

CMD ["python", "-u", "./webscraper.py"]

Python Code:

# imports required modules
from urllib.request import Request, urlopen
import json


def main():
    # offset counter (e.g. 0 = page 1, 102 = page 2, 204 = page 3, and so on...)
    offsetCounter = 0

    # creates empty dictionary to store data in
    shoeDictionary = {'Brand': [], 'Name': [], 'Actual Price': [], 'Discounted Price': [], 'Image Link': [], 'Alternate Image Link': []}

    while True:
        # XHR url containing required data
        url = '...'

        # makes a GET request to the url and reads the response
        request = Request(url, headers={'User-Agent': 'XYZ/3.0'})
        with urlopen(request) as connection:
            # translates JSON data into a Python dictionary
            jsonData = json.loads(connection.read())

        docs = jsonData['response']['docs']
        if not docs:  # no documents left, so stop paging
            break

        for doc in docs:  # iterates over every document on the page
            shoeDictionary['Brand'].append(doc['meta']['brand'])
            shoeDictionary['Name'].append(doc['meta']['name'])
            shoeDictionary['Actual Price'].append(doc['meta']['price'])
            shoeDictionary['Discounted Price'].append(doc['meta']['special_price'])
            shoeDictionary['Image Link'].append(doc['image'])
            shoeDictionary['Alternate Image Link'].append(doc['image_extra'][0]['url'])

        offsetCounter = offsetCounter + 102  # adds 102 to offset counter

    # writes into a json file in the current working directory
    with open('data.json', 'w', encoding='utf-8') as outfile:
        json.dump(shoeDictionary, outfile, ensure_ascii=False, indent=4)

    # test
    print("Does this mean the script worked?")


# starts application
if __name__ == "__main__":
    main()

Upvotes: 0

Views: 1008

Answers (1)

Niklas

Reputation: 1620

As you can see from the Dockerfile, your working directory is /mydata, but your run command mounts the volume at a different path: ${pwd}:/data maps the host directory to the /data folder in the container. I expect the app writes the JSON file to its current working directory, that is, /mydata, which is not shared with the host. Could you try either setting the output path explicitly to the shared volume, or changing the mount path of the volume?
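
For the first option, a minimal sketch of writing the output into the mounted folder could look like the following. It assumes the volume is still mounted at /data as in the question's run command; OUTPUT_DIR is a hypothetical environment variable and resolve_output_path and write_results are hypothetical helper names, none of which appear in the original script.

```python
import json
import os
from pathlib import Path


def resolve_output_path(filename="data.json", default_dir="/data"):
    """Return the path the scraper should write to.

    OUTPUT_DIR is a hypothetical environment variable. The default, /data,
    is the container side of the `-v ${pwd}:/data` mount from the question.
    """
    output_dir = Path(os.environ.get("OUTPUT_DIR", default_dir))
    # Create the directory if it is missing (e.g. when run without the mount).
    output_dir.mkdir(parents=True, exist_ok=True)
    return output_dir / filename


def write_results(results, filename="data.json"):
    """Write the scraped results as JSON and return the path that was used."""
    path = resolve_output_path(filename)
    with open(path, "w", encoding="utf-8") as outfile:
        json.dump(results, outfile, ensure_ascii=False, indent=4)
    return path
```

In the question's script, this would mean calling write_results(shoeDictionary) instead of opening 'data.json' relative to the working directory. Printing the returned path is a cheap way to confirm in the container logs where the file actually ended up.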

Also, when you start the container with bash, the script has probably not been executed at that point, so the JSON file does not exist yet and you cannot find it. Each docker run creates a new container, so files written by previous runs are not available in it.

Upvotes: 1
