The Docker image I built runs a Python script that scrapes data from a website and writes it to a JSON file in the same directory.
When I run the Python script on its own, the JSON file is written successfully.
However, when I run the Docker image with the command below:
docker run -v ${pwd}:/data python-zaws
the script runs successfully, but the JSON file does not appear in the host directory.
I also tried looking for the output inside the container itself:
docker run -it -d -v ${pwd}:/data python-zaws bash
docker attach <container-name>
ls
But the JSON file is not in the container either.
I have searched Google but could not find a solution.
Any help would be appreciated.
My code is below.
My Dockerfile:
FROM python:3.6.8
WORKDIR /mydata
COPY . /mydata
CMD ["python", "-u", "./webscraper.py"]
Python code:
# imports required modules
from urllib.request import Request, urlopen
import json


def main():
    # counter (e.g. 0 = page 1, 102 = page 2, 204 = page 3, and so on...)
    offsetCounter = 0
    # creates empty dictionary to store data in
    shoeDictionary = {'Brand': [], 'Name': [], 'Actual Price': [], 'Discounted Price': [], 'Image Link': [], 'Alternate Image Link': []}
    while True:
        # XHR url containing required data
        url = '...'
        # makes a GET request to the url and opens a connection
        request = Request(url, headers={'User-Agent': 'XYZ/3.0'})
        connection = urlopen(request)
        # translates JSON data into a Python dictionary
        jsonData = json.loads(connection.read())
        docs = jsonData['response']['docs']
        if len(docs) > 0:
            for doc in docs:  # iterates over every product on this page
                shoeDictionary['Brand'].append(doc['meta']['brand'])
                shoeDictionary['Name'].append(doc['meta']['name'])
                shoeDictionary['Actual Price'].append(doc['meta']['price'])
                shoeDictionary['Discounted Price'].append(doc['meta']['special_price'])
                shoeDictionary['Image Link'].append(doc['image'])
                shoeDictionary['Alternate Image Link'].append(doc['image_extra'][0]['url'])
            offsetCounter = offsetCounter + 102  # adds 102 to offset counter
        else:  # if len(docs) == 0, there are no more pages, so break the loop
            break
    # writes into json file (a relative path, resolved against the current working directory)
    with open('data.json', 'w', encoding='utf-8') as outfile:
        json.dump(shoeDictionary, outfile, ensure_ascii=False, indent=4)
    # test
    print("Does this mean the script worked?")


# starts application
if __name__ == "__main__":
    main()
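The paging logic in the loop above (request a page at some offset, stop when `docs` comes back empty, otherwise advance by 102) can be exercised without hitting the site by passing the fetch step in as a function. This is a minimal sketch, not the asker's code: `collect_shoes`, `fake_fetch` and `PAGE_SIZE` are hypothetical names, the step of 102 comes from the script's comment, and the field paths are copied from the loop.

```python
from typing import Callable, Dict, List

PAGE_SIZE = 102  # offset step taken from the original script's comment

FIELDS = ['Brand', 'Name', 'Actual Price', 'Discounted Price',
          'Image Link', 'Alternate Image Link']


def collect_shoes(fetch_page: Callable[[int], dict]) -> Dict[str, List]:
    """Page through results until an empty 'docs' list is returned.

    fetch_page is a hypothetical callable: offset -> parsed JSON response.
    """
    shoes: Dict[str, List] = {field: [] for field in FIELDS}
    offset = 0
    while True:
        docs = fetch_page(offset)['response']['docs']
        if not docs:  # an empty page means there is nothing left to read
            break
        for doc in docs:
            shoes['Brand'].append(doc['meta']['brand'])
            shoes['Name'].append(doc['meta']['name'])
            shoes['Actual Price'].append(doc['meta']['price'])
            shoes['Discounted Price'].append(doc['meta']['special_price'])
            shoes['Image Link'].append(doc['image'])
            shoes['Alternate Image Link'].append(doc['image_extra'][0]['url'])
        offset += PAGE_SIZE
    return shoes


def fake_fetch(offset: int) -> dict:
    """Stand-in for the real request: one product on the first page only."""
    if offset == 0:
        doc = {'meta': {'brand': 'B', 'name': 'N', 'price': 100,
                        'special_price': 80},
               'image': 'img.jpg', 'image_extra': [{'url': 'alt.jpg'}]}
        return {'response': {'docs': [doc]}}
    return {'response': {'docs': []}}


if __name__ == '__main__':
    print(collect_shoes(fake_fetch)['Brand'])  # prints ['B']
```

Injecting the fetch function keeps the network call out of the loop, so the stop condition and the offset arithmetic can be checked on canned data.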
As you can see from the Dockerfile, your working directory is /mydata, but your run command mounts the volume at a different path, ${pwd}:/data, which is the /data/ folder in the container.
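To make the mismatch concrete: a relative filename such as `data.json` is resolved against the process's working directory, which `WORKDIR /mydata` sets in the Dockerfile. The sketch below uses `posixpath` so it behaves the same on any host; `resolve_output` and `is_inside` are hypothetical helper names used only for illustration.

```python
import posixpath


def resolve_output(filename: str, cwd: str) -> str:
    """Return where a relative filename ends up, given a working directory."""
    return posixpath.normpath(posixpath.join(cwd, filename))


def is_inside(path: str, directory: str) -> bool:
    """True if an absolute path lies within an absolute directory (e.g. a mount)."""
    directory = posixpath.normpath(directory)
    return posixpath.commonpath([path, directory]) == directory


if __name__ == '__main__':
    target = resolve_output('data.json', cwd='/mydata')  # WORKDIR /mydata
    print(target)                        # /mydata/data.json
    print(is_inside(target, '/data'))    # False: not on the -v mount
    print(is_inside(target, '/mydata'))  # True
```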
I expect the app is writing the JSON into the current working directory, i.e. the /mydata folder. Could you try setting the output path explicitly to the shared volume, or else changing the path of the volume?
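One way to follow the first suggestion is to build the output path from a directory that matches the mount point instead of relying on the working directory. This is a sketch under assumptions: the `OUTPUT_DIR` environment variable and the `write_output` function are hypothetical, and the default `/data` simply mirrors the `-v ${pwd}:/data` mount in the question.

```python
import json
import os
from pathlib import Path


def write_output(data: dict, filename: str = 'data.json') -> Path:
    """Write data as JSON into OUTPUT_DIR (hypothetical env var), default /data."""
    out_dir = Path(os.environ.get('OUTPUT_DIR', '/data'))
    out_dir.mkdir(parents=True, exist_ok=True)  # no-op if the mount already exists
    out_path = out_dir / filename
    with open(out_path, 'w', encoding='utf-8') as outfile:
        json.dump(data, outfile, ensure_ascii=False, indent=4)
    return out_path
```

In the scraper, calling `write_output(shoeDictionary)` would take the place of the `open('data.json', 'w', ...)` block, so with `docker run -v ${pwd}:/data python-zaws` the file should land in the host folder. The other option leaves the code alone and mounts onto the working directory (`-v ${pwd}:/mydata`); the bind mount then hides the image's copy of /mydata, so the host folder must itself contain webscraper.py.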
Also, when you start the container with bash, the script has not been executed: passing bash as the command replaces the image's CMD, so the JSON file does not exist yet and you cannot find it.
Each docker run also creates a new container, so you will not find files left behind by previous runs.