Reputation: 161
I'm trying to download data from the Kaggle competition state-farm-distracted-driver-detection.
The dataset has the following directory structure:
|-driver_imgs_list.csv
|-sample-submission.csv
|-imgs
|  |-test
|  |-train
|  |  |-c0
|  |  |-c1
|  |  |-c2
|  |  |  |-img_100029.jpg
|  |  |  |-img_100108.jpg
I only want to download the imgs/train/c2 folder. I know how to download the full dataset and individual files, but I can't figure out how to download a particular folder using the API.
First I tried the Kaggle CLI. With it I can download a single image as follows:
kaggle competitions download state-farm-distracted-driver-detection -f imgs/train/c2/img_100029.jpg
But when I ran the following command to download the c2 folder, I got a file-not-found error:
kaggle competitions download state-farm-distracted-driver-detection -f imgs/train/c2
404 - Not Found
Is there a command to download a particular folder from a competition with the Kaggle API?
As a second attempt, I used the Kaggle API from Python to download that folder.
My idea: the file "driver_imgs_list.csv" lists the class names (c0, c1, c2, ...) along with their corresponding image files. Since I want the c2 class folder, I collected the c2 image file names into a list using pandas, then tried to download each file in a for loop as follows:
from kaggle.api.kaggle_api_extended import KaggleApi
import pandas as pd

api = KaggleApi()
api.authenticate()
data = pd.read_csv("driver_imgs_list.csv")
images = data[data["classname"] == "c2"]["img"]  # all image file names listed under class c2
imgArray = []
for i in images:
    imgArray.append(i)
for i in imgArray:
    file = "imgs/train/c2/{i}".format(i=i)
    api.competition_download_file('state-farm-distracted-driver-detection', file, quiet=False, force=True)
Even with the above code I get the same file-not-found error:
HTTP response body: b'{"code":404,"message":"NotFound"}'
How can I download a particular folder, using either the Kaggle CLI or Python?
Upvotes: 8
Views: 2825
Reputation: 5741
Could it be that the error message is accurate, and that the file really is not in the dataset's folder?
Another idea is that it has to do with the order (?), because I was able to get your code running after calling .sort_values() on the Series of image names:
data = pd.read_csv('driver_imgs_list.csv')
filenames = 'imgs/train/c2/' + data[data['classname'] == 'c2']['img'].sort_values()
for filename in filenames:
    api.competition_download_file('state-farm-distracted-driver-detection', filename)
However, I only let it run for about 10 files, so it could still be that the files listed in the CSV do not match the files actually available in the dataset.
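One way to test the mismatch idea is to keep going when a download fails and record which paths were rejected. The following is only a minimal sketch: `paths_for_class` and `download_all` are hypothetical helper names, the `classname` and `img` columns are taken from the question, and it assumes the Kaggle client raises an exception when a file is not found.

```python
import csv
import io


def paths_for_class(csv_text, classname, prefix="imgs/train"):
    """Return the sorted competition paths of every image listed under `classname`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    names = sorted(row["img"] for row in reader if row["classname"] == classname)
    return [f"{prefix}/{classname}/{name}" for name in names]


def download_all(paths, download_one):
    """Call download_one(path) for each path and return the paths that failed."""
    failed = []
    for path in paths:
        try:
            download_one(path)
        except Exception as exc:  # assumption: a missing file surfaces as an exception
            print(f"skipped {path}: {exc}")
            failed.append(path)
    return failed
```

With the authenticated `api` object from the question, this could be called as `download_all(paths_for_class(open("driver_imgs_list.csv").read(), "c2"), lambda p: api.competition_download_file('state-farm-distracted-driver-detection', p))`. If the returned list is empty, the CSV and the dataset agree for that class; otherwise it names the files that were not found.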
Upvotes: 1