chikitin

Reputation: 781

Cannot load a CSV file in Colab using tf.compat.v1.keras.utils.get_file

I have mounted my Google Drive and have a CSV file in a folder. I am following the tutorial; however, when I call tf.keras.utils.get_file(), I get the following ValueError.

import os

data_folder = r"/content/drive/My Drive/NLP/project2/data"
print(os.listdir(data_folder))

This prints:

['crowdsourced_labelled_dataset.csv',
 'P2_Testing_Dataset.csv',
 'P2_Training_Dataset_old.csv',
 'P2_Training_Dataset.csv']

import tensorflow as tf

TRAIN_DATA_URL = os.path.join(data_folder, 'P2_Training_Dataset.csv')
train_file_path = tf.compat.v1.keras.utils.get_file("train.csv", TRAIN_DATA_URL)

But this raises:

Downloading data from /content/drive/My Drive/NLP/project2/data/P2_Training_Dataset.csv
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-16-5bd642083471> in <module>()
      2 TRAIN_DATA_URL = os.path.join(data_folder, 'P2_Training_Dataset.csv')
      3 TEST_DATA_URL = os.path.join(data_folder, 'P2_Testing_Dataset.csv')
----> 4 train_file_path = tf.compat.v1.keras.utils.get_file("train.csv", TRAIN_DATA_URL)
      5 test_file_path = tf.compat.v1.keras.utils.get_file("eval.csv", TEST_DATA_URL)


6 frames
/usr/lib/python3.6/urllib/request.py in _parse(self)
    382         self.type, rest = splittype(self._full_url)
    383         if self.type is None:
--> 384             raise ValueError("unknown url type: %r" % self.full_url)
    385         self.host, self.selector = splithost(rest)
    386         if self.host:

ValueError: unknown url type: '/content/drive/My Drive/NLP/project2/data/P2_Training_Dataset.csv'

What am I doing wrong?

Upvotes: 2

Views: 1086

Answers (1)

akilat90

Reputation: 5696

According to the docs, tf.compat.v1.keras.utils.get_file downloads a file from a URL if it is not already in the cache. Its signature is:

tf.keras.utils.get_file(
    fname,
    origin,
    untar=False,
    md5_hash=None,
    file_hash=None,
    cache_subdir='datasets',
    hash_algorithm='auto',
    extract=False,
    archive_format='auto',
    cache_dir=None
)

By default the file at the url origin is downloaded to the cache_dir ~/.keras, placed in the cache_subdir datasets, and given the filename fname. The final location of a file example.txt would therefore be ~/.keras/datasets/example.txt.
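To make the documented default concrete, here is a minimal sketch that only computes where the docs say the file would end up; it does not call TensorFlow. The helper name expected_cache_path is hypothetical, and it models only the default layout described above, not any fallback get_file may apply when ~/.keras is not writable.

```python
import os
from typing import Optional


def expected_cache_path(fname: str, cache_subdir: str = "datasets",
                        cache_dir: Optional[str] = None) -> str:
    """Hypothetical helper: the default location the get_file docs describe."""
    # Models only the documented default: <cache_dir>/<cache_subdir>/<fname>,
    # with cache_dir defaulting to ~/.keras. No fallback logic is modeled.
    if cache_dir is None:
        cache_dir = os.path.join(os.path.expanduser("~"), ".keras")
    return os.path.join(cache_dir, cache_subdir, fname)


print(expected_cache_path("example.txt"))  # e.g. /root/.keras/datasets/example.txt in Colab
```

In other words, get_file always produces a copy inside the Keras cache, which is redundant when the file is already sitting on a mounted drive.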

Returns: Path to the downloaded file

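Since you already have the data in your drive, there's no need to download it again. If I understand correctly, the function also expects a URL as origin: the traceback shows urllib rejecting the bare path with "unknown url type" because it has no scheme such as http:// or file://. Also, there's no need to obtain the file path from a function call, because you already know it.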
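The "unknown url type" error can be reproduced with the standard library alone. This sketch assumes, as the traceback suggests, that get_file hands origin to urllib; the helper is_accepted_url is hypothetical and simply checks whether urllib.request.Request can parse a string as a URL.

```python
import urllib.request
from pathlib import PurePosixPath


def is_accepted_url(target: str) -> bool:
    """Hypothetical helper: True if urllib.request.Request can parse `target`."""
    try:
        # Request raises ValueError("unknown url type: ...") when there is no scheme.
        urllib.request.Request(target)
        return True
    except ValueError:
        return False


drive_path = "/content/drive/My Drive/NLP/project2/data/P2_Training_Dataset.csv"
print(is_accepted_url(drive_path))                           # False: a plain path has no scheme
print(is_accepted_url(PurePosixPath(drive_path).as_uri()))   # True: file:///content/drive/My%20Drive/...
```

A plain filesystem path is therefore rejected before any download is attempted, which matches the ValueError in the question.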

Assuming the drive is mounted, you can replace your file paths as below:

train_file_path = os.path.join(data_folder, 'P2_Training_Dataset.csv')
test_file_path = os.path.join(data_folder, 'P2_Testing_Dataset.csv')
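A quick way to confirm the path works is to read the file directly with the standard csv module. This is a minimal sketch: read_header is a hypothetical helper, and the read is guarded so the snippet also runs outside Colab, where the Drive folder does not exist. The resulting path can then be passed wherever the tutorial used the value returned by get_file.

```python
import csv
import os

data_folder = "/content/drive/My Drive/NLP/project2/data"
train_file_path = os.path.join(data_folder, "P2_Training_Dataset.csv")


def read_header(path: str) -> list:
    """Hypothetical helper: return the first row (column names) of a CSV file."""
    with open(path, newline="") as f:
        return next(csv.reader(f))


# Only read when the Drive is actually mounted (e.g. inside Colab).
if os.path.exists(train_file_path):
    print(read_header(train_file_path))
```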

Upvotes: 3
