user18934955

Reputation: 49

Why is file field empty (array([], dtype=int8)) when loading dataset with mlx.data?

I'm using mlx.data to load an image dataset where each class is represented by a separate folder. My function files_and_classes generates a list of dictionaries containing file paths and corresponding labels.

Here’s my code:

from pathlib import Path
import mlx.data as dx

def files_and_classes(root: Path):
    """Load the files and classes from an image dataset that contains one folder per class."""
    images = list(root.rglob("*.jpg"))
    categories = [p.relative_to(root).parent.name for p in images]
    category_set = set(categories)
    category_map = {c: i for i, c in enumerate(sorted(category_set))}

    return [
        {
            "file": str(p.relative_to(root)).encode("ascii"),
            "label": category_map[c]
        }
        for c, p in zip(categories, images)
    ]

sample = files_and_classes(Path('/Users/kimduhyeon/Desktop/d2f/mlx/val'))
print(sample[0])

dset = dx.buffer_from_vector(sample)
print(dset[0])

Expected Output: I expected each entry in dset to contain the correct file path as a byte string and the corresponding label.

Actual Output: The first dictionary prints correctly from sample[0]:

{'file': b'film/000017270027.jpg', 'label': 1}

However, when accessing dset[0], the file field is empty:

{'label': array(1), 'file': array([], dtype=int8)}

Question: Why is the file field showing up as an empty array (array([], dtype=int8)) when converted to a mlx.data buffer? Is there a specific data type requirement for mlx.data.buffer_from_vector? How should I properly format the file field to avoid this issue?

Upvotes: 2

Views: 17

Answers (2)

user18934955

Reputation: 49

There was a strange issue where the code worked in the global environment on my MacBook but not in a virtual environment. Based on this, I concluded that my Python environment and variables were tangled. So, I decided to reset my MacBook and reinstall everything properly, ensuring that I used only a single, correctly configured Python environment.

Following the installation method described in the official mlx-data documentation (cloning the repository with git clone and then building the Python bindings from it), I was able to run everything without any issues.

I hope my experience can help others who might be struggling with similar problems.

Upvotes: 0

Kirill Ilichev

Reputation: 1289

Remove the .encode("ascii"): instead of encoding the file path to bytes, leave it as a regular string. This way mlx.data.buffer_from_vector can automatically infer a fixed-length string type for the file field:

"file": str(p.relative_to(root))
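
As a rough sketch of that suggestion, here is the question's files_and_classes with the .encode("ascii") call dropped, run against a throwaway directory so the result can be inspected without the original dataset. The folder and file names are made up for illustration, and sorted() replaces list() only to make the order deterministic. Whether dx.buffer_from_vector then keeps the path intact depends on the installed mlx-data build, which this snippet does not check.

```python
import tempfile
from pathlib import Path


def files_and_classes(root: Path):
    """List images under root (one folder per class) as dicts with a str path and an int label."""
    images = sorted(root.rglob("*.jpg"))
    categories = [p.relative_to(root).parent.name for p in images]
    category_map = {c: i for i, c in enumerate(sorted(set(categories)))}
    return [
        # The path stays a plain str here; no .encode("ascii")
        {"file": str(p.relative_to(root)), "label": category_map[c]}
        for c, p in zip(categories, images)
    ]


# Hypothetical two-class layout, created in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("digital/a.jpg", "film/b.jpg"):
        path = root / name
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    sample = files_and_classes(root)

print(sample)
```

The resulting list can be passed to dx.buffer_from_vector(sample) exactly as in the question.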

Alternatively, you could pass a fixed-width NumPy byte string (this needs import numpy as np at the top of the script):

"file": np.array(str(p.relative_to(root)), dtype='S40')
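
One caveat with a hard-coded width: a NumPy 'S40' byte string silently truncates anything longer than 40 bytes, so a deeply nested path would be cut off. A small sketch of deriving the width from the data instead is shown here; the paths are hypothetical and the helper name fixed_width_dtype is not part of NumPy or mlx.data.

```python
def fixed_width_dtype(paths):
    """Return a NumPy byte-string dtype name wide enough for the longest path."""
    width = max(len(p.encode("utf-8")) for p in paths)
    return f"S{width}"


# Hypothetical relative paths, in the style of the question's dataset
paths = ["film/000017270027.jpg", "digital/000000000001.jpg"]
dtype = fixed_width_dtype(paths)
print(dtype)  # S24
```

The returned name could then be used in place of 'S40', as in np.array(path, dtype=dtype).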

Upvotes: 0
