Reputation: 49
I'm using mlx.data to load an image dataset where each class is represented by a separate folder. My function files_and_classes generates a list of dictionaries containing file paths and corresponding labels.
Here’s my code:
from pathlib import Path
import mlx.data as dx

def files_and_classes(root: Path):
    """Load the files and classes from an image dataset that contains one folder per class."""
    images = list(root.rglob("*.jpg"))
    categories = [p.relative_to(root).parent.name for p in images]
    category_set = set(categories)
    category_map = {c: i for i, c in enumerate(sorted(category_set))}
    return [
        {
            "file": str(p.relative_to(root)).encode("ascii"),
            "label": category_map[c]
        }
        for c, p in zip(categories, images)
    ]

sample = files_and_classes(Path('/Users/kimduhyeon/Desktop/d2f/mlx/val'))
print(sample[0])
dset = dx.buffer_from_vector(sample)
print(dset[0])
Expected Output: I expected each entry in dset to contain the correct file path as a byte string and the corresponding label.
Actual Output: The first dictionary prints correctly from sample[0]:
{'file': b'film/000017270027.jpg', 'label': 1}
However, when accessing dset[0], the file field is empty:
{'label': array(1), 'file': array([], dtype=int8)}
Question: Why does the file field show up as an empty array (array([], dtype=int8)) after conversion to an mlx.data buffer? Does mlx.data.buffer_from_vector have a specific data type requirement? How should I format the file field to avoid this issue?
Upvotes: 2
Views: 17
Reputation: 49
There was a strange issue where the code worked in the global environment on my MacBook but not in a virtual environment. Based on this, I concluded that my Python environment and variables were tangled. So, I decided to reset my MacBook and reinstall everything properly, ensuring that I used only a single, correctly configured Python environment.
Following the installation method described in the official mlx-data documentation (cloning the repository with git clone and then building the Python bindings), I was able to run everything without any issues.
I hope my experience can help others who might be struggling with similar problems.
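For anyone who wants to check for this kind of environment mix-up before reinstalling, here is a minimal sketch using only the standard library. It reports which interpreter is running, whether it is a virtual environment, and where a package would be imported from. The helper name describe_environment is made up for illustration, and "mlx" as the default package name is an assumption about how the package is installed on your machine.

```python
import importlib.util
import sys


def describe_environment(module_name: str = "mlx") -> dict:
    """Report the running interpreter and where module_name would be imported from."""
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        spec = None

    if spec is None:
        locations = []  # the module is not importable from this interpreter
    elif spec.submodule_search_locations:
        locations = list(spec.submodule_search_locations)  # package directories
    else:
        locations = [spec.origin]  # single-file module

    return {
        "executable": sys.executable,
        # In a venv, sys.prefix points at the venv and differs from sys.base_prefix.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        "module_locations": locations,
    }


print(describe_environment())
```

Running this both inside and outside the virtual environment and comparing the reported locations should show whether the two interpreters are picking up different installations.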
Upvotes: 0
Reputation: 1289
Remove the .encode("ascii") call. Instead of encoding the file path to bytes, leave it as a regular string. This way, mlx.data.buffer_from_vector can automatically infer a fixed-length string type for the file field:
    "file": str(p.relative_to(root))
Alternatively, you could pass a fixed-length NumPy byte string (this needs import numpy as np, and dtype='S40' truncates paths longer than 40 bytes):
    "file": np.array(str(p.relative_to(root)), dtype='S40')
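As a rough sketch of the first suggestion, this is what the question's function could look like with the path kept as a plain str. The name files_and_classes_str and the demo file names are made up for illustration. The demo only builds the list of dictionaries on a temporary directory. Whether dx.buffer_from_vector then handles the str values as described is the claim of this answer and is not checked here.

```python
from pathlib import Path
import tempfile


def files_and_classes_str(root: Path):
    """Like files_and_classes, but keeps "file" as a plain str instead of bytes."""
    images = sorted(root.rglob("*.jpg"))
    categories = [p.relative_to(root).parent.name for p in images]
    category_map = {c: i for i, c in enumerate(sorted(set(categories)))}
    return [
        {"file": str(p.relative_to(root)), "label": category_map[c]}
        for c, p in zip(categories, images)
    ]


# Demo on a throwaway directory tree with one folder per class.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("digital/a.jpg", "film/b.jpg"):
        (root / name).parent.mkdir(parents=True, exist_ok=True)
        (root / name).touch()
    demo = files_and_classes_str(root)

print(demo)
```

The resulting list can then be passed to dx.buffer_from_vector in place of sample in the question's code.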
Upvotes: 0