hildebro

Reputation: 569

Parsing CSV into Pytorch tensors

I have a CSV file in which every value is numeric except for the header row. When I try to build tensors from it, I get the following exception:

Traceback (most recent call last):
  File "pytorch.py", line 14, in <module>
    test_tensor = torch.tensor(test)
ValueError: could not determine the shape of object type 'DataFrame'

This is my code:

import torch
import dask.dataframe as dd

device = torch.device("cuda:0")

print("Loading CSV...")
test = dd.read_csv("test.csv", encoding = "UTF-8")
train = dd.read_csv("train.csv", encoding = "UTF-8")

print("Converting to Tensor...")
test_tensor = torch.tensor(test)
train_tensor = torch.tensor(train)

Using pandas instead of Dask for CSV parsing produced the same error. I also tried to specify dtype=torch.float64 inside the call to torch.tensor(data), but got the same error again.

Upvotes: 22

Views: 60274

Answers (5)

IDawson

Reputation: 1

The import functions all appear to require a .csv with an array of numbers. You mentioned in your original problem case that your .csv includes column headers. Please try your code without the headers in the .csv file.

Upvotes: 0

alercelik

Reputation: 723

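Using only NumPy:

import numpy as np
import torch

# skip_header=1 drops the header row, which would otherwise be parsed as a row of NaN
tensor = torch.from_numpy(
    np.genfromtxt("train.csv", delimiter=",", skip_header=1)
)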
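
Since the CSV in the question has a header row, it may help to see how genfromtxt treats that row. This is a minimal sketch, assuming a small in-memory CSV; csv_text and its column names are made up here as a stand-in for train.csv.

```python
import io

import numpy as np
import torch

# Hypothetical in-memory stand-in for train.csv: one header row, then numeric rows.
csv_text = "a,b\n1,2\n3,4\n"

# Without skip_header, the header fields fail float parsing and come back as NaN.
with_header = np.genfromtxt(io.StringIO(csv_text), delimiter=",")

# skip_header=1 discards the first line before parsing.
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)

# from_numpy shares memory with `data` and keeps its float64 dtype.
tensor = torch.from_numpy(data)

print(np.isnan(with_header[0]).all())
print(tensor.shape, tensor.dtype)
```

If float32 is needed for training, calling tensor.float() afterwards should give a converted copy.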

Upvotes: 1

Dishin H Goyani

Reputation: 7693

Newer versions of pandas recommend using to_numpy() instead of .values:

train_tensor = torch.tensor(train.to_numpy())
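
A self-contained sketch of the same idea, assuming pandas for parsing. The in-memory csv_text is a hypothetical stand-in for train.csv, and the float32 request is an assumption chosen to match PyTorch's default float type, not something the question requires.

```python
import io

import pandas as pd
import torch

# Hypothetical stand-in for train.csv; pandas uses the first line as column names.
csv_text = "a,b\n1,2\n3,4\n"
train = pd.read_csv(io.StringIO(csv_text))

# to_numpy() returns a plain ndarray, which torch.tensor can read a shape from.
# Asking for float32 is an assumption; drop the dtype argument to keep pandas' own dtype.
train_array = train.to_numpy(dtype="float32")
train_tensor = torch.tensor(train_array)

print(train_tensor.shape, train_tensor.dtype)
```

With Dask, dd.read_csv returns a lazy frame, so calling .compute() first to obtain a regular pandas DataFrame is probably the simplest route before converting.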

Upvotes: 7

Arash

Reputation: 552

I think you're just missing .values

import torch
import pandas as pd

train = pd.read_csv('train.csv')
train_tensor = torch.tensor(train.values)

Upvotes: 14

karla

Reputation: 389

Try converting it to an array first:

test_tensor = torch.Tensor(test.values)
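
Note that torch.Tensor (capital T) and torch.tensor can produce different dtypes from the same array. A small sketch, assuming a hypothetical float64 array in place of test.values and an unchanged default dtype:

```python
import numpy as np
import torch

# Hypothetical float64 array standing in for test.values.
values = np.array([[1.5, 2.0], [3.0, 4.5]])

# torch.Tensor converts to the default float type (float32 unless it was changed).
legacy = torch.Tensor(values)

# torch.tensor infers the dtype from the data, so the float64 is kept.
inferred = torch.tensor(values)

print(legacy.dtype, inferred.dtype)
```

Either form works for feeding a model, as long as the dtype matches what the model's parameters expect.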

Upvotes: 23
