Error loading .csv into tensorflow

Question

I took the example code that trains on the Iris CSV and tried to use it with my own CSV.

The error occurs here:

train_data = "train_data.csv"
test_data = "test_data.csv"

training_set = tf.contrib.learn.datasets.base.load_csv_with_header(
filename=train_data,
target_dtype=np.int,
features_dtype=np.float32)

with the error

ValueError: invalid literal for int() with base 10: 'feature1'

the csv looks like this

feature1,feature2,feature3,label
1028.0,1012.0,1014.0,1
1029.0,1011.0,1017.0,-1
1027.0,1013.0,1015.0,1
...(and so on)

I get that the error is trying to say that feature1 is not an integer. However, when I use the same code for the Iris dataset, there are string headers that are not used as tensors. The Iris data csv looks like this.

30,4,setosa,versicolor,virginica
5.9,3.0,4.2,1.5,1
6.9,3.1,5.4,2.1,2
5.1,3.3,1.7,0.5,0

Also, not sure if I should make this a different question, but I changed the feature headers to

1,2,3,4
1028.0,1012.0,1014.0,1
1029.0,1011.0,1017.0,-1
1027.0,1013.0,1015.0,1
...(and so on)

and am now getting this error

ValueError: could not broadcast input array from shape (3) into shape (2)

Any ideas or help are greatly appreciated! Thanks!!!

gntoni · Accepted Answer

If you are going to use load_csv_with_header, you have to write the dataset in the format it expects. The first row is not a row of column names; it holds the sample count, the feature count, and then the label names:

n_samples, n_features, [label names]

For example, the one for the iris dataset you are showing has the correct format:

30,4,setosa,versicolor,virginica

i.e. 30 samples and 4 features, followed by the three class names.
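To make the two errors in the question concrete, here is a minimal sketch of how the first row gets interpreted. It is not the library's code: `read_counts` is a hypothetical helper that only assumes the first two cells are converted with `int()`, which is what both error messages suggest.

```python
import csv
import io


def read_counts(csv_text):
    """Return (n_samples, n_features) taken from the first row of csv_text.

    Hypothetical helper: it mimics the assumption that the first two
    header cells are parsed as integers and the rest are ignored.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    return int(header[0]), int(header[1])


# Iris-style header: 30 samples, 4 features.
iris_counts = read_counts("30,4,setosa,versicolor,virginica\n5.9,3.0,4.2,1.5,1\n")

# The "1,2,3,4" header is read as 1 sample with 2 features, while each
# data row carries 3 feature values, hence the broadcast error.
numeric_counts = read_counts("1,2,3,4\n1028.0,1012.0,1014.0,1\n")

# A header of column names cannot be parsed as an integer at all.
try:
    read_counts("feature1,feature2,feature3,label\n1028.0,1012.0,1014.0,1\n")
    named_header_error = None
except ValueError as exc:
    named_header_error = str(exc)

print(iris_counts, numeric_counts, named_header_error)
```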

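Your rows have 3 feature columns plus the label, and n_features does not count the label column. That is also why the 1,2,3,4 header failed: it was read as 1 sample with 2 features, so a row with 3 feature values could not be broadcast into shape (2). If the dataset you created has 50 samples, it should look like:

50,3,labelname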
1028.0,1012.0,1014.0,1
1029.0,1011.0,1017.0,-1
1027.0,1013.0,1015.0,1
...(and so on)
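If you would rather not count rows by hand, the header can be generated. The following is a rough sketch under a few assumptions: the input has one row of column names, the label is the last column, and `with_count_header` and its `label_name` parameter are names made up for this example. It works on text so it can be tried without touching any files.

```python
import csv
import io


def with_count_header(csv_text, label_name="label"):
    """Replace a row of column names with an n_samples,n_features header.

    Hypothetical helper: assumes the first row holds column names and the
    last column of every data row is the label.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    data_rows = [row for row in rows[1:] if row]  # skip name row and blank lines
    if not data_rows:
        raise ValueError("no data rows found")
    n_samples = len(data_rows)
    n_features = len(data_rows[0]) - 1  # the label column is not a feature
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([n_samples, n_features, label_name])
    writer.writerows(data_rows)
    return out.getvalue()


original = (
    "feature1,feature2,feature3,label\n"
    "1028.0,1012.0,1014.0,1\n"
    "1029.0,1011.0,1017.0,-1\n"
    "1027.0,1013.0,1015.0,1\n"
)
converted = with_count_header(original)
print(converted)
```

For the three sample rows shown in the question this produces a first line of 3,3,label, since each row has three feature values and one label.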
