madsthaks

Reputation: 2181

Struggling to understand why float() is giving this error

Here is a piece of the dataset:

18,8,307,130,3504,12,70,1,chevrolet
15,8,350,165,3693,11.5,70,1,buick
18,8,318,150,3436,11,70,1,plymouth
16,8,304,150,3433,12,70,1,amc
17,8,302,140,3449,10.5,70,1,ford
15,8,429,198,4341,10,70,1,ford
14,8,454,220,4354,9,70,1,chevrolet
14,8,440,215,4312,8.5,70,1,plymouth

Here is the code:

data = sc.textFile("hw6/auto_mpg_original.csv")
records = data.map(lambda x: x.split(","))

hp = float(records.map(lambda x: x[3]))
disp = np.array(float(records.map(lambda x: x[2])))

final_data_1 = LabeledPoint(hp, disp)

Here is the error:

Traceback (most recent call last):
  File "/home/cloudera/Desktop/hw6.py", line 41, in <module>
    hp = float(records.map(lambda x: x[3]))
TypeError: float() argument must be a string or a number

This seems basic, but I'm really having trouble tracking down a solution.

Upvotes: 0

Views: 83

Answers (2)

AChampion

Reputation: 30288

Check the type of what records.map() returns; it is probably an RDD, which float() cannot convert. You can apply float() inside the map() instead, e.g.:

hp = records.map(lambda x: float(x[3]))

But you will need to .collect() the results before using them, e.g.:

hp = records.map(lambda x: float(x[3])).collect()
disp = np.array(records.map(lambda x: float(x[2])).collect())
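The same failure mode can be reproduced without Spark, because Python's built-in map() is also lazy. Below is a minimal sketch, assuming plain Python 3; the rows list is a hypothetical stand-in for the RDD, and list() plays the role of .collect():

```python
# Plain-Python analogy (no Spark needed): the built-in map() is lazy too,
# so it stands in here for records.map(...), which returns an RDD.
rows = [
    "18,8,307,130,3504,12,70,1,chevrolet",
    "15,8,350,165,3693,11.5,70,1,buick",
]
records = [line.split(",") for line in rows]

# Wrong: float() receives the lazy map object, not the values inside it.
try:
    float(map(lambda x: x[3], records))
    error = None
except TypeError as exc:
    error = exc

# Right: convert each element inside the mapped function, then materialize
# the results (list() here is the counterpart of .collect() on an RDD).
hp = list(map(lambda x: float(x[3]), records))
disp = list(map(lambda x: float(x[2]), records))
```

In both cases the conversion has to happen per element inside the mapped function, since the container itself is not a number.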

Upvotes: 1

Sarvex

Reputation: 785

There is a problem with the input from the CSV: the column is either empty or contains a non-numeric value.
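If empty or non-numeric fields are a concern, a defensive parser can guard against them. Note that in Python such fields raise ValueError from float(), not the TypeError shown in the traceback. This is a minimal sketch; to_float_or_none is a hypothetical helper name, and the sample fields (including the "NA" marker) are made up for illustration:

```python
def to_float_or_none(field):
    """Return field as a float, or None if it is empty or non-numeric."""
    try:
        return float(field)
    except (TypeError, ValueError):
        return None

# Hypothetical sample fields: two valid numbers, one empty, one "NA" marker.
fields = ["130", "11.5", "", "NA"]
parsed = [to_float_or_none(f) for f in fields]

# Drop the values that could not be parsed.
clean = [v for v in parsed if v is not None]
```

A helper like this could be passed to records.map() in place of a bare float() call, followed by a filter that drops the None values.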

Upvotes: 0
