Junjie Hou

Reputation: 23

How to convert RDD[List[String]] to RDD[List[Float]]

For example, the structure of local file data.txt is:

1.0 2.0 3.0 4.0
5.0 6.0 7.0 8.0
       ...

Reading the file as an RDD[String]:

lines = sc.textFile("data.txt")

Splitting each line into an RDD[List[String]]:

data_temp = lines.map(lambda line: line.split(" "))

How can I convert this into an RDD[List[Float]]?

I know json.loads() can parse a string, but how would I do it in this case?

Upvotes: 0

Views: 1480

Answers (1)

Shaido

Reputation: 28332

Simply convert all the strings to floats when splitting the lines:

data_temp = lines.map(lambda line: [float(i) for i in line.split(" ")])
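
If the file might contain repeated spaces, trailing spaces, or blank lines, splitting on a single " " yields empty strings that float() rejects. Below is a minimal sketch of a more tolerant helper; the name parse_line is hypothetical and not part of Spark:

```python
def parse_line(line):
    """Split a whitespace-separated line and convert each token to a float."""
    # str.split() with no argument splits on any run of whitespace and drops
    # leading/trailing whitespace, so no empty tokens reach float().
    return [float(token) for token in line.split()]


# Intended use with the RDD from the question (assumes a SparkContext `sc`):
# lines = sc.textFile("data.txt")
# data = lines.filter(lambda line: line.strip()).map(parse_line)
```

The filter step skips blank lines so that they do not end up as empty lists in the result.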

Or you can read the data as a dataframe and infer the types:

# No explicit schema is passed here; inferSchema lets Spark detect the column types
df = (spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("some_input_file.csv"))

For more information about the different options when reading csv files, see the Spark documentation for DataFrameReader.csv.
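
For the space-separated data.txt in the question, which has no header row, the reader call would probably look more like the sketch below. The function name read_space_delimited is hypothetical, and it assumes an existing SparkSession is passed in as spark:

```python
def read_space_delimited(spark, path):
    """Read a headerless, space-separated text file into a DataFrame.

    `spark` is assumed to be an existing SparkSession.
    """
    # sep=" " treats a single space as the column delimiter;
    # inferSchema=True asks Spark to detect numeric column types.
    return spark.read.csv(path, sep=" ", header=False, inferSchema=True)


# Intended use (assumes a running SparkSession named `spark`):
# df = read_space_delimited(spark, "data.txt")
# data = df.rdd.map(list)   # back to an RDD of Python lists, one per row
```

Without a header, Spark names the columns _c0, _c1, and so on.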

Upvotes: 1
