Reputation: 23
For example, the local file data.txt has the following structure:
1.0 2.0 3.0 4.0
5.0 6.0 7.0 8.0
...
Reading the file as an RDD[String]:
lines = sc.textFile("data.txt")
Splitting it into an RDD[List[String]]:
data_temp = lines.map(lambda line: line.split(" "))
How can I convert this into an RDD[List[Float]]?
I know json.loads()
can parse a string, but how do I do it in this case?
Upvotes: 0
Views: 1480
Reputation: 28332
Simply convert all the strings to floats when splitting the lines:
data_temp = lines.map(lambda line: [float(i) for i in line.split(" ")])
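A quick check of this approach, assuming data.txt matches the sample in the question (variable names are just for illustration):
lines = sc.textFile("data.txt")
data = lines.map(lambda line: [float(x) for x in line.split(" ")])
print(data.take(2))  # [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]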
Or you can read the data as a DataFrame and let Spark infer the types:
df = (spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("some_input_file.csv"))
For more information about the different options when reading csv files, see the Spark DataFrameReader documentation.
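If you need an RDD[List[Float]] rather than a DataFrame, a minimal sketch (assuming data.txt is space-separated with no header, as in the question) would be:
df = (spark.read
      .option("sep", " ")             # the sample file is space-delimited
      .option("inferSchema", "true")  # columns are inferred as doubles
      .csv("data.txt"))
rdd_of_floats = df.rdd.map(list)      # convert each Row back to a plain list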
Upvotes: 1