Reputation: 91
When I create a data frame in Spark, the columns are of the wrong type. I have hundreds of columns and don't know the best way to change the data type of each one. Fortunately, most of them are supposed to be numeric.
Here is what I do:
val df = sc.textFile("user/name/testC.tsv")
// Removing the header line.
val dfLines = df.filter(x => !x.contains("test_name"))
// Picking the columns I want.
val rowRDD = dfLines.map(x => x.split("\t")).map(x => (x(2), x(4), x(11), x(12)))
// Creating a data frame.
val df = rowRDD.toDF("cycle", "dut", "metric1", "metric2")
The columns are supposed to be numeric, but df contains only strings:
(String, String, String, String, String, String, String, String, String, String, String, String, String) =
(100,0,255,34,33,25,29,32,26,44,31,0,UP)
Upvotes: 0
Views: 189
Reputation: 27456
You can perform the conversions at the same time as you pick the columns. For example:
val rowRDD = dfLines
  .map(x => x.split("\t"))
  .map(x => (x(2).toInt, x(4), x(11).toDouble, x(12).toDouble))
(Assuming cycle is an integer, dut is a string, and metric1 and metric2 are real numbers.)
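The per-field conversion can be sketched outside Spark on a single tab-separated line; the sample values and field positions below are made up for illustration:

```scala
object ConvertFields {
  def main(args: Array[String]): Unit = {
    // A hypothetical tab-separated row with 13 fields (indices 0-12).
    val line = "a\tb\t100\tc\tDUT-1\tf\tg\th\ti\tj\tk\t25.5\t29.1"
    val x = line.split("\t")
    // Convert each field to its intended type while selecting it,
    // exactly as in the second map above.
    val row = (x(2).toInt, x(4), x(11).toDouble, x(12).toDouble)
    println(row)  // (100,DUT-1,25.5,29.1)
  }
}
```

Note that toInt and toDouble throw a NumberFormatException on malformed input, so any non-numeric rows (such as a header line) must be filtered out before the conversion.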
Upvotes: 2