user2145299

Reputation: 91

Data types in Spark DataFrame are not right

When I create a data frame in Spark, the columns end up with the wrong type. I have hundreds of columns and don't know the best way to change the data type of each one. Fortunately, most of them are supposed to be numeric.

Here is what I do:

val raw = sc.textFile("user/name/testC.tsv")
// Removing the header line.
val dfLines = raw.filter(x => !x.contains("test_name"))
// Picking the columns I want.
val rowRDD = dfLines.map(x => x.split("\t")).map(x => (x(2), x(4), x(11), x(12)))
// Creating a data frame.
val df = rowRDD.toDF("cycle", "dut", "metric1", "metric2")

The columns are supposed to be numeric, but df has only strings:

(String, String, String, String, String, String, String, String, String, String, String, String, String) =
  (100,0,255,34,33,25,29,32,26,44,31,0,UP) 

Upvotes: 0

Views: 189

Answers (1)

Daniel Darabos

Reputation: 27456

When you pick the columns, you can perform conversions. For example:

val rowRDD = dfLines
  .map(x => x.split("\t"))
  .map(x => (x(2).toInt, x(4), x(11).toDouble, x(12).toDouble))

(Assuming cycle is an integer, dut is a string, and metric1 and metric2 are real numbers.)
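For instance, here is a minimal end-to-end sketch under those assumptions, reusing the file path and column indices from the question; the schema check at the end is only illustrative:

// A Spark 1.x shell is assumed, where `sc` and `sqlContext` are predefined
// and `sqlContext.implicits._` (needed for toDF on an RDD of tuples) is imported.
val raw = sc.textFile("user/name/testC.tsv")
val dfLines = raw.filter(x => !x.contains("test_name"))

// Convert to the desired types while picking the columns.
val rowRDD = dfLines
  .map(x => x.split("\t"))
  .map(x => (x(2).toInt, x(4), x(11).toDouble, x(12).toDouble))

val df = rowRDD.toDF("cycle", "dut", "metric1", "metric2")
df.printSchema()
// Expected output, roughly:
// root
//  |-- cycle: integer (nullable = false)
//  |-- dut: string (nullable = true)
//  |-- metric1: double (nullable = false)
//  |-- metric2: double (nullable = false)

With hundreds of mostly numeric columns, the same pattern applies: convert each field with .toInt or .toDouble as you build the tuple (or case class) that you pass to toDF.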

Upvotes: 2
