nareshbabral

Reputation: 831

Change column types iteratively in Spark data frames

I have a list of column names in Scala like

var cols = List("col1", "col2", "col3", "col4")

I also have a data frame with these columns, but they are all strings. I would like to cast the columns by iterating over the list (or over the data frame's columns), because the list is very large and I can't afford that many .withColumn calls.

Thanks in advance.

Upvotes: 3

Views: 2113

Answers (2)

Ravi

Reputation: 137

If you want to change all columns of a specific type to another type without specifying individual column names, I have posted an answer here: https://stackoverflow.com/a/60552157/3351492
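A minimal sketch of that idea, assuming df is the string-typed data frame from the question and that the source and target types here (StringType to double) are placeholders:

import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.StringType

// Cast every StringType column to the target type, leaving the rest as-is
val casted = df.select(df.schema.fields.map { f =>
  if (f.dataType == StringType) col(f.name).cast("double") else col(f.name)
}: _*)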

Upvotes: 0

zero323

Reputation: 330063

If you know the output types upfront, it is simply a matter of mapping over the columns with something like this:

// In spark-shell the implicits below are already in scope; a standalone
// application needs them for toDF and col.
import spark.implicits._
import org.apache.spark.sql.functions.col

val df = sc.parallelize(Seq(
  ("foo", "1.0", "2", "true"),
  ("bar", "-1.0", "5", "false")
)).toDF("v", "x", "y", "z")

// Target type for each column, by name
val types = Seq(
  ("v", "string"), ("x", "double"), ("y", "bigint"), ("z", "boolean")
)

// cast preserves the column name, so a single select covers all columns
df.select(types.map { case (c, t) => col(c).cast(t) }: _*)
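
If you prefer to iterate, as the question suggests, the same cast can be expressed as a fold over the (name, type) pairs; this is a sketch of an equivalent form, not a different method:

// Equivalent iterative form: withColumn replaces an existing column of
// the same name. A single select is usually preferable, since every
// withColumn call adds another projection to the query plan.
val casted = types.foldLeft(df) { case (acc, (c, t)) =>
  acc.withColumn(c, col(c).cast(t))
}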

If you don't know the types, the problem is much trickier. While it is possible to create a custom parser that handles schema inference, it probably makes more sense to fix the upstream pipeline instead. What is the point of using Avro if you ignore the data types?
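
For completeness, a naive sketch of such inference (an assumed approach, not from the answer above): try a few candidate types per column and keep the first cast that loses no non-null values. It triggers one Spark action per column per candidate, so it is expensive and no substitute for a proper schema upstream.

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Failed string casts become null in Spark, so a candidate type "fits"
// when casting preserves every non-null value. Falls back to string.
def inferTypes(df: DataFrame,
               candidates: Seq[String] = Seq("bigint", "double", "boolean")): DataFrame = {
  val casts = df.columns.map { c =>
    val nonNull = df.filter(col(c).isNotNull).count()
    val chosen = candidates
      .find(t => df.filter(col(c).cast(t).isNotNull).count() == nonNull)
      .getOrElse("string")
    col(c).cast(chosen)
  }
  df.select(casts: _*)
}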

Upvotes: 5
