theMadKing

Reputation: 2074

Pulling Name Out of Schema in Spark DataFrame

I am trying to write a function that pulls the column names out of a DataFrame schema. Here is what I have so far:

  val df = sqlContext.parquetFile(inputVal.toString)
  val dfSchema = df.schema

  def schemaMatchP(schema: StructType): Map[String, List[Int]] =
    schema
      // get the 1st word (column type) in upper case
      .map(columnDescr => columnDescr

If I do something like this:

.map(columnDescr => columnDescr.toString.split(',')(0).toUpperCase)

I get STRUCTFIELD(HH_CUST_GRP_MBRP_ID,BINARYTYPE,TRUE) rather than just the column name.

How do you handle a StructField so that I can grab just the first element (the column name, e.g. HH_CUST_GRP_MBRP_ID) out of each field in the schema?

Upvotes: 1

Views: 3133

Answers (1)

Justin Pihony

Reputation: 67115

When in doubt, look at what the source does itself; DataFrame.toString has the answer :). StructField is a case class with a name property, so just do:

schema.map(f => s"${f.name}")
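Putting this together with the question's schemaMatchP skeleton, here is a hedged sketch. It assumes (since the question never says) that the intended result is a map from each upper-cased column type to the indices of the columns with that type; the sample schema below is invented for illustration:

```scala
import org.apache.spark.sql.types.{BinaryType, StringType, StructField, StructType}

object SchemaNames {
  // Hypothetical completion of schemaMatchP: group column indices by the
  // upper-cased type name of each field. This is one guess at the intended
  // Map[String, List[Int]] shape, not the asker's confirmed requirement.
  def schemaMatchP(schema: StructType): Map[String, List[Int]] =
    schema.fields.toList.zipWithIndex
      .groupBy { case (field, _) => field.dataType.simpleString.toUpperCase }
      .map { case (typeName, fields) => typeName -> fields.map(_._2) }

  def main(args: Array[String]): Unit = {
    // An invented schema standing in for df.schema
    val schema = StructType(Seq(
      StructField("HH_CUST_GRP_MBRP_ID", BinaryType),
      StructField("CUST_NM", StringType)
    ))

    // The answer's approach: StructField is a case class, so .name works directly
    val names = schema.map(f => f.name)
    println(names)          // column names, e.g. HH_CUST_GRP_MBRP_ID, CUST_NM

    println(schemaMatchP(schema))
  }
}
```

Note that StructType is itself a Seq[StructField], so you can map over it directly; no toString parsing is needed.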

Upvotes: 3
