Reputation: 2074
I am trying to make a function that will pull the name of the column out of a dataframe schema. So what I have is the initial function defined:
val df = sqlContext.parquetFile(inputVal.toString)
val dfSchema = df.schema
def schemaMatchP(schema: StructType) : Map[String,List[Int]] =
schema
// get the 1st word (column type) in upper cases
.map(columnDescr => columnDescr
If I do something like this:
.map(columnDescr => columnDescr.toString.split(',')(0).toUpperCase)
I will get STRUCTFIELD(HH_CUST_GRP_MBRP_ID,BINARYTYPE,TRUE)
How do you handle a StructField so I can grab the 1st element out of each column for the schema. So my Column names: HH_CUST_GRP_MBRP_ID, etc...
Upvotes: 1
Views: 3133
Reputation: 67115
When in doubt look what the source does itself. DataFrame.toString has the answer :). StructField
is a case class with a name
property. So, just do:
schema.map(f => s"${f.name}")
Upvotes: 3