Reputation: 945
Is there a way to get only the column names from a Parquet file in a Unix shell script, similar to the following?
scala> df.columns
res3: Array[String] = Array(id,name, department,address,country)
In the shell script, I want this to be the value of a variable: COLUMNS="id,name, department,address,country"
I can then pass this value to the sqoop export command. The Parquet files contain varying sets of columns, but they are all exported to the same table, so I can't use a static column list.
Upvotes: 2
Views: 277
Reputation: 130
The schema command of parquet-tools prints out the schema for a given Parquet file. Usage in Hadoop:
hadoop jar parquet-tools-1.9.0.jar schema hdfs://localhost:8020/test1.parquet
You need the parquet-tools-1.9.0.jar file for this; the schema output will show you the columns.
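To turn that schema output into the comma-separated value the question asks for, something like the sketch below might work. It assumes the schema is flat and that each field is printed on its own line in the form `optional binary name (UTF8);`, so the column name is the third whitespace-separated token. Fields inside nested groups would be listed alongside the top-level ones, so check the output on your own files. The function name `schema_to_columns` is made up for illustration.

```shell
# Hypothetical helper: reads parquet-tools schema text on stdin and prints
# the column names as one comma-separated line.
# Assumption: flat schema, one field per line, e.g. "  optional int32 id;"
schema_to_columns() {
  awk '
    /;[ \t]*$/ {                 # field lines end with a semicolon
      name = $3                  # tokens: repetition, type, name
      sub(/;$/, "", name)        # "id;" -> "id" when there is no annotation
      cols = (cols == "" ? name : cols "," name)
    }
    END { print cols }
  '
}
```

It could then be used as `PARQUET_COLUMNS=$(hadoop jar parquet-tools-1.9.0.jar schema hdfs://localhost:8020/test1.parquet | schema_to_columns)`. The variable name here is an assumption as well; it avoids `COLUMNS` because bash can set that variable itself to the terminal width.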
Upvotes: 2