John Humanyun

Reputation: 945

Get only the column names from an HDFS Parquet file in a shell script

Is there a way to get only the column names from a Parquet file in a Unix shell script, similar to the following?

scala> df.columns
res3: Array[String] = Array(id,name, department,address,country)

In the shell script, I want this to be the value of a variable: COLUMNS="id,name, department,address,country"

I can then pass this information to the sqoop export command. The Parquet files contain varying sets of columns, but they are all exported to the same table, so I can't use a static column list.

Upvotes: 2

Views: 277

Answers (1)

Shah

Reputation: 130

The schema command of parquet-tools prints out the schema for a given Parquet file. Usage in Hadoop:

hadoop jar parquet-tools-1.9.0.jar schema hdfs://localhost:8020/test1.parquet

You need parquet-tools-1.9.0.jar available for this; the schema output will show you the columns.
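
To turn that schema output into the comma-separated value the question asks for, something like the following sketch might work. It assumes a flat schema (no nested groups) in which each field is printed on its own line as "optional|required|repeated <type> <name>", optionally followed by an annotation such as (UTF8). The function name schema_to_columns and the variable name PARQUET_COLUMNS are hypothetical; the latter is used instead of COLUMNS because bash also uses COLUMNS for the terminal width.

```shell
#!/bin/sh
# Hypothetical helper: reads parquet-tools "schema" output on stdin and
# prints the field names as a single comma-separated line.
# Assumes a flat schema; fields inside nested groups would be listed too.
schema_to_columns() {
  awk '
    # Field lines start with a repetition keyword; the name is the third token.
    $1 == "optional" || $1 == "required" || $1 == "repeated" {
      name = $3
      sub(/;$/, "", name)   # drop the trailing ";" when no annotation follows
      cols = (cols == "" ? name : cols "," name)
    }
    END { print cols }
  '
}

# Usage, with the jar and file path from the answer above:
#   PARQUET_COLUMNS=$(hadoop jar parquet-tools-1.9.0.jar schema \
#       hdfs://localhost:8020/test1.parquet | schema_to_columns)
#   echo "$PARQUET_COLUMNS"
```

The resulting value could then be passed to sqoop export through its --columns option.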

Upvotes: 2
