Reputation: 8711
I use the code below in Spark Scala to get the partitioned columns.
scala> val part_cols= spark.sql(" describe extended work.quality_stat ").select("col_name").as[String].collect()
part_cols: Array[String] = Array(x_bar, p1, p5, p50, p90, p95, p99, x_id, y_id, # Partition Information, # col_name, x_id, y_id, "", # Detailed Table Information, Database, Table, Owner, Created Time, Last Access, Created By, Type, Provider, Table Properties, Location, Serde Library, InputFormat, OutputFormat, Storage Properties, Partition Provider)
scala> part_cols.takeWhile( x => x.length() != 0 ).reverse.takeWhile( x => x != "# col_name" )
res20: Array[String] = Array(x_id, y_id)
and I need to get the same output in Python. I'm struggling to replicate these array operations in Python to get [y_id, x_id].
Below is what I tried.
>>> part_cols=spark.sql(" describe extended work.quality_stat ").select("col_name").collect()
Is this possible in Python?
Upvotes: 2
Views: 57
Reputation: 14845
part_cols in the question is a list of Row objects, so the first step is to convert it into a list of strings.
part_cols = spark.sql(...).select("col_name").collect()
part_cols = [row['col_name'] for row in part_cols]
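Since collect() returns Row objects, the field can also be read with attribute access instead of a key lookup; an equivalent one-liner:
part_cols = [row.col_name for row in part_cols]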
Now the start and end of the part of the list that you are interested in can be calculated with
start_index = part_cols.index("# col_name") + 1
end_index = part_cols.index('', start_index)
Finally, a slice can be extracted from the list with these two values as start and end:
part_cols[start_index:end_index]
This slice will contain the values
['x_id', 'y_id']
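Putting the steps together, a minimal end-to-end sketch (using the work.quality_stat table from the question):
# collect the describe output and keep only the col_name strings
part_cols = spark.sql("describe extended work.quality_stat").select("col_name").collect()
part_cols = [row['col_name'] for row in part_cols]
# partition columns are listed between the '# col_name' marker and the next empty string
start_index = part_cols.index("# col_name") + 1
end_index = part_cols.index('', start_index)  # list.index(value, start) searches from position start
print(part_cols[start_index:end_index])  # ['x_id', 'y_id']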
If the output really should be reversed, the slice
part_cols[end_index-1:start_index-1:-1]
will contain the values
['y_id', 'x_id']
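Alternatively, if the negative-step slice reads awkwardly, the forward slice can simply be reversed; an equivalent sketch:
# same result as part_cols[end_index-1:start_index-1:-1]
list(reversed(part_cols[start_index:end_index]))  # ['y_id', 'x_id']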
Upvotes: 1