stack0114106

Reputation: 8711

python cut between partitioned column results

I use the code below in Spark (Scala) to get the partitioned columns.

scala> val part_cols= spark.sql(" describe extended work.quality_stat ").select("col_name").as[String].collect()
part_cols: Array[String] = Array(x_bar, p1, p5, p50, p90, p95, p99, x_id, y_id, # Partition Information, # col_name, x_id, y_id, "", # Detailed Table Information, Database, Table, Owner, Created Time, Last Access, Created By, Type, Provider, Table Properties, Location, Serde Library, InputFormat, OutputFormat, Storage Properties, Partition Provider)

scala> part_cols.takeWhile( x => x.length() != 0 ).reverse.takeWhile( x => x != "# col_name" )
res20: Array[String] = Array(x_id, y_id)

and I need to get similar output in Python. I'm struggling to replicate the same array operations in Python to end up with [y_id, x_id].

Below is what I tried.

>>> part_cols=spark.sql(" describe extended work.quality_stat ").select("col_name").collect()

Is this possible in Python?

Upvotes: 2

Views: 57

Answers (1)

werner

Reputation: 14845

part_cols in the question is a list of Row objects, so the first step is to convert it into a list of strings.

part_cols = spark.sql(...).select("col_name").collect()
part_cols = [row['col_name'] for row in part_cols]
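As a side note, Row objects also support attribute access, so an equivalent comprehension would be

part_cols = [row.col_name for row in part_cols]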

Now the start and end of the part of the list that you are interested in can be calculated with

start_index = part_cols.index("# col_name") + 1
end_index = part_cols.index('', start_index)
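For illustration, here is a shortened, hypothetical version of the collected col_name values from the question; passing start_index as the second argument to index makes the search for the terminating empty string begin after '# col_name':

cols = ['x_bar', 'p1', 'x_id', 'y_id',
        '# Partition Information', '# col_name',
        'x_id', 'y_id', '',
        '# Detailed Table Information', 'Database']

cols.index('# col_name') + 1   # 6: first entry after '# col_name'
cols.index('', 6)              # 8: next empty string at or after index 6
cols[6:8]                      # ['x_id', 'y_id']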

Finally, a slice can be extracted from the list with these two values as start and end:

part_cols[start_index:end_index]

This slice will contain the values

['x_id', 'y_id']

If the output really should be reversed, the slice

part_cols[end_index-1:start_index-1:-1]

will contain the values

['y_id', 'x_id']
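Putting it all together, a minimal end-to-end sketch (assuming the work.quality_stat table from the question and an existing SparkSession bound to spark):

part_cols = spark.sql("describe extended work.quality_stat") \
                 .select("col_name").collect()
part_cols = [row['col_name'] for row in part_cols]

start_index = part_cols.index('# col_name') + 1    # first entry after '# col_name'
end_index = part_cols.index('', start_index)       # empty entry ending the partition block

partition_cols = part_cols[start_index:end_index]  # ['x_id', 'y_id']
partition_cols_reversed = partition_cols[::-1]     # ['y_id', 'x_id']

Note that partition_cols[::-1] is equivalent to the end_index-1:start_index-1:-1 slice above and is arguably easier to read; the negative-step slice also breaks down when start_index happens to be 0.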

Upvotes: 1
