Nipun

Reputation: 4319

Selecting map key as column in dataframe in spark

I have a dataframe from Cassandra SQL, and one of its columns is a map:

scala> df.printSchema
root
 |-- client: map (nullable = true)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)

I need to select some columns from df, as well as a particular key from the map as a column, instead of the complete map.

Let's say I have a map of key1 -> value1, key2 -> value2, ....

I need to select only key1 from the map to be a column in my new dataframe. How can I do that?

Also, I am using cassandrasqlcontext.sql to get the dataframe.

Upvotes: 3

Views: 8130

Answers (3)

s510

Reputation: 2832

Try this in Spark SQL:

select map_filter(your_map_name, (k, v) -> k == 'desired_key') from spark_table

This will give you the entire key:value pair as output. If you want only the value, try the following instead:

select map_values(map_filter(your_map_name, (k, v) -> k == 'desired_key')) from spark_table
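
Note that map_filter only exists in Spark 3.0+, so it is not available in the older CassandraSQLContext setup from the question. A minimal runnable Scala sketch of the above, assuming Spark 3.0+ and the question's column name client (the table name spark_table is from this answer):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("map-filter-sketch").getOrCreate()
import spark.implicits._

// Toy data matching the question's schema: one map<string,string> column named client
val df = Seq(Map("key1" -> "value1", "key2" -> "value2")).toDF("client")
df.createOrReplaceTempView("spark_table")

// The whole entry for the desired key, as a one-element map
spark.sql("select map_filter(client, (k, v) -> k == 'key1') as entry from spark_table").show(false)

// Only the value, as a single-element array
spark.sql("select map_values(map_filter(client, (k, v) -> k == 'key1')) as value from spark_table").show(false)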

Upvotes: 0

user12278954

Reputation: 11

Assuming Spark 2 and PySpark, this worked for me:

SparkSQL:

df.registerTempTable("table_name")  # deprecated in Spark 2; createOrReplaceTempView is the replacement
spark.sql("select client.key1 from table_name")
spark.sql("select client.key1, client.key2 from table_name")

using dataframes (df):

df.select("client.key1").show()
df.select("client.key1", "client.key2").show()

Upvotes: 1

Arnon Rotem-Gal-Oz

Reputation: 25929

Using SparkSQL (assuming you registered the dataframe as "df"):

context.registerDataFrameAsTable(df, "df")
val newDf = context.sql("select client.key, client.value from df where client.key = 'some value'")
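
If the goal is to get the map's entries as real key/value columns before filtering, one alternative worth noting (a sketch, not from this answer) is to explode the map, since explode on a map column yields key and value columns:

import org.apache.spark.sql.functions.{col, explode}

// Each map entry becomes its own row with `key` and `value` columns
val kv = df.select(explode(col("client")))
kv.where(col("key") === "key1").show()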

Upvotes: 2
