Makar Nikitin

Reputation: 329

How to map values by column names at pyspark

What I want is to map column names into keys. For example:

#+-------+----------+
#|key1   |key2      |
#+-------+----------+
#|value1 |value2    |
#|value3 |value4    |
#+-------+----------+

should be transformed into:

#+-------+----------+
#|   keys|values    |
#+-------+----------+
#|key1   |value1    |
#|key1   |value3    |
#|key2   |value2    |
#|key2   |value4    |
#+-------+----------+

In HiveQL I can write something similar to:

select distinct key, value
    from xxx
    lateral view explode(map(
            'key1', key1,
            'key2', key2)) tab as key, value

But how do I write this in PySpark? I could register a temporary table, but I don't think that's the best solution.
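For illustration, the per-row effect of `explode(map('key1', key1, 'key2', key2))` can be sketched in plain Python (no Spark needed): each input row fans out into one (key, value) pair per column.

```python
# Plain-Python sketch of the desired unpivot: each row becomes
# one (column_name, column_value) pair per key column.
rows = [{"key1": "value1", "key2": "value2"},
        {"key1": "value3", "key2": "value4"}]

exploded = [(k, row[k]) for row in rows for k in ("key1", "key2")]
print(exploded)
# [('key1', 'value1'), ('key2', 'value2'), ('key1', 'value3'), ('key2', 'value4')]
```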

Upvotes: 1

Views: 2715

Answers (2)

blackbishop

Reputation: 32680

Use the `create_map` function to create a map column, then explode it.

`create_map` needs a list of column expressions grouped as key-value pairs. Such a list can be built with a comprehension over the DataFrame columns:

from itertools import chain
from pyspark.sql.functions import col, lit, create_map, explode


data = [("value1", "value2"), ("value3", "value4")]
df = spark.createDataFrame(data, ["key1", "key2"])

key_values = create_map(*(chain(*[(lit(name), col(name)) for name in df.columns])))

df.select(explode(key_values)).show()

+----+------+
| key| value|
+----+------+
|key1|value1|
|key2|value2|
|key1|value3|
|key2|value4|
+----+------+

Upvotes: 6

Steven

Reputation: 15273

Something like this?

select 'key1' as keys,
       key1 as values
from xxx
union all 
select 'key2' as keys,
       key2 as values
from xxx

Put that in `spark.sql()`.
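If there are many key columns, the same query can be generated programmatically. A minimal sketch (the helper name `build_unpivot_query` is mine, and it assumes the DataFrame has been registered as a temp view named `xxx`, e.g. via `df.createOrReplaceTempView("xxx")`):

```python
# Build the union-all unpivot query for an arbitrary list of key columns.
def build_unpivot_query(table, columns):
    selects = [
        "select '{c}' as keys, {c} as values from {t}".format(c=c, t=table)
        for c in columns
    ]
    return "\nunion all\n".join(selects)

query = build_unpivot_query("xxx", ["key1", "key2"])
print(query)
# result = spark.sql(query)  # run with an active SparkSession
```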

Upvotes: 1
