Reputation: 11
I have a PySpark DataFrame with two columns, interval and dataMap:
>> df.show()
interval dataMap
0        {0 -> 10, 1 -> 6}
1        {}
2        {2 -> 8, 3 -> 7}
3        {3 -> 9}
Now I want to add one more column that shows whether the value in interval is a key in dataMap. The expected result looks like this:
>> df.show()
interval dataMap           exist
0        {0 -> 10, 1 -> 6} true
1        {}                false
2        {2 -> 8, 3 -> 7}  true
3        {3 -> 9}          true
How can I achieve that? Is there a function that works like:
df = (
    df.withColumn(
        "exist",
        when(col("interval").isin(col("dataMap")), True).otherwise(False)
    )
)
Thanks!
Upvotes: 1
Views: 555
Reputation: 2436
from pyspark.sql import functions as F, types as T
from pyspark.sql.functions import udf

df = spark.createDataFrame(
    [
        ('0', '{0 -> 10, 1 -> 6}'),
        ('1', '{}'),
        ('2', '{2 -> 8, 3 -> 7}'),
        ('3', '{3 -> 9}')
    ], ['interval', 'dataMap']
)

# UDF: checks whether the interval string appears anywhere in the
# map's string representation (a substring test, not a key lookup)
def valida(x, dic):
    return x in dic

valida_udf = udf(valida, T.BooleanType())

df\
    .withColumn('dic', F.regexp_replace('dataMap', '->', ':'))\
    .withColumn('exist', valida_udf('interval', 'dic'))\
    .drop('dic')\
    .show(truncate=False)
+--------+-----------------+-----+
|interval|dataMap |exist|
+--------+-----------------+-----+
|0 |{0 -> 10, 1 -> 6}|true |
|1 |{} |false|
|2 |{2 -> 8, 3 -> 7} |true |
|3 |{3 -> 9} |true |
+--------+-----------------+-----+
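One caveat: valida performs a substring check on the map's string form, which works for the single-digit keys above but can give false positives once keys have more digits. A minimal pure-Python sketch of the pitfall (the function name is illustrative):

```python
# Substring membership on the map's string form, as the UDF above does.
def exist_substring(interval, dic_str):
    return interval in dic_str

# '1' is found inside '10' even though 1 is not a key: a false positive.
print(exist_substring('1', '{10 : 5}'))  # True
print(exist_substring('4', '{10 : 5}'))  # False
```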
Upvotes: 0
Reputation: 3686
You can use array_contains and map_keys to do so.
from pyspark.sql.functions import map_keys, col, array_contains
df = spark.sql(
'''
SELECT 1 as interval, map(1, 'a', 2, 'b') as dataMap
UNION ALL
SELECT 3 as interval, map(1, 'a', 2, 'b') as dataMap
''')
df = df.withColumn("exist", array_contains(map_keys(col('dataMap')), col("interval")))
df.show()
+--------+----------------+-----+
|interval| dataMap|exist|
+--------+----------------+-----+
| 1|{1 -> a, 2 -> b}| true|
| 3|{1 -> a, 2 -> b}|false|
+--------+----------------+-----+
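The semantics of array_contains(map_keys(dataMap), interval) amount to plain key membership; a pure-Python analogue (not Spark code, names illustrative) makes that explicit:

```python
# Key-membership check, mirroring array_contains(map_keys(m), k):
# `in` on a dict tests its keys, never its values.
def exist(interval, data_map):
    return interval in data_map

rows = [(1, {1: 'a', 2: 'b'}), (3, {1: 'a', 2: 'b'})]
print([(i, exist(i, m)) for i, m in rows])  # [(1, True), (3, False)]
```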
Upvotes: 3