Reputation: 475
I have a Spark DataFrame with a column named "Ingredients". It has some values like:
['banana', 'apple']
['meat']
[]
[]
I want to look at only the rows where the array is []. I tried this:
display(df.filter(df.ingredients == []))
But I got this error:
java.lang.RuntimeException: Unsupported literal type class java.util.ArrayList []
Upvotes: 2
Views: 4454
Reputation: 666
Try defining the column like this:
import pyspark.sql.functions as F
import pyspark.sql.types as T
df = df.withColumn("ids", F.lit(None).astype(T.ArrayType(T.StringType())))
The ids will be stored as None, and the column's dtype is array<string>. You can then query it with Spark SQL, like:
select * from tb1 where ids is not null
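For completeness, a minimal runnable sketch of this approach (the DataFrame contents and the tb1 view registration here are illustrative assumptions, not part of the answer):
import pyspark.sql.functions as F
import pyspark.sql.types as T

# Hypothetical data; any DataFrame works here
df = spark.createDataFrame([("banana",), ("meat",)], ["item"])

# Add a nullable array<string> column; every row gets None
df = df.withColumn("ids", F.lit(None).astype(T.ArrayType(T.StringType())))

# Register a temp view so the column can be queried with Spark SQL
df.createOrReplaceTempView("tb1")
spark.sql("select * from tb1 where ids is not null").show()
# Prints an empty table, since every ids value is None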
Upvotes: 0
Reputation: 32670
Adding further to @mck's answer: sometimes you have an array which contains only one empty string, and it is also displayed like an empty array. Here's an example:
import pyspark.sql.functions as F

df = spark.createDataFrame([([''],)], ['value'])
df.show()
# +-----+
# |value|
# +-----+
# | []|
# +-----+
df.filter(F.col("value") == F.array(F.lit(""))).show()
# +-----+
# |value|
# +-----+
# | []|
# +-----+
df.filter(F.col("value") != F.array(F.lit(""))).show()
# +-----+
# |value|
# +-----+
# +-----+
In this case, F.col("value") == F.array() won't work.
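If you want to treat both [] and [''] as empty, one option (a sketch using array_remove, available since Spark 2.4) is to strip empty strings before checking the size:
import pyspark.sql.functions as F

df = spark.createDataFrame([([''],), (['a'],), ([],)], ['value'])

# Drop empty strings first, then test the remaining length;
# both [] and [''] end up with size 0
df.filter(F.size(F.array_remove(F.col("value"), "")) == 0).show()
# +-----+
# |value|
# +-----+
# |   []|
# |   []|
# +-----+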
Upvotes: 2
Reputation: 42352
You can specify an empty array to compare:
import pyspark.sql.functions as F
display(df.filter(df.ingredients == F.array()))
Or you can check the array length is zero:
display(df.filter(F.size(df.ingredients) == 0))
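As a quick sanity check, here's a minimal reproduction of the question's data (using show() instead of the Databricks-specific display()):
import pyspark.sql.functions as F

df = spark.createDataFrame(
    [(['banana', 'apple'],), (['meat'],), ([],), ([],)],
    'ingredients array<string>')

# Both filters keep only the two rows where ingredients is [];
# each show() prints:
# +-----------+
# |ingredients|
# +-----------+
# |         []|
# |         []|
# +-----------+
df.filter(df.ingredients == F.array()).show()
df.filter(F.size(df.ingredients) == 0).show()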
Upvotes: 3