MTT
MTT

Reputation: 5263

collapse a wrappedarray elements into different columns

I have a spark dataframe and here is the schema:

|-- eid: long (nullable = true)
|-- age: long (nullable = true)
|-- sex: long (nullable = true)
|-- father: array (nullable = true)
|    |-- element: array (containsNull = true)
|    |    |-- element: long (containsNull = true)

and a sample of rows:.

df.select(df['father']).show()
+--------------------+
|              father|
+--------------------+
|[WrappedArray(-17...|
|[WrappedArray(-11...|
|[WrappedArray(13,...|
+--------------------+

and the type is

DataFrame[father: array<array<bigint>>]

What I want is collapsing the father column says for example if 13 is a member of this array, create a new column and return 1, otherwise return 0 Here is the first thing I tried:

def modify_values(r):
    if 13 in r:
        return 1
    else:
        return 0

my_udf = udf(modify_values, IntegerType())
df.withColumn("new_col",my_udf(df['father'].getItem(0))).show()

and it return this error:

Py4JJavaError: An error occurred while calling o817.showString.
TypeError: argument of type 'NoneType' is not iterable

and then I tried this one:

df.withColumn("new_col", F.when(1 in df["father"].getItem(0), 1).otherwise(0))

and the complain is:

ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.

Upvotes: 0

Views: 492

Answers (1)

Ramesh Maharjan
Ramesh Maharjan

Reputation: 41957

Looking at the schema of your dataframe, simple combination of when and array_contains functions should solve your issue

df.withColumn("new_col", when(array_contains($"father"(0), 13), 1).otherwise(0)).show(false)

If you want to still try with udf function which would be slower approach than above way, you should change your udf function as below

def my_udf = udf((array: mutable.WrappedArray[Int]) => array match{
  case x if(x.contains(13)) => 1
  case _ => 0
})

df.withColumn("new_col", my_udf($"father"(0))).show(false)

I hope this answer solves all of your issues

Upvotes: 1

Related Questions