Robin

Reputation: 695

How to handle a struct column with mixed types in a DataFrame

I have a table like the one below:

+----------+-----+
|       tmp|index|
+----------+-----+
| [user1,0]|    0|
| [user1,3]|    1|
|[user1,15]|    2|
+----------+-----+

I want to split the tmp column into two columns. Inside tmp, the first element is a String and the second is an Int.

I wrote the UDF below:

val getUser_id = udf((s: (String, Int)) => {
  s._1
})
newSession.withColumn("user_id", getUser_id($"tmp"))

But it fails with:

Failed to execute user defined function(anonfun$4: (struct<_1:string,_2:int>) => string)

Could you help me, please?

Upvotes: 2

Views: 52

Answers (1)

Alper t. Turker

Reputation: 35229

The argument should be a Row, not a Tuple: Spark passes struct columns to Scala UDFs as Row objects, not as Scala tuples.

import org.apache.spark.sql.Row

val getUser_id = udf((s: Row) => {
  s.getString(0)  // first field of the struct
})

or

val getUser_id = udf((s: Row) => {
  val Row(id: String, _) = s  // pattern match on the struct's fields
  id
})

but here you don't need a UDF at all; just select the nested field directly:

newSession.withColumn("user_id", $"tmp._1")

Upvotes: 1
