user3685285

Use the Spark SQL when function to return values from other columns

The Spark SQL documentation describes a when function that returns a Column. Its example is reproduced below:

import org.apache.spark.sql.functions.when

people.select(when(people("gender") === "male", 0)
  .when(people("gender") === "female", 1)
  .otherwise(2))
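To see what the documentation example actually produces, here is a minimal runnable sketch. It assumes a local SparkSession and builds the sample rows from this question with toDF; the object name GenderCodes and the alias gender_code are hypothetical names chosen for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.when

object GenderCodes {
  // Builds the sample table from the question (same column names).
  def people(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      (1, "Joe", "male", 10, 2),
      (2, "Sue", "female", 3, 12),
      (3, "John", "male", 9, 3),
      (4, "Kim", "female", 1, 10)
    ).toDF("id", "name", "gender", "testosterone", "estrogen")
  }

  // Every branch returns a literal, so the new column only ever holds 0, 1 or 2.
  def codes(people: DataFrame): DataFrame =
    people.select(
      people("name"),
      when(people("gender") === "male", 0)
        .when(people("gender") === "female", 1)
        .otherwise(2)
        .as("gender_code"))

  def main(args: Array[String]): Unit = {
    // Assumption: a local SparkSession is fine for experimenting.
    val spark = SparkSession.builder().master("local[*]").appName("when-literals").getOrCreate()
    codes(people(spark)).show()
    spark.stop()
  }
}
```

With this data, male rows map to 0 and female rows to 1; the otherwise(2) branch would only apply to a gender value matching neither condition.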

In this example, the value produced by the when expression is always one of the literals 0, 1, or 2. But what if I want the value to come from another column of the people DataFrame instead? For example, given the following data:

id | name    | gender | testosterone | estrogen
-----------------------------------------------
 1 | Joe     |   male |           10 |        2
 2 | Sue     | female |            3 |       12
 3 | John    |   male |            9 |        3
 4 | Kim     | female |            1 |       10

I want something like this:

SELECT
    name,
    CASE WHEN gender = 'male' THEN testosterone
         WHEN gender = 'female' THEN estrogen
    END AS hormone_level
FROM
    people

The result would be:

name    | hormone_level
-----------------------
Joe     |            10
Sue     |            12
John    |             9
Kim     |            10

Answers (1)

user10240257

Pass the columns themselves as the value of each branch:

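import org.apache.spark.sql.functions.when

people.select(people("name"),
  when(people("gender") === "female", people("estrogen"))
    .when(people("gender") === "male", people("testosterone"))
    .as("hormone_level")) // add .otherwise(...) before .as if a base case is required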
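A fuller sketch of the same idea, assuming a local SparkSession and the sample data from the question. The names HormoneLevel, viaWhen and viaSql are hypothetical. It also runs the question's CASE WHEN query through selectExpr, to show that the DataFrame API and the SQL form give the same hormone_level values.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.when

object HormoneLevel {
  // Builds the sample table from the question (same column names).
  def people(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      (1, "Joe", "male", 10, 2),
      (2, "Sue", "female", 3, 12),
      (3, "John", "male", 9, 3),
      (4, "Kim", "female", 1, 10)
    ).toDF("id", "name", "gender", "testosterone", "estrogen")
  }

  // DataFrame API: each branch returns another column instead of a literal.
  def viaWhen(people: DataFrame): DataFrame =
    people.select(
      people("name"),
      when(people("gender") === "female", people("estrogen"))
        .when(people("gender") === "male", people("testosterone"))
        .as("hormone_level"))

  // The CASE WHEN query from the question, expressed through selectExpr.
  def viaSql(people: DataFrame): DataFrame =
    people.selectExpr(
      "name",
      "CASE WHEN gender = 'male' THEN testosterone " +
        "WHEN gender = 'female' THEN estrogen END AS hormone_level")

  def main(args: Array[String]): Unit = {
    // Assumption: a local SparkSession is fine for experimenting.
    val spark = SparkSession.builder().master("local[*]").appName("hormone-level").getOrCreate()
    val df = people(spark)
    viaWhen(df).show()
    viaSql(df).show()
    spark.stop()
  }
}
```

Without an .otherwise(...), a row whose gender matches neither branch gets null for hormone_level, just like a SQL CASE expression with no ELSE.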
