Saikat

Reputation: 443

Storing values of multiple columns in a PySpark dataframe under a new column

I am importing data from a CSV file with columns Reading A and Reading B and storing it in a PySpark dataframe. My objective is to add a new column named Reading whose value is an array containing the values of Reading A and Reading B. How can I achieve this in PySpark?

        +---+-----------+-----------+
        | id|  Reading A|  Reading B|
        +---+-----------+-----------+
        |01 |  0.123    |   0.145   |
        |02 |  0.546    |   0.756   |
        +---+-----------+-----------+

        Desired Output:
        +---+------------------+
        | id|    Reading       |
        +---+------------------+
        |01 |  [0.123, 0.145]  |
        |02 |  [0.546, 0.756]  |
        +---+------------------+

Upvotes: 0

Views: 212

Answers (1)

kranthi kumar

Reputation: 155

Try this:

import pyspark.sql.functions as f

df = df.withColumn('Reading', f.array(f.col('Reading A'), f.col('Reading B')))

Note that `withColumn` returns a new dataframe, so you need to assign the result (dataframes are immutable).

Upvotes: 1
