Reputation: 443
I am importing data from a CSV file that has columns Reading1 and Reading2, and storing it in a PySpark dataframe. My objective is to create a new column named Reading whose value is an array containing the values of Reading1 and Reading2. How can I achieve this in PySpark?
+---+-----------+-----------+
| id| Reading A | Reading B |
+---+-----------+-----------+
|01 | 0.123     | 0.145     |
|02 | 0.546     | 0.756     |
+---+-----------+-----------+
Desired Output:
+---+------------------+
| id| Reading          |
+---+------------------+
|01 | [0.123, 0.145]   |
|02 | [0.546, 0.756]   |
+---+------------------+
Upvotes: 0
Views: 212
Reputation: 155
Try this:

import pyspark.sql.functions as f

df = df.withColumn("Reading", f.array(f.col("Reading A"), f.col("Reading B")))

f.array combines the two columns into a single array column. Note that withColumn returns a new dataframe, so you need to assign the result.
Upvotes: 1