Reputation: 421
I have a dataframe with different event types (10, 11, 12), and the data varies by event type. My requirement is: when eventtype = 11, I need to concatenate two columns (sensor, field1) into an array (e.g. [1, 0]) in the sensor column and shift the remaining data one position to the left. I've given my desired output below. Is there a way I can achieve this using PySpark?
I was able to concatenate using the spark.sql concat() function, but I'm looking for more generic code that concatenates, shifts the column values one position to the left, and sets the last column's value to NULL.
eventtype  sensor  field1  field2  eventcategory
11         1       0       SE_500  500
Expected output:
eventtype  sensor  field1  field2  eventcategory
11         [1,0]   SE_500  500     NULL
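For reference, the intended row transform can be sketched in plain Python (column names taken from the sample row above; this is only an illustration of the logic, not the Spark code):

```python
# Illustrates the desired transform on a single row:
# for eventtype 11, merge sensor and field1 into a list, shift the
# remaining fields one position left, and fill the last with None.
def shift_left(row):
    eventtype, sensor, field1, field2, eventcategory = row
    if eventtype == 11:
        return (eventtype, [sensor, field1], field2, eventcategory, None)
    # other event types keep their values; sensor becomes a one-element
    # list so every row has the same shape
    return (eventtype, [sensor], field1, field2, eventcategory)

print(shift_left((11, 1, 0, "SE_500", 500)))
# (11, [1, 0], 'SE_500', 500, None)
```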
Upvotes: 1
Views: 615
Reputation: 42422
You can use when to construct an appropriate selection. Note that a column can only have one type, so when eventtype != 11 the sensor column still has to be converted into an array (of one element).
import pyspark.sql.functions as F

df2 = df.select(
    'eventtype',
    # sensor must always be an array so the column has a single type
    F.when(
        F.col('eventtype') == 11,
        F.array('sensor', 'field1')
    ).otherwise(F.array('sensor')).alias('sensor'),
    # for eventtype 11, shift each remaining column one position left,
    # filling the last column with NULL
    *[
        F.when(
            F.col('eventtype') == 11,
            F.col(df.columns[i + 3]) if i + 3 < len(df.columns) else F.lit(None)
        ).otherwise(F.col(c)).alias(c)
        for (i, c) in enumerate(df.columns[2:])
    ]
)
df2.show()
df2.show()
+---------+------+------+------+-------------+
|eventtype|sensor|field1|field2|eventcategory|
+---------+------+------+------+-------------+
| 11|[1, 0]|SE_500| 500| null|
| 10| [1]| 0|SE_500| 500|
+---------+------+------+------+-------------+
Upvotes: 1