Reputation: 70
I have the following spark dataframe
+------+--------------------+-----------------+--------------------+
|entity| instance| name| value|
+------+--------------------+-----------------+--------------------+
|Column| ARM_PRED| Histogram.abs. | 638065.0|
|Column| ARM_PRED|Histogram.ratio. | 1.0|
|Column| ATTR044_Count| Compliance| 0.9886814039322013|
|Column|ATTR090_MissingCount| Compliance| 0.9997570780406385|
|Column| ATTR045| Mean| 17101.922291929834|
|Column| ATTR045| Maximum| 4000000.0|
|Column| ATTR045| Minimum| 0.0|
|Column| ATTR045|StandardDeviation| 32981.67451994775|
|Column| ATTR020_Count| Compliance| 0.99814909139351|
|Column|ATTR036_MissingCount| Compliance|0.011260608245241473|
|Column| ATTR011_Count| Compliance| 0.99814909139351|
|Column| ATTR069| Mean| 0.8678189434810735|
|Column| ATTR069| Maximum| 103.0|
When name
column has Compliance value. I have to split instance
column based on last occurrence of delimiter "_" and place values in instance
column and name
column.
Expected output:
+------+--------------------+-----------------+--------------------+
|entity| instance| name| value|
+------+--------------------+-----------------+--------------------+
|Column| ARM_PRED| Histogram.abs. | 638065.0|
|Column| ARM_PRED|Histogram.ratio. | 1.0|
|Column| ATTR044| Count| 0.9886814039322013|
|Column| ATTR090| MissingCount| 0.9997570780406385|
|Column| ATTR045| Mean| 17101.922291929834|
|Column| ATTR045| Maximum| 4000000.0|
|Column| ATTR045| Minimum| 0.0|
|Column| ATTR045|StandardDeviation| 32981.67451994775|
|Column| ATTR020| Count| 0.99814909139351|
|Column| ATTR036| MissingCount|0.011260608245241473|
|Column| ATTR011| Count| 0.99814909139351|
|Column| ATTR069| Mean| 0.8678189434810735|
|Column| ATTR069| Maximum| 103.0|
Below code performs operation on all records, but i need to perform only when name
column has Compliance value
df.withColumn("instance", split($"instance", "_(?!.*_)"))
.withColumn("instance", $"instance"(0))
.withColumn("name", $"instance"(1))).show
Need help!
Upvotes: 0
Views: 462
Reputation: 66
You can use when
function.
Example:
df
.withColumn("_instance", split($"instance", "_(?!.*_)"))
.withColumn("instance", when($"name" === "Compliance", $"_instance"(0)).otherwise($"instance"))
.withColumn("name", when($"name" === "Compliance", $"_instance"(1)).otherwise($"name"))
.drop("_instance")
Upvotes: 1