Reputation: 129
I have a below result of Dataset.
+------+---------+--------+
| Col1 | Col2 | NumCol |
+------+---------+--------+
| abc | jun2016 | 25 |
| aac | jun2017 | 28 |
| aac | dec2017 | 30 |
| aac | apr2018 | 45 |
+------+---------+--------+
when sorting is applied I get below result.
+------+---------+--------+
| Col1 | Col2 | NumCol |
+------+---------+--------+
| aac | apr2018 | 45 |
| aac | dec2017 | 30 |
| aac | jun2017 | 28 |
| abc | jun2018 | 25 |
+------+---------+--------+
But instead should have been
+------+---------+--------+
| Col1 | Col2 | NumCol |
+------+---------+--------+
| aac | jun2017 | 28 |
| aac | dec2017 | 30 |
| aac | apr2018 | 45 |
| abc | jun2018 | 25 |
+------+---------+--------+
According to chronological order. How will I be able to achieve the same.
When I have a column as Week as below
+------+-----------------------+--------+
| Col1 | Week | NumCol |
+------+-----------------------+--------+
| aac | 02/04/2018-02/10/2018 | 45 |
| aac | 02/11/2018-02/17/2018 | 25 |
| aac | 01/28/2018-02/03/2018 | 30 |
+------+-----------------------+--------+
I want that to get sorted as below.
+------+-----------------------+--------+
| Col1 | Week | NumCol |
+------+-----------------------+--------+
| aac | 01/28/2018-02/03/2018 | 30 |
| aac | 02/04/2018-02/10/2018 | 45 |
| aac | 02/11/2018-02/17/2018 | 25 |
+------+-----------------------+--------+
Here above I want to parse date of the column week as new Column dateweek , then sort the Week column and delete before returning the dataset.
Kind of challenging stuff for me.
for #1 I followed this But the issue with this is if there's suppose Jan2016,Feb2016,Jan2017 it gets sorted as Jan2016,Jan2017,Feb2016.
Need help for 2
Upvotes: 0
Views: 550
Reputation: 5881
split the week and sort based on date.
import org.apache.spark.sql.functions.split
df.withColumn("_tmp", split($"Week", "-")).select($"Col1", $"Week", $"NumCCol1", $"_tmp".getItem(0).as("_sort")).sort("_sort").drop("_sort").show()
output-
+----+---------------------+--------+
|Col1|Week |NumCCol1|
+----+---------------------+--------+
|aac |01/28/2018-02/03/2018|30 |
|aac |02/04/2018-02/10/2018|45 |
|aac |02/11/2018-02/17/2018|25 |
+----+---------------------+--------+
Upvotes: 1