Reputation: 59
I have a date column in a DataFrame that looks like this:
"JAN20, FEB20, MAR20 .... JAN21, FEB21, MAR21..."
This created a problem when I tried to plot numbers by this timestamp, because the values are strings, which sort alphabetically rather than chronologically; they are not timestamps or integers.
I guess one way is to convert "JAN20" into "20_1" so that it sorts first by year, then by month, but then it loses the readability / interpretability of "JAN20".
Alternatively, is there a way for me to specify that "JAN, FEB, MAR, APR, MAY, JUN ..." is the right string order?
I would appreciate any input on how to transform this column so that it can be sorted properly and displayed correctly on a time series plot.
Much appreciation!
P.S. This is in PySpark.
Upvotes: 0
Views: 158
Reputation: 6082
Just prepend a running index to each value, like this: 01Jan20, 02Feb20, ... 10Oct20, ...
Don't forget the leading zeros; you might need more than one, depending on how many values you have.
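A minimal sketch of that idea in plain Python, in case it helps. The helper names `sort_key` and `add_index_prefix` are made up for illustration, and it assumes every label is a three-letter English month abbreviation followed by a two-digit year in the same century. It builds the chronological order first, then works out how many digits of zero-padding the index needs:

```python
# Month abbreviations in calendar order; the position gives the month number.
MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]
MONTH_NUMBER = {name: i for i, name in enumerate(MONTHS, start=1)}


def sort_key(label):
    """Turn a label such as 'JAN20' into a (year, month) tuple.

    Assumption: two-digit years all fall in the same century.
    """
    month, year = label[:3].upper(), int(label[3:])
    return (year, MONTH_NUMBER[month])


def add_index_prefix(labels):
    """Map each distinct label to itself prefixed with a zero-padded index."""
    ordered = sorted(set(labels), key=sort_key)
    width = len(str(len(ordered)))  # digits needed for the largest index
    return {label: f"{i:0{width}d}{label}"
            for i, label in enumerate(ordered, start=1)}


# Example input in deliberately non-chronological order.
labels = [m + y for y in ("21", "20") for m in reversed(MONTHS)]
prefixed = add_index_prefix(labels)

# The prefixed strings now sort chronologically as plain strings.
print(sorted(prefixed.values()))
```

The resulting dictionary could then be applied to the column as a lookup, for example by joining against a small mapping DataFrame in PySpark. If the readable label should stay untouched, sorting on the `(year, month)` key alone and keeping "JAN20" as the display label is another option.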
Upvotes: 1