DPatrick

Reputation: 59

PySpark: How to transform data from string to date (or integer) in an easy-to-read manner

I have a date column in a dataframe that looks like this:

"JAN20, FEB20, MAR20 .... JAN21, FEB21, MAR21..."

This created a problem when I tried to plot numbers by this timestamp, as these are technically strings (which do not sort chronologically), not timestamps or integers.

I guess one way is to convert "JAN20" into "20_1" so that it sorts first by year, then by month, but then it loses the readability / interpretability of "JAN20".
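A minimal plain-Python sketch of that idea, where the converted value is used only as a sort key and the readable label is kept for display. The names `MONTHS` and `sort_key` are illustrative, not from any library. The month is zero-padded (an assumption beyond the "20_1" form), because as a string "20_10" would otherwise sort before "20_2".

```python
# Month abbreviations in calendar order; the list position gives the month number.
MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]

def sort_key(label):
    """Turn a label such as 'JAN20' into a sortable string such as '20_01'."""
    month = MONTHS.index(label[:3].upper()) + 1  # 1-based month number
    year = label[3:]                             # two-digit year, kept as text
    return f"{year}_{month:02d}"                 # zero-pad so '10' sorts after '02'

labels = ["OCT20", "JAN21", "FEB20", "JAN20"]

# Sort by the derived key, but keep the original readable labels.
ordered = sorted(labels, key=sort_key)
print(ordered)
```

In PySpark the same logic could presumably live in a helper column used only for ordering, while the original column is still used for the axis labels.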

Alternatively, is there a way for me to specify that "JAN, FEB, MAR, APR, MAY, JUN ..." is the right string order?

Would appreciate any input on how to transform this column so that it can be properly sorted and properly shown on a time series plot.

Much appreciation!

P.S. This is in PySpark.

Upvotes: 0

Views: 158

Answers (1)

pltc

Reputation: 6082

Just prepend an index to each label, like this: 01Jan20, 02Feb20, ... 10Oct20, ... Don't forget the leading zeros; you might need more than one, depending on the number of columns you have.
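A rough sketch of the prefixing idea in plain Python, not a definitive implementation. It assumes the labels always look like "JAN20" (three-letter month plus two-digit year) and uses a running index across years, which is an assumption, since the example only shows a single year. The names `MONTHS`, `chronological` and `add_index_prefix` are hypothetical helpers.

```python
# Month abbreviations in calendar order; the list position gives the month number.
MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]

def chronological(label):
    """Return (year, month) for a label such as 'JAN20'."""
    month = MONTHS.index(label[:3].upper()) + 1
    year = int(label[3:])  # two-digit year, assumed to be within one century
    return (year, month)

def add_index_prefix(labels):
    """Map each distinct label to a zero-padded, index-prefixed version."""
    distinct = sorted(set(labels), key=chronological)
    # Use at least two digits, and more if there are 100 or more distinct labels.
    width = max(2, len(str(len(distinct))))
    return {label: f"{i:0{width}d}{label}"
            for i, label in enumerate(distinct, start=1)}

mapping = add_index_prefix(["JAN21", "FEB20", "JAN20", "FEB20"])

# Plain string sorting of the prefixed labels now matches chronological order.
print(sorted(mapping.values()))
```

The resulting mapping could then be applied to the column (for example through a join or a lookup), so that the prefixed labels sort correctly as plain strings while the month and year stay readable.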

Upvotes: 1
