Reputation: 1
I want to have my table columns in the order id, property_name, time, and value.
I tried shuffling all the combinations:
my_row = parsed1.map(lambda x: {
    "id": str(x[3]) + ":" + str(x[0]) + ":" + str(x[1]),
    "property_name": x[4],
    "time": x[1],
    "value": x[2],
})
I keep getting the order time, id, property_name, value, and I cannot see why. Those x[0]...x[4]
are just fields from the JSON object I'm parsing. All of the extraction works fine; only the order doesn't. And I need the exact order, because this needs to be written to a Cassandra DB.
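For reference, this is how the inferred column order can be checked after converting the RDD to a DataFrame (a minimal sketch; spark is an assumed SparkSession, not something defined above):
# Sketch: inspect what column order Spark infers from the RDD of dicts.
# `spark` is an assumed SparkSession name.
df = spark.createDataFrame(my_row)
print(df.columns)  # shows the order described above, e.g. ['time', 'id', 'property_name', 'value']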
Upvotes: 0
Views: 45
Reputation: 130
A select
statement will only keep certain columns in your DataFrame and drop all the others. You're looking for the sort
statement.
Naturally, I'm assuming that you'll also want to specify how a particular column is sorted. I've included a descending sort on time (.desc())
so that you can see how, within a sort statement, you can adjust the direction.
Here is an example:
from pyspark.sql.functions import col

my_row = parsed1.map(lambda x: {
    "id": str(x[3]) + ":" + str(x[0]) + ":" + str(x[1]),
    "property_name": x[4],
    "time": x[1],
    "value": x[2],
})

# sort() is a DataFrame method, so convert the RDD of dicts to a DataFrame first
my_row_df = my_row.toDF()

sorted_my_row = my_row_df.sort(
    col("id"),
    col("property_name"),
    col("time").desc(),
    col("value"))
Upvotes: 0
Reputation: 671
Just use a select with the order you want:
sorted_df = df.select("id", "property_name", "time", "value")
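Since the question mentions Cassandra: once the columns are selected in the right order, the DataFrame can be saved with the Spark Cassandra connector (a sketch; the keyspace and table names below are placeholders, not values from the question):
# Sketch: write the ordered DataFrame to Cassandra via the Spark Cassandra
# connector. "my_keyspace" and "my_table" are placeholder names.
sorted_df.write \
    .format("org.apache.spark.sql.cassandra") \
    .options(keyspace="my_keyspace", table="my_table") \
    .mode("append") \
    .save()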
Upvotes: 1