Marko Milosavljevic

Reputation: 1

How to sort table fields in my pyspark code

I want my table columns in the order id, property_name, time, value.

I tried shuffling through all the combinations.

my_row =  parsed1.map(lambda x: {

    "id": (str(x[3]) + ":" + str(x[0]) + ":" +str(x[1])),
    "property_name": x[4],
    "time" : x[1],
    "value": x[2],
})

I keep getting the order time, id, property_name, value, and I can't see why. Those x[0], ..., x[4] are just fields from the JSON object I'm parsing. The extraction all works; only the column order doesn't. And I need the exact order, because this needs to be written to a Cassandra DB.
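For context: building each row as a plain Python dict leaves the column order up to Spark's schema inference, since dicts (before Python 3.7) don't guarantee key order. A minimal sketch of the idea behind a fix, using a `namedtuple` to pin the field order (in PySpark the analogous tool is `pyspark.sql.Row`, whose fields keep the order you declare); the sample tuple `x` is a made-up stand-in for one parsed JSON record:

```python
from collections import namedtuple

# Fields declared in the exact order the columns should appear.
SensorRow = namedtuple("SensorRow", ["id", "property_name", "time", "value"])

# Hypothetical stand-in for one parsed record, shaped like x in the question.
x = ("dev", "2016-01-01T00:00:00", 42.0, "plant1", "temperature")

row = SensorRow(
    id=str(x[3]) + ":" + str(x[0]) + ":" + str(x[1]),
    property_name=x[4],
    time=x[1],
    value=x[2],
)

print(row._fields)  # → ('id', 'property_name', 'time', 'value')
```

Because the field order is part of the type, every row produced this way carries its columns in the declared order, regardless of the keyword-argument order at the call site.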

Upvotes: 0

Views: 45

Answers (2)

shadow_dev

Reputation: 130

A select statement will only keep certain columns in your dataframe and drop all the others. You're looking for the sort statement.

Naturally, I'm assuming that you'll also want to specify how a particular column is sorted. I've included a descending order on time so that you can see how, within a single sort call, you can adjust the direction per column.

Here is an example:

from pyspark.sql.functions import col

my_row = parsed1.map(lambda x: {
    "id": str(x[3]) + ":" + str(x[0]) + ":" + str(x[1]),
    "property_name": x[4],
    "time": x[1],
    "value": x[2],
}).toDF()  # map() returns an RDD; convert it to a DataFrame before sorting

sorted_my_row = my_row.sort(
    col("id"),
    col("property_name"),
    col("time").desc(),
    col("value"),
)

Upvotes: 0

Sequinex

Reputation: 671

Just use a select with the order you want:

sorted_df = df.select("id", "property_name", "time", "value")
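Conceptually, select projects each row onto the named columns in the order you give them, which is exactly what's needed here. A small pure-Python sketch of that projection (a hypothetical helper just to illustrate; the real work is done by Spark's select):

```python
def select(rows, *columns):
    """Project each row (a dict) onto the given columns, in the given order."""
    return [tuple(row[c] for c in columns) for row in rows]

# One row whose dict happens to list keys in the unwanted order.
rows = [{"time": "t0", "id": "a", "property_name": "temp", "value": 1.0}]

print(select(rows, "id", "property_name", "time", "value"))
# → [('a', 'temp', 't0', 1.0)]
```

The dict's own key order is irrelevant; the argument order to select decides the output order, which is why this fixes the column ordering regardless of how the rows were built.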

Upvotes: 1
