StephanieCGraduate

Reputation: 53

Transpose a PySpark DataFrame and get back a PySpark DataFrame

I'm trying to transpose a data frame and get the transposed result back as a data frame object. I want to take this:

[image: original data frame]

and transpose into this:

[image: transposed data frame]

I need the transposed object to remain a Spark data frame.

Thank you!

Upvotes: 2

Views: 71

Answers (1)

kites

Reputation: 1405

Check this out. You can use groupBy and pivot. Note that I renamed the name column first, because once its values are pivoted into columns, the reference to name becomes ambiguous in the data frame.

    from pyspark.sql import functions as F

    df.show()

    # +------------+-----+
    # |        name|value|
    # +------------+-----+
    # |        Name|  str|
    # |lastActivity| date|
    # |          id|  str|
    # +------------+-----+

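    # Rename "name" before pivoting so the pivoted "Name" column does not
    # collide with the grouping column.
    df1 = (
        df.withColumnRenamed("name", "name_val")
        .groupBy("name_val")
        .pivot("name_val")
        .agg(F.first("value"))
    )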

    df1.show()

    # +------------+----+----+------------+
    # |    name_val|Name|  id|lastActivity|
    # +------------+----+----+------------+
    # |        Name| str|null|        null|
    # |          id|null| str|        null|
    # |lastActivity|null|null|        date|
    # +------------+----+----+------------+

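    # Collapse the sparse rows into one row by taking the first non-null
    # value in each pivoted column. The result is still a Spark data frame.
    df2 = df1.select(
        *[
            F.first(column, ignorenulls=True).alias(column)
            for column in df1.columns
            if column != "name_val"  # exact comparison, not a substring test
        ]
    )
    df2.show()
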
    # +----+---+------------+
    # |Name| id|lastActivity|
    # +----+---+------------+
    # | str|str|        date|
    # +----+---+------------+

Upvotes: 3
