Reputation: 21
I want to transform the source dataframe below (using PySpark):
Key | ID | segment |
---|---|---|
1 | A | m1 |
2 | A | m1 |
3 | B | m1 |
4 | C | m2 |
1 | D | m1 |
2 | E | m1 |
3 | F | m1 |
4 | G | m2 |
1 | J | m1 |
2 | J | m1 |
3 | J | m1 |
4 | K | m2 |
Into the result dataframe below:
ID | key1 | key2 |
---|---|---|
A | 1 | 2 |
B | 3 | - |
C | 4 | - |
D | 1 | - |
E | 2 | - |
F | 3 | - |
G | 4 | - |
J | 1 | 2 |
J | 1 | 3 |
J | 2 | 3 |
K | 4 | - |
In other words, I want to highlight the "pairs" in the dataframe: if there is more than one key for the same ID, I would like to show each relation on a separate line.
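For reference, here is a small snippet to rebuild the source dataframe (assuming an active SparkSession named spark):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
data = [(1, 'A', 'm1'), (2, 'A', 'm1'), (3, 'B', 'm1'), (4, 'C', 'm2'),
        (1, 'D', 'm1'), (2, 'E', 'm1'), (3, 'F', 'm1'), (4, 'G', 'm2'),
        (1, 'J', 'm1'), (2, 'J', 'm1'), (3, 'J', 'm1'), (4, 'K', 'm2')]
df = spark.createDataFrame(data, ['Key', 'ID', 'segment'])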
Thank you for your help
Upvotes: 1
Views: 441
Reputation: 26676
Use window functions. I assume - means a one-member group. If not, you can use a when/otherwise condition to blank the 1s out (see the sketch after the output below).
import sys
from pyspark.sql import Window
from pyspark.sql import functions as F

w = Window.partitionBy('ID').orderBy(F.desc('Key'))
df = (df.withColumn('key2', F.lag('segment').over(w))  # preceding segment for each row; null at the start of each partition
        .withColumn('key2', F.col('key2').isNotNull())  # boolean: does a preceding row exist in the group?
        .withColumn('key2', F.sum(F.col('key2').cast('integer'))
                             .over(w.rowsBetween(Window.currentRow, sys.maxsize)) + 1)  # create cumulative groups
        .orderBy('ID', 'Key'))  # reorder the frame
df.show()
+---+---+-------+----+
|Key| ID|segment|key2|
+---+---+-------+----+
| 1| A| m1| 2|
| 2| A| m1| 2|
| 3| B| m1| 1|
| 4| C| m2| 1|
| 1| D| m1| 1|
| 2| E| m1| 1|
| 3| F| m1| 1|
| 4| G| m2| 1|
| 1| J| m1| 2|
| 2| J| m1| 3|
| 3| J| m1| 3|
| 4| K| m2| 1|
+---+---+-------+----+
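To blank the 1s out, i.e. show null instead of a group number for one-member groups, one more when/otherwise pass over the result could look like this (a sketch on top of the key2 column computed above):

df = df.withColumn('key2', F.when(F.col('key2') > 1, F.col('key2')))  # rows with key2 == 1 become null

And if you need the literal key1/key2 pair rows from the expected output rather than group numbers, a self-join on ID (a different technique from the window approach; column names key1/key2 taken from the question) is one way to sketch it:

a, b = df.alias('a'), df.alias('b')
pairs = (a.join(b, (F.col('a.ID') == F.col('b.ID')) & (F.col('a.Key') < F.col('b.Key')))
          .select(F.col('a.ID').alias('ID'),
                  F.col('a.Key').alias('key1'),
                  F.col('b.Key').alias('key2')))  # every ordered key pair per ID
singles = (df.groupBy('ID')
             .agg(F.count('*').alias('n'), F.min('Key').alias('key1'))
             .where(F.col('n') == 1)
             .drop('n')
             .withColumn('key2', F.lit(None)))  # single-key IDs get a null key2 (the "-" in the question)
result = pairs.unionByName(singles).orderBy('ID', 'key1', 'key2')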
Upvotes: 1