pySpark - Find common values in grouped data

Question

I am trying to find common values among the groups created by applying groupBy and pivot on a dataframe in pySpark. For example, the data looks like:

+--------+---------+---------+
|PlayerID|PitcherID|ThrowHand|
+--------+---------+---------+
|10000598| 10000104|        R|
|10000908| 10000104|        R|
|10000489| 10000104|        R|
|10000734| 10000104|        R|
|10006568| 10000104|        R|
|10000125| 10000895|        L|
|10000133| 10000895|        L|
|10006354| 10000895|        L|
|10000127| 10000895|        L|
|10000121| 10000895|        L|
+--------+---------+---------+

After applying:

from pyspark.sql import functions as F
df.groupBy('PlayerID').pivot('ThrowHand').agg(F.count('ThrowHand')).drop('null').show(10)

I get something like:

+--------+----+---+
|PlayerID|   L|  R|
+--------+----+---+
|10000591|  11| 43|
|10000172|  22|101|
|10000989|  05| 19|
|10000454|  05| 17|
|10000723|  11| 33|
|10001989|  11| 38|
|10005243|  20| 60|
|10003366|  11| 26|
|10006058|  02| 09|
+--------+----+---+

Is there some way I can get the 'PitcherID' values that are common to the L and R groups counted above?

What I mean is: for PlayerID = 10000591, I have 11 PitcherIDs where ThrowHand is L and 43 PitcherIDs where ThrowHand is R. It is possible that some pitchers appear in both the group of 11 and the group of 43.

Is there any way I can get these common PitcherIDs?
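
To pin down the result I am after, here is a minimal plain-Python sketch of the logic (not Spark code). The rows are made up for illustration, and `common_pitchers` is a hypothetical helper name, not something from my real code. It assumes the same PitcherID can show up under both L and R for one player.

```python
from collections import defaultdict

# Hypothetical rows (PlayerID, PitcherID, ThrowHand); not taken from the real data.
rows = [
    (1, 100, "L"),
    (1, 100, "R"),  # pitcher 100 appears under both hands for player 1
    (1, 101, "R"),
    (1, 102, "L"),
    (2, 100, "L"),
    (2, 103, "R"),
]

def common_pitchers(rows):
    """Map each PlayerID to the sorted PitcherIDs seen under both L and R."""
    by_hand = defaultdict(lambda: {"L": set(), "R": set()})
    for player_id, pitcher_id, throw_hand in rows:
        if throw_hand in ("L", "R"):  # ignore rows with a missing ThrowHand
            by_hand[player_id][throw_hand].add(pitcher_id)
    # Intersect the two sets per player to get the shared pitchers.
    return {p: sorted(h["L"] & h["R"]) for p, h in by_hand.items()}

result = common_pitchers(rows)
print(result)  # {1: [100], 2: []}
```

In Spark, the same idea could presumably be expressed by collecting the PitcherIDs per ThrowHand with `collect_set` and intersecting the two arrays with `array_intersect` (both in `pyspark.sql.functions`, the latter available from Spark 2.4), but I have not verified that on my data.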
