Reputation: 1293
I'm trying to use rand() with a window function to create a random set of numbers per group, using the code below:
df.withColumn("random_groups", F.rand().over(Window.partitionBy("groups")))
However, this raises the following error:
AnalysisException: Expression 'rand(4853692135296631772)' not supported within a window function.
Does anyone have advice on what I can do to get my intended output? It should look like this:
ID | groups | random_groups
1 | A | 0.3
2 | A | 0.9
3 | B | 0.8
Upvotes: 0
Views: 1009
Reputation: 541
Apparently, F.rand() doesn't work with .over(some_window): Spark only accepts aggregate, ranking, and analytic functions inside a window, and rand() is none of these. But if you aren't doing anything different with the random function per group, it doesn't matter. Just add your random column and do whatever you want with the random number later, using filters or groupBy.
from pyspark.sql import functions as F
df = df.withColumn('random_groups', F.rand())
df.groupBy('groups').agg(F.max('random_groups').alias('max_rand')).show()
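The key point is that the random draw never needs to be scoped to a window: generate it once per row, then aggregate or apply a genuine window function over the resulting column. The sketch below is a plain-Python analogue of that flow (no Spark involved), using the question's three example rows. `rank_in_group` is a hypothetical column name, and Python's `random` will not reproduce Spark's numbers. In PySpark the ranking step would presumably be `F.row_number().over(Window.partitionBy('groups').orderBy('random_groups'))`, which is allowed because `row_number()` is a ranking function.

```python
import random
from collections import defaultdict

# Plain-Python analogue (no Spark): rows mirror the question's example table.
rows = [
    {"ID": 1, "groups": "A"},
    {"ID": 2, "groups": "A"},
    {"ID": 3, "groups": "B"},
]

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Step 1, like df.withColumn('random_groups', F.rand()):
# one uniform draw in [0, 1) per row, with no window involved.
for row in rows:
    row["random_groups"] = rng.random()

# Step 2, like groupBy('groups').agg(F.max('random_groups')):
max_rand = {}
for row in rows:
    g = row["groups"]
    max_rand[g] = max(max_rand.get(g, row["random_groups"]), row["random_groups"])

# Step 3, like F.row_number().over(
#     Window.partitionBy('groups').orderBy('random_groups')):
# rank rows inside each group by their pre-computed random value.
by_group = defaultdict(list)
for row in rows:
    by_group[row["groups"]].append(row)
for members in by_group.values():
    members.sort(key=lambda r: r["random_groups"])
    for rank, row in enumerate(members, start=1):
        row["rank_in_group"] = rank  # hypothetical column name

for row in rows:
    print(row)
```

Because the random value already exists as an ordinary column, any later per-group logic (max, ranking, sampling the top row per group) works on it like on any other column.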
If you want different random functions per group, you might need something like this:
df = df.withColumn(
    'random_groups',
    F.when(F.col('groups') == 'A', F.rand(seed=69))    # uniform on [0, 1)
     .when(F.col('groups') == 'B', F.randn(seed=42))   # standard normal
     .otherwise(F.lit(-1))  # omit .otherwise() to get null for all other groups
)
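To make the per-group dispatch concrete, here is a plain-Python analogue (not Spark) of the when/otherwise chain: each group gets its own seeded generator and distribution, and unmatched groups fall back to -1. `RULES`, `assign_random`, and group `C` are hypothetical names for illustration; the seeds are copied from the answer, but Python's `random` module will not reproduce the numbers Spark generates for them.

```python
import random

# Per-group rules, loosely mirroring the when/otherwise chain:
# group -> (seed, distribution). Group names and seeds copied from the answer.
RULES = {"A": (69, "uniform"), "B": (42, "normal")}

def assign_random(rows):
    """Return new rows with a 'random_groups' value drawn per group's rule."""
    # One generator per group, created fresh each call so results are reproducible.
    rngs = {g: random.Random(seed) for g, (seed, _) in RULES.items()}
    out = []
    for row in rows:
        g = row["groups"]
        if g not in RULES:
            value = -1  # like .otherwise(F.lit(-1)); use None to mimic omitting it
        elif RULES[g][1] == "uniform":
            value = rngs[g].random()         # uniform on [0, 1), like F.rand
        else:
            value = rngs[g].gauss(0.0, 1.0)  # standard normal, like F.randn
        out.append({**row, "random_groups": value})
    return out

rows = [
    {"ID": 1, "groups": "A"},
    {"ID": 2, "groups": "A"},
    {"ID": 3, "groups": "B"},
    {"ID": 4, "groups": "C"},  # hypothetical row in a group with no rule
]

result = assign_random(rows)
for row in result:
    print(row)
```

Keeping the rules in a mapping rather than a hard-coded chain makes it easy to add groups; in PySpark the same idea would mean building the `when` chain in a loop over such a mapping.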
Upvotes: 1