Reputation: 31
How to oversample a dataframe in pyspark?
df.sample(fractions, seed)
This only samples a fraction of the DataFrame; it can't oversample.
Upvotes: 3
Views: 4559
Reputation: 7899
You could over-sample by making use of the sample method as follows:
df.sample(withReplacement=True, fraction=total_percent_of_upsample, seed=seed)
sample(withReplacement, fraction, seed=None)
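The True indicates that you want to sample with replacement. With replacement, fraction may be greater than 1 (for example, 2.0 gives roughly twice as many rows), but the resulting row count is approximate, not exact.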
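As a rough sketch of how this could be wrapped up, assuming you know the row count you want to end up with: the helper names upsample_fraction, oversample, and demo below are made up for illustration and are not part of PySpark. Only DataFrame.count and DataFrame.sample are real API calls here.

```python
def upsample_fraction(current_rows, target_rows):
    """Fraction to pass to DataFrame.sample so the expected size is target_rows."""
    if current_rows <= 0:
        raise ValueError("current_rows must be positive")
    # True division always yields a float, which DataFrame.sample expects.
    return target_rows / current_rows


def oversample(df, target_rows, seed=None):
    """Sample df with replacement so it ends up with about target_rows rows."""
    fraction = upsample_fraction(df.count(), target_rows)
    # withReplacement=True is what allows fraction to exceed 1.0.
    return df.sample(withReplacement=True, fraction=fraction, seed=seed)


def demo():
    # Needs a working PySpark installation; imported lazily so the helpers
    # above can be used or tested without starting Spark.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[1]")
        .appName("oversample-demo")
        .getOrCreate()
    )
    df = spark.range(100)
    bigger = oversample(df, target_rows=300, seed=7)
    # The count is close to 300 but will usually not be exactly 300.
    print(bigger.count())
    spark.stop()
```

Note that sampling with replacement does not guarantee every original row survives; if you need all original rows plus extras, one option is to union the original DataFrame with a with-replacement sample of the additional rows.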
Upvotes: 1