Stevven

Reputation: 31

How to oversample a dataframe in Pyspark?

How can I oversample a DataFrame in PySpark? I know of:

df.sample(fractions, seed)

but that only samples a fraction of the DataFrame; it can't oversample.

Upvotes: 3

Views: 4559

Answers (1)

Tshilidzi Mudau

Reputation: 7899

You could over-sample by making use of the sample method as follows:

df.sample(withReplacement=True, fraction=total_percent_of_upsample, seed=seed)

sample(withReplacement, fraction, seed=None)

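Passing True for withReplacement samples with replacement, so the same row can be drawn more than once. In that mode fraction may be greater than 1.0 (for example, 2.0 returns roughly twice as many rows as df), although the resulting row count is approximate rather than exact.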

Upvotes: 1
