Create sample weight in PySpark sampled dataframe

Question

I have created a dataframe in PySpark as follows:

df = spark.range(10)

The dataframe looks like this:

df.show()

+---+
| id|
+---+
|  0|
|  1|
|  2|
|  3|
|  4|
|  5|
|  6|
|  7|
|  8|
|  9|
+---+

I then took a random sample as follows:

df1 = df.sample(fraction=0.5, seed=123)

The sampled dataframe looks like this:

df1.show()

+---+
| id|
+---+
|  0|
|  2|
|  3|
|  5|
|  6|
|  7|
+---+

I need to add a column called "weight" to the sampled dataframe (df1). I know how to do this in Pandas, but not in PySpark. Can anyone help, please?


Answers (1)
