Reputation: 2253
Let's say I have a data frame:
myGraph = spark.createDataFrame(
    [(1.3, 2.1, 3.0),
     (2.5, 4.6, 3.1),
     (6.5, 7.2, 10.0)],
    ['col1', 'col2', 'col3'])
I want to add a new string column so that it looks like:
from pyspark.sql.functions import lit
myGraph=myGraph.withColumn('rowName',lit('xxx'))
At this point, every value in rowName is 'xxx'. But I do not know how to put the values 'col1', 'col2', 'col3' into rowName instead.
Upvotes: 0
Views: 72
Reputation: 7336
You can generate a random integer up to N with the built-in rand()
function and a udf helper function that builds the new string, as follows:
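import org.apache.spark.sql.functions.{floor, rand, udf}
// Prefix a whole number with "X", e.g. 42 -> "X42"
val randColumnUDF = udf((n: Long) => s"X$n")
val N = 10000
// rand() is uniform in [0, 1), so floor(rand() * N) + 1 is a whole number in 1..N
df.withColumn("rand", randColumnUDF(floor(rand() * N) + 1)).show(false)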
+----+
|rand|
+----+
|X1  |
|X8  |
|X6  |
|... |
+----+
The code above appends a random whole number of at most N (here 10000) to X, producing values such as X1, X23, and so on.
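The range arithmetic behind rand() * N can be checked outside Spark. The snippet below is only a plain-Python sketch: random.random() stands in for Spark's rand() (both are uniform in [0, 1)), and random_label is a hypothetical helper name, not part of any Spark API. It shows that flooring the scaled value and adding 1 gives a whole number in 1..N, whereas truncating alone would give 0..N-1.

```python
import math
import random


def random_label(n, prefix="X", rng=random.random):
    """Return prefix followed by a whole number in 1..n.

    rng is assumed to return a float in [0, 1), like Spark's rand().
    """
    u = rng()                      # uniform in [0, 1)
    number = math.floor(u * n) + 1  # floor gives 0..n-1, +1 shifts to 1..n
    return f"{prefix}{number}"


# Generate a few labels the way the answer's column would be filled.
labels = [random_label(10000) for _ in range(5)]
print(labels)
```

Passing a fixed rng (for example lambda: 0.0) makes the boundaries easy to inspect: the smallest possible label is X1 and the largest is X10000.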
Upvotes: 1