Hemant

Reputation: 197

How to generate a large number of records in Spark

How do I generate a large number (millions) of records with multiple fields in Spark? I am not reading the data from a file; it will be randomly generated, and from that data I want to create an RDD.

Upvotes: 2

Views: 1667

Answers (1)

Pawan B

Reputation: 4623

You can refer to the Random data generation utilities provided by Spark's MLlib.

RandomRDDs provides factory methods to generate random double RDDs or vector RDDs.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.random.RandomRDDs._

// Create a SparkContext (skip this if you are in spark-shell, where `sc` already exists).
val conf = new SparkConf().setAppName("RandomRDDExample").setMaster("local[*]")
val sc = new SparkContext(conf)

// Generate a random double RDD that contains 1 million i.i.d. values drawn from the
// standard normal distribution `N(0, 1)`, evenly distributed in 10 partitions.
val u = normalRDD(sc, 1000000L, 10)
// Apply a transform to get a random double RDD following `N(1, 4)`.
val v = u.map(x => 1.0 + 2.0 * x)
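
Since you want records with multiple fields, you can also use normalVectorRDD from the same RandomRDDs object: it generates rows of several random values, which you can then map to a record type. Here is a minimal sketch; the Record case class and its field names are just illustrative placeholders, not anything defined by Spark:

import org.apache.spark.mllib.random.RandomRDDs._

// Generate 1 million rows, each a vector of 5 i.i.d. values from `N(0, 1)`,
// spread across 10 partitions.
val rows = normalVectorRDD(sc, 1000000L, 5, 10)

// Map each vector to a record with named fields (hypothetical schema).
case class Record(a: Double, b: Double, c: Double, d: Double, e: Double)
val records = rows.map(v => Record(v(0), v(1), v(2), v(3), v(4)))

RandomRDDs also has uniformRDD, poissonRDD, and the corresponding vector variants, if the normal distribution is not what you need.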

Upvotes: 3
