Reputation: 7951
I am coding a Spark application in Java, and I wonder how I would create a DataFrame and/or JavaRDD from literal values.
For example, I have 3 integers, say (784512, 35, 40), corresponding to the fields/columns (id, m_count, f_count).
Upvotes: 0
Views: 2158
Reputation: 14661
You want JavaSparkContext.parallelize(...) to create a JavaRDD and SQLContext.createDataFrame(...) to create a DataFrame.
JavaRDD<Integer> rdd = sc.parallelize(Arrays.asList(1, 2, 3, 4));
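If you do not want to define a bean class, you can also build the DataFrame from Row objects plus an explicit schema. This is only a sketch against the Spark 1.x API (SQLContext / DataFrame), reusing the sc and sqlContext shown in the test below; the literalsDF name is just an example:
import java.util.Arrays;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

// One StructField per column; all three are non-nullable ints.
StructType schema = DataTypes.createStructType(Arrays.asList(
        DataTypes.createStructField("id", DataTypes.IntegerType, false),
        DataTypes.createStructField("m_count", DataTypes.IntegerType, false),
        DataTypes.createStructField("f_count", DataTypes.IntegerType, false)));

// A single Row built from the literal values.
JavaRDD<Row> rows = sc.parallelize(Arrays.asList(RowFactory.create(784512, 35, 40)));
DataFrame literalsDF = sqlContext.createDataFrame(rows, schema);
literalsDF.show();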
If you are after creating a parallelized list of objects with the three values, then you want:
import java.io.Serializable;
import java.util.Arrays;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;
import org.junit.Test;

@Test
public void test() {
    JavaSparkContext sc = ...
    SQLContext sqlContext = new SQLContext(sc);
    // Parallelize one Counter built from the literals; createDataFrame infers the schema from the bean.
    JavaRDD<Counter> counters = sc.parallelize(Arrays.asList(new Counter(784512, 35, 40)));
    DataFrame countersDF = sqlContext.createDataFrame(counters, Counter.class);
    System.out.println(counters.collect());
    System.out.println(countersDF.collectAsList());
}
public static class Counter implements Serializable {
    private final int id;
    private final int m_count;
    private final int f_count;

    Counter(int id, int m_count, int f_count) {
        this.id = id;
        this.m_count = m_count;
        this.f_count = f_count;
    }

    @Override
    public String toString() {
        return id + " " + m_count + " " + f_count;
    }

    // Public getters are required so createDataFrame can infer the columns from the bean.
    public int getId() { return id; }
    public int getM_count() { return m_count; }
    public int getF_count() { return f_count; }
}
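Once countersDF exists you can sanity-check the columns and query them with SQL; a minimal follow-up sketch, still on the Spark 1.x API, where the temp table name "counters" is just an example:
// Column names are inferred from the Counter getters.
countersDF.printSchema();
countersDF.registerTempTable("counters");
System.out.println(sqlContext.sql("SELECT id, m_count, f_count FROM counters").collectAsList());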
Upvotes: 2