Reputation: 1029
I'm trying to convert an RDD to a DataFrame without specifying a schema. I tried the code below. It works, but the DataFrame columns come out shuffled.
from pyspark.sql import Row

def f(x):
    # Build a dict keyed by the field position as a string: "0", "1", ...
    d = {}
    for i in range(len(x)):
        d[str(i)] = x[i]
    return d

rdd = sc.textFile("test")
df = rdd.map(lambda x: x.split(",")).map(lambda x: Row(**f(x))).toDF()
df.show()
Upvotes: 2
Views: 8443
Reputation: 28322
If you don't want to specify a schema, do not use Row in the RDD. If you simply have a normal RDD (not an RDD[Row]), you can use toDF() directly.
df = rdd.map(lambda x: x.split(",")).toDF()
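As for why the columns in the question end up shuffled, a likely explanation (assuming a PySpark version before 3.0, where a Row built from keyword arguments sorts its fields by name) is that the keys "0", "1", ..., "10" are strings and therefore sort lexicographically rather than numerically. The sketch below reproduces that ordering in plain Python; the field count of 12 is only an illustrative assumption.

```python
# Keys as f(x) in the question would produce them for a 12-field line
# (12 is a hypothetical width chosen for illustration).
keys = [str(i) for i in range(12)]

# Assumption: Row(**kwargs) orders fields by name (PySpark < 3.0),
# which for string keys means lexicographic order.
shuffled = sorted(keys)

print(shuffled)
# ['0', '1', '10', '11', '2', '3', '4', '5', '6', '7', '8', '9']
```

Skipping Row and calling toDF() on the plain lists avoids this, because the columns then keep their positional order.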
You can also name the columns with toDF(). In PySpark, toDF() on an RDD takes the names as a single list:
df = rdd.map(lambda x: x.split(",")).toDF(["col1_name", ..., "colN_name"])
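When the number of columns is not known in advance, the names can be generated from the first record. This is a minimal sketch, assuming PySpark and an existing SparkSession passed in as `spark`; the helpers `split_line`, `column_names`, and `lines_to_df` and the `col` prefix are hypothetical names introduced here, not part of Spark.

```python
def split_line(line, sep=","):
    """Split one text line into a list of string fields."""
    return line.split(sep)

def column_names(n, prefix="col"):
    """Generate n positional column names: col0, col1, ..."""
    return [f"{prefix}{i}" for i in range(n)]

def lines_to_df(spark, lines):
    """Build a DataFrame from raw text lines without a typed schema.

    Assumes `spark` is an active SparkSession and that every line has
    the same number of fields as the first one.
    """
    rdd = spark.sparkContext.parallelize(lines).map(split_line)
    width = len(rdd.first())
    # RDD.toDF accepts a list of column names; types are inferred.
    return rdd.toDF(column_names(width))

print(column_names(3))  # ['col0', 'col1', 'col2']
```

With a session available, `lines_to_df(spark, ["a,b,c", "d,e,f"]).show()` should display three string columns in their original order.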
If what you have is an RDD[Row], you need to know the type of each column. You can do this by specifying a schema or, in Scala, by pattern matching on each Row as follows:
val df = rdd.map({
  case Row(val1: String, ..., valN: Long) => (val1, ..., valN)
}).toDF("col1_name", ..., "colN_name")
Upvotes: 4