Reputation: 298
Link to data.csv. In Scala the code gives an array of strings, and I want the same output in Python. Code in Scala:
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder()
  .appName("Test_Parquet")
  .master("local[*]")
  .getOrCreate()
val sc = spark.sparkContext
val parquetDF = spark.read.csv("data.csv")
parquetDF.coalesce(1).write.mode("overwrite").parquet("Parquet")
val rdd = spark.read.parquet("Parquet").rdd
val header = rdd.first()
val rdd1 = rdd.filter(_ != header).map(x => x.toString)
rdd1.foreach(println)
OUTPUT:
[Canada,47;97;33;94;6]
[Canada,59;98;24;83;3]
[Canada,77;63;93;86;62]
[China,86;71;72;23;27]
[China,74;69;72;93;7]
[China,58;99;90;93;41]
[England,40;13;85;75;90]
[England,39;13;33;29;14]
[England,99;88;57;69;49]
[Germany,67;93;90;57;3]
[Germany,0;9;15;20;19]
[Germany,77;64;46;95;48]
[India,90;49;91;14;70]
[India,70;83;38;27;16]
[India,86;21;19;59;4]
Code in Python:
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("Test_Parquet").master("local[*]").getOrCreate()
parquetDF = spark.read.csv("data.csv")
parquetDF.coalesce(1).write.mode("overwrite").parquet("Parquet")
rdd = spark.read.parquet("Parquet").rdd
header = rdd.first()
print(header)
rdd1 = rdd.filter(lambda line: header != line).map(lambda x: str(x))
rdd1.foreach(print)
The Python output is different from the Scala output, even though I'm doing the same thing in Python.
Upvotes: 0
Views: 31
Reputation: 7399
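I think rdd1.foreach(print) should work, but since you're converting from a DataFrame you will get Row objects instead, and str(x) just gives you the Row(...) representation of each one.
I think the following should work. Apply it to the rows before the map(str) step, because calling list on a string splits it into single characters:
rdd.filter(lambda line: header != line).map(list).foreach(print)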
Difference:
df.rdd.foreach(print)
# Row(Name='John', gender='Male', state='GA')
# Row(Name='Mary', gender='Female', state='GA')
# Row(Name='Alex', gender='Male', state='NY')
# Row(Name='Ana', gender='Female', state='NY')
# Row(Name='Amy', gender='Female', state='NY')
df.rdd.map(list).foreach(print)
# ['John', 'Male', 'GA']
# ['Mary', 'Female', 'GA']
# ['Alex', 'Male', 'NY']
# ['Ana', 'Female', 'NY']
# ['Amy', 'Female', 'NY']
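If the goal is to reproduce the bracketed Scala strings exactly rather than Python lists, a small formatter may get closer. The sketch below is only an illustration: `scala_style` is a hypothetical helper name, the sample tuples stand in for Row objects (a pyspark Row is a tuple subclass, so it can be iterated the same way), and printing None as "null" is an assumption about how Scala renders missing values.

```python
from typing import Any, Iterable


def scala_style(row: Iterable[Any]) -> str:
    """Join a row's values into a bracketed, comma-separated string."""
    # Assumption: None is rendered as "null" to mirror the Scala output.
    cells = ("null" if value is None else str(value) for value in row)
    return "[" + ",".join(cells) + "]"


# Plain tuples stand in for pyspark Row objects in this sketch.
sample_rows = [("Canada", "47;97;33;94;6"), ("China", "86;71;72;23;27")]
formatted = [scala_style(r) for r in sample_rows]
for line in formatted:
    print(line)
```

On the RDD from the question, this would presumably be used as rdd.filter(lambda row: row != header).map(scala_style).foreach(print), in place of the map(lambda x: str(x)) step.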
Note: If this is not your exact problem, please provide the actual and expected output.
Upvotes: 2