Choix

Reputation: 575

How do I read sequence file data in Scala in Spark?

This is my first attempt at reading sequence-format data in Scala. I would greatly appreciate it if someone could help me with the right command.

Data:

hdfs dfs -cat orders03132_seq/part-m-00000 | head
SEQ!org.apache.hadoop.io.LongWritableordeG�Y���&���]E�@��

My command:

sc.sequenceFile("orders03132_seq/part-m-00000", classOf[Int], classOf[String]).first

Error:

18/03/13 16:59:28 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1)
java.lang.RuntimeException: java.io.IOException: WritableName can't load class: orders
    at org.apache.hadoop.io.SequenceFile$Reader.getValueClass(SequenceFile.java:2103)

Thank you very much in advance.

Upvotes: 2

Views: 2266

Answers (1)

suj1th

Reputation: 1801

You need to read it as a Hadoop file, where K and V are the key and value classes stored in the sequence file. You can do this with something like:

sc.hadoopFile[K, V, SequenceFileInputFormat[K,V]]("path/to/file")

Refer to the documentation for hadoopFile for details.
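As a more concrete sketch of the call above: the following assumes the key class is LongWritable (as the file header in the question shows) and that the value class named in the header, orders, implements Writable and is available on the classpath (for example, in a jar passed with --jars). Those are assumptions, not facts confirmed by the question; the "WritableName can't load class: orders" error suggests the class could not be found at read time. The object name ReadOrdersSeq is hypothetical.

```scala
import org.apache.hadoop.io.{LongWritable, Writable}
import org.apache.hadoop.mapred.SequenceFileInputFormat
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical driver object; in spark-shell, `sc` already exists and
// only the body of main (minus the context setup) is needed.
object ReadOrdersSeq {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ReadOrdersSeq")
    val sc = new SparkContext(conf)

    // K = LongWritable is taken from the file header shown in the question.
    // V = Writable is an assumption: the concrete class ("orders") is loaded
    // by name from the header at read time, so it must be on the classpath.
    val records = sc.hadoopFile[
      LongWritable,
      Writable,
      SequenceFileInputFormat[LongWritable, Writable]
    ]("orders03132_seq/part-m-00000")

    // Hadoop reuses the same Writable instances for every record, so copy
    // the contents into plain Scala values before taking or collecting.
    val rows = records.map { case (key, value) => (key.get, value.toString) }

    rows.take(5).foreach(println)

    sc.stop()
  }
}
```

If the value class still cannot be loaded, the same exception should appear with this call too, since the class name is resolved from the file header rather than from the type parameters.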

Upvotes: 1
