Reputation: 41
I'm building a data lake in S3, so I would like to store the raw data stream in S3. Below is my code snippet, which I have so far only tried with local storage.
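import org.apache.spark.streaming.twitter.TwitterUtils

// ssc (StreamingContext) and sql (assumed to be a SQLContext) are defined elsewhere
val tweets = TwitterUtils.createStream(ssc, None)
// keep only English tweets and extract the tweet text
val engtweets = tweets.filter(status => status.getLang() == "en").map(x => x.getText())
import sql.implicits._
engtweets.foreachRDD { rdd =>
  val df = rdd.toDF()
  df.write.format("json").save("../Ramesh")
}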
I would like to store the raw data (the entire JSON object) in S3.
Upvotes: 0
Views: 1261
Reputation: 1300
Just set up the access key and secret key in core-site.xml as follows:
<property>
<name>fs.s3a.access.key</name>
<value>...</value>
</property>
<property>
<name>fs.s3a.secret.key</name>
<value>...</value>
</property>
Once you have done this, you should be able to write to S3 using the s3a protocol, with a path like: s3a:///
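If editing core-site.xml is not convenient, the same two properties can also be set from code on the Hadoop configuration that Spark uses. This is only a rough sketch: the object name S3aCredentials and the environment variable names in the usage line are my own choices, and it assumes the hadoop-aws module (which provides the s3a filesystem) is on the classpath.

```scala
import org.apache.spark.SparkContext

object S3aCredentials {
  // Sets the same keys as the core-site.xml snippet above,
  // but on the running SparkContext's Hadoop configuration.
  def configure(sc: SparkContext, accessKey: String, secretKey: String): Unit = {
    val hadoopConf = sc.hadoopConfiguration
    hadoopConf.set("fs.s3a.access.key", accessKey)
    hadoopConf.set("fs.s3a.secret.key", secretKey)
  }
}
```

With the streaming context from the question, this could be called as S3aCredentials.configure(ssc.sparkContext, sys.env("AWS_ACCESS_KEY_ID"), sys.env("AWS_SECRET_ACCESS_KEY")) before the stream is started, which keeps the keys out of the source code.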
Hope this helps!
Upvotes: 1
Reputation: 162
You can simply use the saveAsTextFile method with the path prefixed as s3a://<file path>, provided your Amazon S3 access is set up correctly, with or without credentials.
https://www.cloudera.com/documentation/enterprise/5-5-x/topics/spark_s3.html
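To make that concrete for the stream in the question, here is a rough sketch of saveAsTextFile inside foreachRDD. Several things in it are assumptions on my part: the names RawTweetsToS3 and batchPath are made up, Gson is used as one possible way to turn the whole twitter4j Status into a JSON string (this is a JSON rendering of the Status object, not necessarily the exact payload Twitter sent), and each batch goes to its own directory because saveAsTextFile by default refuses to write to a path that already exists.

```scala
import com.google.gson.Gson
import org.apache.spark.streaming.StreamingContext
import org.apache.spark.streaming.twitter.TwitterUtils

object RawTweetsToS3 {
  // Hypothetical helper: one output directory per batch under the base path.
  def batchPath(base: String, batchTimeMs: Long): String =
    s"${base.stripSuffix("/")}/batch-$batchTimeMs"

  def start(ssc: StreamingContext, base: String): Unit = {
    val tweets = TwitterUtils.createStream(ssc, None)
    val rawJson = tweets
      .filter(_.getLang == "en")
      .mapPartitions { statuses =>
        // Create the serializer on the executor, once per partition,
        // so it does not have to be shipped inside the closure.
        val gson = new Gson()
        statuses.map(status => gson.toJson(status))
      }
    rawJson.foreachRDD { (rdd, time) =>
      // Skip empty batches to avoid writing empty directories.
      if (!rdd.isEmpty()) rdd.saveAsTextFile(batchPath(base, time.milliseconds))
    }
  }
}
```

You would call something like RawTweetsToS3.start(ssc, "s3a://my-bucket/raw-tweets") before ssc.start(), where my-bucket is just a placeholder bucket name. Each batch then lands as text files with one JSON object per line.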
Upvotes: 0