I have two big XML files uploaded to an Amazon S3 bucket named "ccssdd", under a folder named data: data/friendships.xml and data/users.xml.
The structure of users.xml is:
<user>
<id>1</id>
<age>24</age>
<x>4</x>
<y>7</y>
<interest>football</interest>
</user>
<user>
..
and friendships.xml:
<friendship>
<user1>1</user1>
<user2>3</user2>
</friendship>
<friendship>
..
I need to write a job jar to run on Amazon Elastic MapReduce that computes the number of friends for each user.
I know I should emit a (userid, 1) pair for each user in every friendship element as the output of the map function, and then, in the reduce function, sum the 1s for each userid.
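What I have in mind is roughly the following sketch. It is plain Java with no Hadoop dependencies, just to show the map/reduce logic on my data; the class and method names are my own and would have to be adapted to the real Mapper/Reducer classes:

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FriendCount {

    // "map" step: for one <friendship> record, emit (user1, 1) and (user2, 1)
    static List<Map.Entry<String, Integer>> map(String record) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        Matcher m = Pattern.compile("<user([12])>(\\d+)</user\\1>").matcher(record);
        while (m.find()) {
            pairs.add(new AbstractMap.SimpleEntry<>(m.group(2), 1));
        }
        return pairs;
    }

    // "reduce" step: sum the 1s for each user id
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            counts.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] records = {
            "<friendship><user1>1</user1><user2>3</user2></friendship>",
            "<friendship><user1>1</user1><user2>2</user2></friendship>"
        };
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String r : records) {
            pairs.addAll(map(r));
        }
        System.out.println(reduce(pairs)); // prints {1=2, 2=1, 3=1}
    }
}
```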
1- I know I can build my app in Eclipse to produce the job .jar file, but I don't know which libraries I should download and add to the project.
2- I don't know how to connect my application to S3, read the XML elements one by one, and extract the user ids from them.
Please kindly help me with that. I've found this tutorial, which is very similar to my problem, but when I copy it into Eclipse I get errors on almost every line: none of the org.* libraries are resolved. I also have no idea how to access the data files on S3.
Here is one approach.
Use a distribution from Cloudera, MapR, or elsewhere, and use the versions (jars) of Hadoop that ship with it. Test your jobs thoroughly on your local machine until you are confident everything works, because Amazon charges an hourly rate per machine even if your job fails after 30 seconds.
Once you are confident, create an "uber jar" containing all your code and all the classes in the Hadoop jars you used.
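One common way to build such an "uber jar" is Maven's shade plugin, which repackages your code together with all dependency classes into a single jar. A minimal pom.xml fragment might look like this (the plugin version shown is illustrative; pick whatever is current):

```xml
<!-- pom.xml build section: bundle all dependencies into one runnable jar -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.3</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```

With that in place, `mvn package` produces the combined jar in the target/ directory.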
Upload the jar and data to S3 as described in this excellent tutorial. EMR works seamlessly with S3.
Run the job as described in the tutorial. If something goes wrong, wait a while after the job finishes before checking the logs, because they appear with a lag.
Hope that helps.