ohmygoddess

Reputation: 657

How to access and mount an Amazon public dataset on EC2

I am new to Amazon AWS. I want to access the Google Books Ngrams dataset, which is about 2.2 TB and is available at: s3://datasets.elasticmapreduce/ngrams/books/

Since the data is so large, I cannot really download it to my computer. (1) How can I examine just part of the data? For example, download or view online 10 MB of one of the big files. (2) How can I create a snapshot so that I can use Amazon EC2 to analyze the data? To create a public data set volume from a snapshot, I need the snapshot ID for that data set, but I cannot find it anywhere.

Upvotes: 0

Views: 1022

Answers (1)

Julio Faerman

Reputation: 13501

(1) Yes, you can use the AWS CLI or S3DistCp to copy just part of the data. (2) That data is stored in S3, so there is no snapshot ID as there would be for an EBS-backed data set; you read it directly from S3 instead of mounting a volume.
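For part (1), a minimal sketch of peeking at the data without copying all of it, using boto3 (the AWS SDK for Python) rather than the CLI. It lists a few objects under the prefix from the question, then fetches only the first 10 MiB of one object with an HTTP `Range` request. This assumes boto3 is installed, that the `datasets.elasticmapreduce` bucket is still publicly readable, and that anonymous (unsigned) requests are allowed; the helper names (`byte_range`, `list_objects`, `read_head`, `main`) are hypothetical. The files may be stored in a compressed or binary container format, so the raw bytes you get back may still need decoding.

```python
"""Sketch: peek at part of a large public S3 dataset without downloading it all.

Assumptions (not from the original answer): boto3 is installed, the bucket
below is still publicly readable, and anonymous (unsigned) access is allowed.
"""

BUCKET = "datasets.elasticmapreduce"   # bucket from the question's s3:// URL
PREFIX = "ngrams/books/"               # prefix from the question's s3:// URL
SAMPLE_BYTES = 10 * 1024 * 1024        # "10MB" from the question, read as 10 MiB


def byte_range(start: int, length: int) -> str:
    """Build an HTTP Range header value covering `length` bytes from `start`.

    HTTP byte ranges are inclusive, so the last byte is start + length - 1.
    """
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be > 0")
    return f"bytes={start}-{start + length - 1}"


def make_anonymous_client(region_name=None):
    """Create an S3 client that sends unsigned (anonymous) requests."""
    # Imported lazily so the pure helpers above work without boto3 installed.
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    return boto3.client("s3", region_name=region_name,
                        config=Config(signature_version=UNSIGNED))


def list_objects(client, bucket=BUCKET, prefix=PREFIX, limit=20):
    """Return up to `limit` (key, size_in_bytes) pairs found under `prefix`."""
    results = []
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            results.append((obj["Key"], obj["Size"]))
            if len(results) >= limit:
                return results
    return results


def read_head(client, key, bucket=BUCKET, nbytes=SAMPLE_BYTES):
    """Download only the first `nbytes` of one object via a ranged GET."""
    resp = client.get_object(Bucket=bucket, Key=key,
                             Range=byte_range(0, nbytes))
    return resp["Body"].read()


def main():
    """List a few objects, then fetch the first SAMPLE_BYTES of the first one."""
    s3 = make_anonymous_client()
    objects = list_objects(s3)
    for key, size in objects:
        print(f"{size:>15,d}  {key}")
    if objects:
        key = objects[0][0]
        sample = read_head(s3, key)
        print(f"fetched {len(sample):,d} bytes from {key}")
```

Calling `main()` from an interactive session prints the sizes and keys of the first few files and then the size of the sample actually fetched. The same ranged read is available from the AWS CLI through `aws s3api get-object` and its `--range` option, so nothing beyond the requested bytes is transferred either way.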

I'd recommend you take this lab to understand how to process this data set: https://run.qwiklab.com/focuses/preview/1161?locale=en

Upvotes: 1
