Reputation: 657
I am new to Amazon AWS. I want to access the Google Books Ngrams dataset. It is about 2.2 TB and is available at: s3://datasets.elasticmapreduce/ngrams/books/
Since the data is so big, I cannot really download it to my computer. (1) How can I examine just part of the data? For example, download or view online 10 MB of one of the big files. (2) How can I create a snapshot so that I can use Amazon EC2 to analyze the data? In order to create a public data set volume from a snapshot, I need to find the snapshot ID for that data set, but I cannot find it anywhere.
Upvotes: 0
Views: 1022
Reputation: 13501
(1) Yes, you can use the AWS CLI or S3DistCp to copy part of the data. (2) That data is stored on S3, not EBS, so there is no snapshot ID to find as there would be for the EBS-backed public data sets.
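For (1), one way to look at only the first 10 MB of an object is a ranged GET. Here is a minimal sketch using boto3 instead of the CLI, assuming boto3 is installed and AWS credentials are configured. The helper names (`byte_range`, `list_keys`, `fetch_prefix`) are my own, and the key in the usage comment is a placeholder, not a real path, so list the prefix first to find an actual file.

```python
def byte_range(start: int, length: int) -> str:
    """Build an HTTP Range header value covering `length` bytes from `start`."""
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be > 0")
    # HTTP byte ranges are inclusive on both ends.
    return f"bytes={start}-{start + length - 1}"


def list_keys(bucket: str, prefix: str, max_keys: int = 20) -> list:
    """Return (key, size) pairs for up to `max_keys` objects under `prefix`."""
    import boto3  # imported lazily so byte_range() works without boto3

    s3 = boto3.client("s3")
    response = s3.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=max_keys)
    return [(obj["Key"], obj["Size"]) for obj in response.get("Contents", [])]


def fetch_prefix(bucket: str, key: str, length: int = 10 * 1024 * 1024) -> bytes:
    """Download only the first `length` bytes of an S3 object."""
    import boto3  # imported lazily so byte_range() works without boto3

    s3 = boto3.client("s3")
    response = s3.get_object(Bucket=bucket, Key=key, Range=byte_range(0, length))
    return response["Body"].read()


# Example usage (the key is a hypothetical placeholder):
#
#   for key, size in list_keys("datasets.elasticmapreduce", "ngrams/books/"):
#       print(size, key)
#   data = fetch_prefix("datasets.elasticmapreduce", "ngrams/books/<some-file>")
#   with open("sample.bin", "wb") as f:
#       f.write(data)
```

The bytes come back exactly as stored, so if the files are compressed or in a binary container format, the sample will need the matching reader before it is human-readable.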
I'd recommend you take this lab to understand how to process this data set: https://run.qwiklab.com/focuses/preview/1161?locale=en
Upvotes: 1