Reputation: 99
I need to load and unzip a 27 GB dataset directly into my Azure account so that I can work on it from a Spark instance, reading it with the textFile function for some machine learning. How can I do this?
I would like to give more details, but I have spent many hours searching online and I still haven't managed to achieve anything useful.
This is the dataset:
https://academicgraphwe.blob.core.windows.net/graph-2016-02-05/index.html
Upvotes: 1
Views: 678
Reputation: 4062
If by "directly" you mean from that location straight to your VM, then the simplest way, in my opinion, is to use AzCopy.
For example, in your case the command could look like this:

AzCopy /Source:https://academicgraphwe.blob.core.windows.net/graph-2016-02-05/ /Dest:C:\myfolder /SourceKey:key /Pattern:"abc.txt"

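Install AzCopy on your VM and run the command. You don't need /SourceKey here, because your dataset appears to be in a publicly readable blob container. Do change the source URL and /Pattern to the files you actually need, though: the link in the question leads to an index page that lists links to the individual files, not to the files themselves.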
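If you would rather script the download and the unzip step the question asks about, the following is a minimal Python sketch and an alternative to AzCopy, not part of the answer above. It assumes (1) the index page is plain HTML whose `<a href>` links point at the files (if the page builds its list with JavaScript, the parser will find nothing), (2) the files are `.zip` archives, as the question's mention of unzipping suggests, and (3) the container allows anonymous read, so a plain HTTP GET works without a key. The helpers `list_links` and `download_and_unzip` are hypothetical names chosen for illustration.

```python
import shutil
import urllib.request
import zipfile
from html.parser import HTMLParser
from pathlib import Path, PurePosixPath
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def list_links(index_html, base_url, suffix=".zip"):
    """Hypothetical helper: absolute URLs of links ending with `suffix`."""
    parser = LinkCollector()
    parser.feed(index_html)
    # Resolve relative hrefs (e.g. "Papers.zip") against the index page URL.
    return [urljoin(base_url, h) for h in parser.links if h.endswith(suffix)]


def download_and_unzip(url, dest_dir, keep_archive=False):
    """Hypothetical helper: stream `url` to disk, extract it, return member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / PurePosixPath(urlparse(url).path).name
    # Copy in 1 MiB chunks so a multi-GB file is never held in memory.
    with urllib.request.urlopen(url) as response, open(archive, "wb") as out:
        shutil.copyfileobj(response, out, length=1024 * 1024)
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()
        zf.extractall(dest)
    if not keep_archive:
        archive.unlink()  # free the disk space taken by the compressed copy
    return names
```

Extracting first matters because, as far as I know, Spark's `textFile` reads gzip- or bzip2-compressed text transparently but has no codec for `.zip` archives. Once extracted, the text files can be passed to `textFile`; on a multi-node cluster they must sit somewhere every worker can read (for example, blob storage or HDFS), not only on one VM's local disk.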
Upvotes: 1