Reputation: 47
I am trying to load a dataset directly from Kaggle (without having to download it to my local machine first). For this, I referred to several solutions on Stack Overflow and came up with the following code:
library(httr)
dataset <- httr::GET("https://www.kaggle.com/datasets/fedesoriano/heart-failure-prediction/download/",
httr::authenticate("kguliani", authkey, type = "basic"))
# destination file
temp <- tempfile()
download.file(dataset$url,temp)
data <- read.csv(unz(temp, "heart.csv"))
unlink(temp)
head(data)
I believe it should work but I keep getting the error message:
Error in download.file(dataset$url, temp): cannot open URL 'https://www.kaggle.com/datasets/fedesoriano/heart-failure-prediction/download'
This URL works fine in a web browser and lets me download the archive.zip file (which contains the target file 'heart.csv'). Can someone please explain why the URL is not working from R?
(I edited the question to take out my authkey since I wasn't sure if I should share it). To reproduce, feel free to use your username and authkey instead. Thank you!
Upvotes: 1
Views: 197
Reputation: 314
After looking into the Kaggle API, I found that the base URL it uses is a bit different from the page link for retrieving the zip file. The format is as follows:
https://www.kaggle.com/api/v1/datasets/download/{owner_slug}/{dataset_slug}
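As a small illustration of that pattern, the URL can be assembled from the two slugs. This is only a sketch: kaggle_download_url is a hypothetical helper name, not something provided by httr or by Kaggle.

```r
# Hypothetical helper (not part of httr or the Kaggle API): fills the two
# slugs into the URL pattern quoted above.
kaggle_download_url <- function(owner_slug, dataset_slug) {
  sprintf("https://www.kaggle.com/api/v1/datasets/download/%s/%s",
          owner_slug, dataset_slug)
}

# Owner and dataset slugs taken from the dataset page link in the question
url <- kaggle_download_url("fedesoriano", "heart-failure-prediction")
print(url)
```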
I also added the argument mode = "wb" to the download.file function, since the zip file was being corrupted without it (the default text mode can corrupt binary files, notably on Windows).
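library(httr)
# Request the dataset from the API endpoint; authkey holds your Kaggle API key
dataset <- httr::GET("https://www.kaggle.com/api/v1/datasets/download/fedesoriano/heart-failure-prediction",
                     httr::authenticate("kguliani", authkey, type = "basic"))
# destination file
temp <- tempfile(fileext = ".zip")
# dataset$url is the final URL after any redirects; mode = "wb" writes the zip as binary
download.file(dataset$url, temp, mode = "wb")
# read heart.csv straight out of the archive, then delete the temporary file
data <- read.csv(unz(temp, "heart.csv"))
unlink(temp)
head(data)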
Result:
Age Sex ChestPainType RestingBP Cholesterol FastingBS RestingECG MaxHR ExerciseAngina Oldpeak
1 40 M ATA 140 289 0 Normal 172 N 0.0
2 49 F NAP 160 180 0 Normal 156 N 1.0
3 37 M ATA 130 283 0 ST 98 N 0.0
4 48 F ASY 138 214 0 Normal 108 Y 1.5
5 54 M NAP 150 195 0 Normal 122 N 0.0
6 39 M NAP 120 339 0 Normal 170 N 0.0
ST_Slope HeartDisease
1 Up 0
2 Flat 1
3 Up 0
4 Flat 1
5 Up 0
6 Up 0
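A possible variation, which I have not tested against Kaggle, is to let httr write the response body straight to disk with write_disk, so the file is fetched in a single authenticated request instead of a GET followed by download.file. In the sketch below, kaggle_user and kaggle_key are placeholder names, and reading them from the KAGGLE_USERNAME and KAGGLE_KEY environment variables is an assumption about how you store your credentials.

```r
library(httr)

# Assumption: credentials are stored in these environment variables;
# kaggle_user and kaggle_key are placeholder names for this sketch.
kaggle_user <- Sys.getenv("KAGGLE_USERNAME")
kaggle_key  <- Sys.getenv("KAGGLE_KEY")

url <- "https://www.kaggle.com/api/v1/datasets/download/fedesoriano/heart-failure-prediction"
temp <- tempfile(fileext = ".zip")

if (nzchar(kaggle_user) && nzchar(kaggle_key)) {
  # write_disk() saves the raw response body to temp, so no mode = "wb" is needed
  resp <- GET(url,
              authenticate(kaggle_user, kaggle_key, type = "basic"),
              write_disk(temp, overwrite = TRUE))
  stop_for_status(resp)  # raise an R error on an HTTP error status (e.g. 401, 404)
  data <- read.csv(unz(temp, "heart.csv"))
  unlink(temp)
  print(head(data))
} else {
  message("Set KAGGLE_USERNAME and KAGGLE_KEY to run this example.")
}
```

Calling stop_for_status before reading the archive means a failed login surfaces as an HTTP error rather than as a confusing unzip failure.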
Upvotes: 1