Reputation: 173
Hi everyone,
I'm trying to read a zipped ".txt" file from an https website with the fread()
function, but I'm getting an error.
I also tried to read the zip file after downloading it, but I got the same error. Any ideas how to solve it?
library(data.table)
fileUrl <- "https://d396qusza40orc.cloudfront.net/exdata%2Fdata%2Fhousehold_power_consumption.zip"
dt <- fread(fileUrl)
Error in fread(fileUrl) :
Internal error: invalid head position. jump=1, headPos=0000020B75510005, thisJumpStart=0000020B7560C040, sof=0000020B75510000
### tried reading it locally after downloading, too:
dt <- fread("Dataset.zip")
But I got the same error message.
### after unzipping, the file is read without error:
dt <- fread("household_power_consumption.txt")
str(dt)
Classes ‘data.table’ and 'data.frame': 2075259 obs. of 9 variables:
$ Date : chr "16/12/2006" "16/12/2006" "16/12/2006" "16/12/2006" ...
$ Time : chr "17:24:00" "17:25:00" "17:26:00" "17:27:00" ...
$ Global_active_power : chr "4.216" "5.360" "5.374" "5.388" ...
$ Global_reactive_power: chr "0.418" "0.436" "0.498" "0.502" ...
$ Voltage : chr "234.840" "233.630" "233.290" "233.740" ...
$ Global_intensity : chr "18.400" "23.000" "23.000" "23.000" ...
$ Sub_metering_1 : chr "0.000" "0.000" "0.000" "0.000" ...
$ Sub_metering_2 : chr "1.000" "1.000" "2.000" "1.000" ...
$ Sub_metering_3 : num 17 16 17 17 17 17 17 17 17 16 ...
- attr(*, ".internal.selfref")=<externalptr>
Upvotes: 1
Views: 1402
Reputation: 421
Just a brief update: you can pass a shell command to fread() through its cmd argument, so the files are extracted on the fly, like this:
url = "https://d396qusza40orc.cloudfront.net/exdata%2Fdata%2Fhousehold_power_consumption.zip"
download.file(url, destfile = "./household_power_c.zip", mode = "wb")
dt <- data.table::fread(cmd = "unzip -cq ./household_power_c.zip")   # -c: extract to stdout, -q: quiet
Output:
> str(dt)
Classes ‘data.table’ and 'data.frame': 2075259 obs. of 9 variables:
$ Date : chr "16/12/2006" "16/12/2006" "16/12/2006" "16/12/2006" ...
$ Time : chr "17:24:00" "17:25:00" "17:26:00" "17:27:00" ...
$ Global_active_power : chr "4.216" "5.360" "5.374" "5.388" ...
$ Global_reactive_power: chr "0.418" "0.436" "0.498" "0.502" ...
$ Voltage : chr "234.840" "233.630" "233.290" "233.740" ...
$ Global_intensity : chr "18.400" "23.000" "23.000" "23.000" ...
$ Sub_metering_1 : chr "0.000" "0.000" "0.000" "0.000" ...
$ Sub_metering_2 : chr "1.000" "1.000" "2.000" "1.000" ...
$ Sub_metering_3 : num 17 16 17 17 17 17 17 17 17 16 ...
- attr(*, ".internal.selfref")=<externalptr>
Using shell commands is quite handy: you can use any option of the unzip command (see man unzip). For instance, to extract just one file:
url <- "http://www.bls.gov/cex/pumd/data/comma/diary14.zip"
download.file(url, destfile = "dataset.zip", mode = "wb")
shc = 'unzip -cq dataset.zip diary14/expd141.csv' # shell command to extract one of the many files in the zip archive
zd <- data.table::fread(cmd = shc)
See this link for more information about using command-line tools in fread:
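If you don't know the file names inside the archive in advance, you can list them from R with unzip(list = TRUE) and then build the unzip -cq command for the member you want. The function below is a minimal sketch of that idea: fread_zip_member is a hypothetical name (it is not part of data.table), and it assumes the unzip command-line tool is on your PATH, which may not be the case on a stock Windows install.

```r
library(data.table)

# Hypothetical helper (not a data.table function): read one member of a local
# zip archive by streaming it through the `unzip` command-line tool.
# Assumes an `unzip` executable is available on the PATH.
fread_zip_member <- function(zipfile, member = NULL, ...) {
  # List the archive contents with R's own unzip(); nothing is extracted here
  members <- utils::unzip(zipfile, list = TRUE)$Name
  members <- members[!grepl("/$", members)]   # drop directory entries

  if (is.null(member)) {
    if (length(members) != 1L) {
      stop("Archive has ", length(members), " files; choose one via `member`")
    }
    member <- members
  }
  if (!member %in% members) stop("No file named ", member, " in ", zipfile)

  # -c writes the member to stdout, -q suppresses unzip's own status messages
  shc <- paste("unzip -cq", shQuote(zipfile), shQuote(member))
  data.table::fread(cmd = shc, ...)
}
```

For the BLS example above this would be called as zd <- fread_zip_member("dataset.zip", "diary14/expd141.csv"); any extra arguments are passed on to fread().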
Upvotes: 0
Reputation: 34773
fread does not automatically read .zip files, but you can unzip them cross-platform from within R:
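library(data.table)
tmp_dir = tempdir()
tmp = tempfile(tmpdir = tmp_dir)
download.file(fileUrl, tmp, mode = "wb")   # binary mode so the zip is not corrupted on Windows
outf = unzip(tmp, list = TRUE)$Name        # names of the files inside the archive
unzip(tmp, outf, exdir = tmp_dir)          # extract them into the temporary directory
fread(file.path(tmp_dir, outf))[1:10]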
Date Time Global_active_power Global_reactive_power Voltage
1: 16/12/2006 17:24:00 4.216 0.418 234.840
2: 16/12/2006 17:25:00 5.360 0.436 233.630
3: 16/12/2006 17:26:00 5.374 0.498 233.290
4: 16/12/2006 17:27:00 5.388 0.502 233.740
5: 16/12/2006 17:28:00 3.666 0.528 235.680
6: 16/12/2006 17:29:00 3.520 0.522 235.020
7: 16/12/2006 17:30:00 3.702 0.520 235.090
8: 16/12/2006 17:31:00 3.700 0.520 235.220
9: 16/12/2006 17:32:00 3.668 0.510 233.990
10: 16/12/2006 17:33:00 3.662 0.510 233.860
Global_intensity Sub_metering_1 Sub_metering_2 Sub_metering_3
1: 18.400 0.000 1.000 17
2: 23.000 0.000 1.000 16
3: 23.000 0.000 2.000 17
4: 23.000 0.000 1.000 17
5: 15.800 0.000 1.000 17
6: 15.000 0.000 2.000 17
7: 15.800 0.000 1.000 17
8: 15.800 0.000 1.000 17
9: 15.800 0.000 1.000 17
10: 15.800 0.000 2.000 16
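If you do this often, the download, unzip and fread steps above can be wrapped in a small function that also deletes the temporary files afterwards. This is a sketch rather than a data.table feature: fread_zip is a hypothetical name, it assumes the archive contains a single file unless member is given, and it relies only on base R's download.file() and unzip(), so no external unzip program is needed.

```r
library(data.table)

# Hypothetical convenience wrapper (not a data.table function) around the
# download -> unzip -> fread steps. Accepts a URL or a local .zip path.
fread_zip <- function(path_or_url, member = NULL, ...) {
  work_dir <- tempfile("fread_zip_")   # private scratch directory
  dir.create(work_dir)
  on.exit(unlink(work_dir, recursive = TRUE), add = TRUE)  # clean up on exit

  zipfile <- path_or_url
  if (grepl("^(https?|ftp)://", path_or_url)) {
    zipfile <- file.path(work_dir, "download.zip")
    # mode = "wb" requests a binary transfer, which matters on Windows
    utils::download.file(path_or_url, zipfile, mode = "wb", quiet = TRUE)
  }

  # Pick the file to read: the only one in the archive unless `member` is given
  members <- utils::unzip(zipfile, list = TRUE)$Name
  members <- members[!grepl("/$", members)]   # drop directory entries
  if (is.null(member)) {
    if (length(members) != 1L) stop("Archive has several files; pass `member`")
    member <- members
  }
  if (!member %in% members) stop("No file named ", member, " in the archive")

  # unzip() keeps the member's folder structure, so build the path the same way
  utils::unzip(zipfile, files = member, exdir = work_dir)
  data.table::fread(file.path(work_dir, member), ...)
}
```

With the URL from the question, dt <- fread_zip(fileUrl) should give the same table as the step-by-step version; extra arguments such as na.strings are passed straight to fread().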
Upvotes: 3