Ott Toomet

Reputation: 1956

Julia: how to read a bz2-compressed text file

In R, I can read a whole compressed text file into a character vector as

readLines("file.txt.bz2")

readLines transparently decompresses .gz and .bz2 files, but it also works with uncompressed files. Is there something analogous available in Julia? I can do

text = open(f -> read(f, String), "file.txt")

but this cannot open compressed files. What is the preferred way to read bzip2 files? Is there any approach (besides manually checking the filename extension) that can deduce the compression format automatically?

Upvotes: 5

Views: 741

Answers (1)

carstenbauer

Reputation: 10127

I don't know of anything automatic, but this is how you could create and read a bz2-compressed file:

using CodecBzip2 # after ] add CodecBzip2

# Creating a dummy bz2 file
mystring = "Hello StackOverflow!"
mystring_compressed = transcode(Bzip2Compressor, mystring)
write("testfile.bz2", mystring_compressed)

# Reading and uncompressing it
compressed = read("testfile.bz2")
plain = transcode(Bzip2Decompressor, compressed)
String(plain) # "Hello StackOverflow!"

There are also streaming variants (Bzip2CompressorStream and Bzip2DecompressorStream). For more, see CodecBzip2.jl.
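The streaming decompressor can also be used to approximate R's readLines, including the automatic format detection asked about in the question. The following is only a sketch under stated assumptions, not something CodecBzip2.jl provides: read_lines_auto is a hypothetical helper, CodecZlib.jl is assumed to be installed for the gzip case, and the format is guessed from the file's leading magic bytes ("BZh" for bzip2, 0x1f 0x8b for gzip) instead of the extension, falling back to reading the file as plain text.

```julia
using CodecBzip2, CodecZlib  # assumed installed: ] add CodecBzip2 CodecZlib

# Hypothetical helper (not part of any package): sniff the first bytes of the
# file to pick a decompressor, then return all lines, similar to R's readLines.
function read_lines_auto(path::AbstractString)
    magic = open(io -> read(io, 3), path)  # up to 3 bytes; fewer for tiny files
    raw = open(path)
    stream = if length(magic) >= 3 && magic == b"BZh"
        Bzip2DecompressorStream(raw)        # bzip2 magic: 'B' 'Z' 'h'
    elseif length(magic) >= 2 && magic[1:2] == UInt8[0x1f, 0x8b]
        GzipDecompressorStream(raw)         # gzip magic: 0x1f 0x8b
    else
        raw                                 # assume uncompressed text
    end
    try
        return readlines(stream)
    finally
        close(stream)  # closing a decompressor stream also closes the wrapped file
    end
end

# Usage: the same call works for compressed and uncompressed input.
write("demo.txt", "first\nsecond\n")
write("demo.txt.bz2", transcode(Bzip2Compressor, "first\nsecond\n"))
read_lines_auto("demo.txt")
read_lines_auto("demo.txt.bz2")
```

Checking magic bytes rather than the extension means a misnamed file is still decoded correctly; other formats (xz, zstd) would need further branches and their own codec packages.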

Upvotes: 5
