dhkim

Reputation: 21

I'm trying to open a zstd file in Python

I'm trying to open a zstd file in Python. I downloaded the file from the https://the-eye.eu/redarcs/ archive, which provides Reddit posts and comments. When I download the data for a single subreddit from this site and open it with pandas, it opens successfully. The code I used to open the data is as follows:

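import zstandard  # not called directly; pandas needs it installed for compression='zstd'
import pandas as pd

# Placeholder path to the downloaded .zst file
path = 'local_path/file_name.zst'
data = pd.read_table(path, compression='zstd', header=None)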

However, when I try to open files that contain bulk data from all of Reddit, the above code doesn't work. I obtained these bulk data files using torrents (the torrent files for bulk data were downloaded from the archive site). The error I encountered when trying to open the files is as follows:

ZstdError: zstd decompress error: Frame requires too much memory for decoding

This error occurs even when the file is very small.

A friend of mine suggested that the issue might be due to a difference in the zstd file version used for bulk data and the zstd version in my local environment for opening the files. I would like to know the exact cause of this issue.

Upvotes: 1

Views: 657

Answers (1)

SuccoDiMora

Reputation: 69

Honestly, I've never worked with zstd files, but I recently solved a similar problem by reading the data in chunks instead of decompressing everything at once.

Of course, the chunk size should be neither too small nor too large.
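
As a rough illustration of the chunked approach, here is a minimal sketch. It rests on an assumption that the question does not confirm: that the bulk files were compressed with a larger window than the decoder accepts by default, which is the usual meaning of "Frame requires too much memory for decoding". The helper names `iter_lines` and `open_zst_lines` are made up for this example, and the values `2**31` (window limit) and `2**20` (chunk size) are guesses to adjust. `ZstdDecompressor(max_window_size=...)` and `stream_reader` come from the `zstandard` package.

```python
import io


def iter_lines(reader, chunk_size=2**20):
    """Yield text lines from a binary stream, reading chunk_size bytes at a time."""
    buffer = b""
    while True:
        chunk = reader.read(chunk_size)
        if not chunk:  # empty read means end of stream
            break
        buffer += chunk
        # Split on raw bytes so a multi-byte UTF-8 character cut by a
        # chunk boundary is never decoded in two halves.
        *lines, buffer = buffer.split(b"\n")
        for line in lines:
            yield line.decode("utf-8")
    if buffer:  # last line without a trailing newline
        yield buffer.decode("utf-8")


def open_zst_lines(path, max_window_size=2**31, chunk_size=2**20):
    """Stream lines out of a .zst file (hypothetical helper).

    max_window_size=2**31 is an assumed value: it raises the decoder's
    window limit in case the file was written with a large window.
    """
    import zstandard  # third-party package: pip install zstandard

    with open(path, "rb") as fh:
        dctx = zstandard.ZstdDecompressor(max_window_size=max_window_size)
        with dctx.stream_reader(fh) as reader:
            yield from iter_lines(reader, chunk_size)
```

Something like `for line in open_zst_lines('local_path/file_name.zst'): ...` would then process one line at a time, so the decompressed data never has to fit in memory all at once. If the error persists even with the larger window limit, the assumption above is probably wrong and the cause lies elsewhere.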

Upvotes: 0
