Reputation: 31
I am trying to run Python Twarc hydrate on a very large file of 2,339,076 records, but it keeps freezing. I have tried the script on a smaller data set and it works fine. My question is: does Twarc have a maximum number of rows it can process? If so, what is it? Do I need to separate my data into smaller subsections?
I have tried the terminal command:
twarc2 hydrate 2020-03-22_clean-dataset_csv.csv > hydrated.jsonl
I have tried it on a smaller file and it works fine.
I have searched for whether there is a limit to the number of rows Twarc can process, but I can't find an answer.
Upvotes: 2
Views: 54
Reputation: 6661
You can use the built-in split command to break the file into smaller pieces:
split -n l/10 -d 2020-03-22_clean-dataset_csv.csv subset_
This will create 10 files with names like subset_00, subset_01, etc., each containing approximately one-tenth of the original data.
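As a quick sanity check that split loses no rows, here is a small self-contained run of the same command (ids.csv below is a generated stand-in for the real 2,339,076-row file):

```shell
# Generate a small stand-in ID file (the real one has 2,339,076 rows).
seq 1 100 > ids.csv

# Split into 10 numerically suffixed pieces without breaking lines,
# exactly as in the command above (GNU split).
split -n l/10 -d ids.csv subset_

# Reassembling the pieces in name order reproduces the original file.
cat subset_* | cmp - ids.csv && echo "no rows lost"
```

Because -n l/10 splits on line boundaries, no tweet ID is ever cut in half across two subset files.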
You can then run Twarc hydrate on each subset separately, like this:
twarc2 hydrate subset_00 > hydrated_00.jsonl
And then you can read the resulting .jsonl files one by one, or merge them into a single file. (Warning: untested, as I cannot install twarc2.)
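Since JSONL is just one JSON object per line, merging the per-subset outputs is a plain concatenation. A minimal sketch (the hydrated_* names follow the example above; the two printf lines create small stand-in files so the command can be demonstrated end to end):

```shell
# Stand-in files; in practice these are the outputs of the
# twarc2 hydrate runs above.
printf '{"id": "1"}\n' > hydrated_00.jsonl
printf '{"id": "2"}\n' > hydrated_01.jsonl

# Concatenate all per-subset results into one file. The output name
# deliberately does not match the hydrated_*.jsonl glob, so cat
# never reads its own output.
cat hydrated_*.jsonl > all_hydrated.jsonl

wc -l all_hydrated.jsonl  # should report one line per hydrated record
```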
Upvotes: 0