Areza

Reputation: 6080

reading a huge csv file using cudf

I am trying to read a huge CSV file with cuDF but get memory errors.

import cudf
cudf.set_allocator("managed")
cudf.__version__
user_wine_rate_df = cudf.read_csv('myfile.csv',
                                 sep = "\t",
                                 parse_dates = ['created_at'])


'0.17.0a+382.gbd321d1e93'

terminate called after throwing an instance of 'thrust::system::system_error'
  what():  parallel_for failed: cudaErrorIllegalAddress: an illegal memory access was encountered
Aborted (core dumped)

If I remove cudf.set_allocator("managed") I get

MemoryError: std::bad_alloc: CUDA error at: /opt/conda/envs/rapids/include/rmm/mr/device/cuda_memory_resource.hpp:69: cudaErrorMemoryAllocation out of memory

I am using cuDF through the rapidsai/rapidsai:cuda11.0-runtime-ubuntu16.04-py3.8 container.

I wonder what could be the reason for running out of memory, given that I can read this big file with pandas.

Update:

I installed dask_cudf and used dask_cudf.read_csv('myfile.csv'), but I still get

parallel_for failed: cudaErrorIllegalAddress: an illegal memory access was encountered

Upvotes: 1

Views: 3377

Answers (2)

TaureanDyerNV

Reputation: 1291

Check out this blog by Nick Becker on reading larger-than-GPU-memory files. It should get you on your way.
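
As a rough sketch of that kind of setup (not taken from the blog itself): start a single-node GPU cluster with managed memory and spilling enabled, then read the file through dask_cudf. The memory-limit value below is an illustrative assumption; the file name and column name come from the question.

from dask_cuda import LocalCUDACluster
from dask.distributed import Client
import dask_cudf

# One GPU worker with RMM managed memory, so data larger than the GPU
# can spill to host memory instead of failing with an out-of-memory error.
cluster = LocalCUDACluster(
    rmm_managed_memory=True,       # let RMM oversubscribe device memory
    device_memory_limit="10GB",    # per-worker spill threshold (assumed value)
)
client = Client(cluster)

# Read the CSV as a partitioned dask_cudf DataFrame.
ddf = dask_cudf.read_csv("myfile.csv", sep="\t", parse_dates=["created_at"])
print(ddf.head())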

Upvotes: 1

saloni

Reputation: 316

If the file you are reading is larger than the available GPU memory, you will observe an OOM (out of memory) error, because cuDF runs on a single GPU. To read very large files I would recommend using dask_cudf.
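
A minimal sketch of that, reusing the file and column names from the question; the partition size is an assumption, and depending on the RAPIDS version the keyword may be chunksize or blocksize:

import dask_cudf

# Split the file into partitions small enough to fit in GPU memory,
# instead of loading it in one cudf.read_csv call.
ddf = dask_cudf.read_csv(
    "myfile.csv",
    sep="\t",
    parse_dates=["created_at"],
    chunksize="256 MiB",   # per-partition size (assumed; 'blocksize' in newer releases)
)

print(ddf.npartitions)   # number of GPU partitions
print(ddf.head())        # computes only the first partition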

Upvotes: 1
