Reputation: 1
I have a task that involves running many queries on a dataframe. I compared the performance of running these queries on a Xeon CPU (pandas) vs. an RTX 2080 (cuDF). For a dataframe of 100k rows, the GPU is faster, but not by much. Looking at the nvidia-smi output, GPU utilization sits around 3-4% while the queries are running.
My question is what can I do to speed up the cuDF task and achieve high GPU utilization?
For example, in the CPU case I can run 8 of these queries in parallel on 8 CPU cores.
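Roughly, the CPU baseline looks like this (a sketch assuming pandas and multiprocessing; the helper run_query, the 1000 tasks, and fork-style worker start are just for illustration):

import numpy as np
import pandas as pd
from multiprocessing import Pool

NUM_ELEMENTS = 100000

pdf = pd.DataFrame({
    'value1': np.random.random(NUM_ELEMENTS),
    'value2': np.random.random(NUM_ELEMENTS),
    'value3': np.random.random(NUM_ELEMENTS),
})

def run_query(thresholds):
    # Each worker filters its (inherited) copy of the dataframe
    c1, c2, c3 = thresholds
    return len(pdf.query('(value1 < @c1) & (value2 > @c2) & (value3 < @c3)'))

if __name__ == '__main__':
    tasks = [tuple(np.random.random(3)) for _ in range(1000)]
    with Pool(processes=8) as pool:  # 8 queries running on 8 CPU cores at a time
        counts = pool.map(run_query, tasks)

And here is the cuDF version of a single query: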
import cudf
import cupy as cp
import numpy as np

NUM_ELEMENTS = 100000

# Three random columns, generated directly on the GPU with CuPy
df = cudf.DataFrame()
df['value1'] = cp.random.sample(NUM_ELEMENTS)
df['value2'] = cp.random.sample(NUM_ELEMENTS)
df['value3'] = cp.random.sample(NUM_ELEMENTS)

# Random thresholds for a single query
c1 = np.random.random()
c2 = np.random.random()
c3 = np.random.random()

res = df.query('((value1 < @c1) & (value2 > @c2) & (value3 < @c3))')
This sample code doesn't take many GPU cycles on its own; however, I want to run thousands of such queries on the data, and I don't want to run them sequentially. Is there a way to run multiple query() calls on a cuDF dataframe in parallel to maximize GPU utilization?
Upvotes: 0
Views: 577
Reputation: 251
This is currently a limitation of the cuDF library, but we're working towards enabling it. The parallelism mechanism you're looking for is CUDA streams (https://developer.nvidia.com/blog/gpu-pro-tip-cuda-7-streams-simplify-concurrency/). The cuDF Python library doesn't support CUDA streams quite yet, but we're actively working on it.
You may be able to work around this using a combination of CuPy and Numba, both of which support CUDA streams (https://docs.cupy.dev/en/stable/reference/generated/cupy.cuda.Stream.html, https://numba.pydata.org/numba-doc/dev/cuda-reference/host.html#stream-management), but you'd be in a very experimental area.
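To make that direction concrete, here is a rough sketch (not cuDF's query API): express each filter directly as CuPy element-wise operations on the column arrays and enqueue each one on its own CUDA stream. The number of queries, the threshold layout, and the final reduction are illustrative assumptions.

import cupy as cp
import numpy as np

NUM_ELEMENTS = 100000
NUM_QUERIES = 16  # illustrative; use however many independent queries you have

# Column data as raw CuPy arrays (the same kind of data cuDF columns hold on the GPU)
value1 = cp.random.sample(NUM_ELEMENTS)
value2 = cp.random.sample(NUM_ELEMENTS)
value3 = cp.random.sample(NUM_ELEMENTS)

# One non-blocking stream per query, plus host-side thresholds for each query
streams = [cp.cuda.Stream(non_blocking=True) for _ in range(NUM_QUERIES)]
thresholds = np.random.random((NUM_QUERIES, 3))

masks = []
for stream, (c1, c2, c3) in zip(streams, thresholds):
    with stream:  # kernels launched inside this block are queued on this stream
        masks.append((value1 < c1) & (value2 > c2) & (value3 < c3))

# Wait for all streams to finish before using the results on the host
for stream in streams:
    stream.synchronize()

counts = [int(mask.sum()) for mask in masks]  # e.g. rows matched per query

Keep in mind that whether you actually see overlap depends on kernel size: with only 100k rows each comparison kernel is tiny and launch overhead dominates, so you may need larger data before stream concurrency shows up as meaningfully higher GPU utilization.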
Upvotes: 1