Lin

Reputation: 1202

Assigning lists as elements of a cuDF DataFrame

With pandas, I can store lists as elements without issues, as in:

import pandas as pd

A = {"cls": "A"}
B = {"cls": "B"}
C = {"cls": ["A", "B"]}

df = pd.DataFrame([A,B,C])
type(df.iloc[2]["cls"])   # Returns `list`

But cudf.DataFrame does not accept a list as an element, as we can see here:

import cudf
cu_df = cudf.DataFrame([A, B, C])

Fails with ArrowTypeError: Expected bytes, got a 'list' object

If we leave out C, it works:

import cudf
cu_df = cudf.DataFrame([A, B])

(no error)

Converting from a regular pandas DataFrame does not work either:

cu_df = cudf.DataFrame(df)

(fails with the same ArrowTypeError)

Any ideas on how to work around this?

Upvotes: 1

Views: 63

Answers (1)

mrconcerned

Reputation: 1985

According to the documentation and this GitHub issue:

list operations are somewhat limited, and a column of lists can't be treated the same as a column of ndarrays in Pandas.

Thus, you might try converting the list into a string:

A = {"cls": "A"}
B = {"cls": "B"}
C = {"cls": str(["A", "B"])}

and use it in cuDF:

import pandas as pd
import cudf
df = pd.DataFrame([A, B, C])
cu_df = cudf.DataFrame(df)
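
Note that the string workaround stores the text "['A', 'B']" rather than a real list, so the list has to be parsed back if it is needed later. A minimal sketch of that round trip in plain Python follows; the helper names `encode` and `decode` are hypothetical, and it assumes no ordinary value in the column starts with "[" and ends with "]":

```python
import ast

records = [{"cls": "A"}, {"cls": "B"}, {"cls": ["A", "B"]}]

def encode(value):
    # Lists become their repr string; other values pass through unchanged.
    return str(value) if isinstance(value, list) else value

def decode(value):
    # Only strings that look like a list literal are parsed back into lists.
    if isinstance(value, str) and value.startswith("[") and value.endswith("]"):
        return ast.literal_eval(value)
    return value

# Every value in `encoded` is now a plain string, so the column has one type.
encoded = [{k: encode(v) for k, v in r.items()} for r in records]

# After processing, the original lists can be recovered.
decoded = [{k: decode(v) for k, v in r.items()} for r in encoded]
```

`ast.literal_eval` only evaluates Python literals, so it is a safer choice than `eval` for parsing the stored strings.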

If that does not help, the same issue suggests:

explode each list column into a flat column, perform the binary operation, then construct a list column back from the result
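
The explode, operate, rebuild pattern quoted above can be sketched as follows. This is shown with pandas so it runs without a GPU; cuDF offers a similar `explode` method, but whether every step below behaves identically there is an assumption, not something verified here. The example also assumes every element of the column is already a list:

```python
import pandas as pd

# Every element is a list, so the column has a single, uniform type.
df = pd.DataFrame({"cls": [["A"], ["B"], ["A", "B"]]})

# 1. Explode: one row per list element; the original row label
#    is repeated in the index (0, 1, 2, 2).
flat = df["cls"].explode()

# 2. Perform the binary operation on the flat column
#    (here, string concatenation with a scalar suffix).
flat = flat + "_x"

# 3. Rebuild one list per original row by grouping on the index.
rebuilt = flat.groupby(level=0).agg(list)
```

Grouping on the index works because `explode` keeps the original row label for each element, which is what ties the flat values back to their source rows.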

Upvotes: 0
