Reputation: 1042
I've got some data in an S3 bucket that I want to work with.
I've imported it using:
import boto3
import dask.dataframe as dd

def import_df(key):
    s3 = boto3.client('s3')
    df = dd.read_csv('s3://.../' + key, encoding='latin1')
    return df

key = 'Churn/CLEANED_data/file.csv'
train = import_df(key)
I can see that the data has been imported correctly using:
train.head()
but when I try a simple operation (taken from this Dask doc):
train_churn = train[train['CON_CHURN_DECLARATION'] == 1]
train_churn.compute()
I get this error:
AttributeError                            Traceback (most recent call last)
<ipython-input-...> in <module>()
      1 train_churn = train[train['CON_CHURN_DECLARATION'] == 1]
----> 2 train_churn.compute()

~/anaconda3/envs/python3/lib/python3.6/site-packages/dask/base.py in compute(self, **kwargs)
    152             dask.base.compute
    153         """
--> 154         (result,) = compute(self, traverse=False, **kwargs)
    155         return result
    156

AttributeError: 'DataFrame' object has no attribute '_getitem_array'
Full error here: Error Upload
Upvotes: 2
Views: 4031
Reputation: 2280
I had the same issue with Dask (version 2.14.0). Reinstalling Dask solved my problem; I believe there was some problem with the previously installed version.
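A minimal sketch of such a reinstall, assuming a pip-based environment (conda users would use the conda equivalents):
# remove the existing dask, then pull a fresh copy from PyPI
python -m pip uninstall -y dask
python -m pip install dask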
Upvotes: 0
Reputation: 3961
You potentially have an old version of Dask. Installing version 2.13.0 fixed this issue for me.
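For example, pinning that version with pip (assuming a pip-based environment):
python -m pip install dask==2.13.0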
Upvotes: 0
Reputation: 1069
I was facing a similar issue when trying to read from S3 files; I ultimately solved it by updating Dask to the most recent version (I think the one SageMaker instances start with by default is outdated):
! python -m pip install --upgrade dask
! python -m pip install fsspec
! python -m pip install --upgrade s3fs
Hope this helps!
Upvotes: 1
Reputation: 23
If it's a row-wise selection on 'CON_CHURN_DECLARATION', you should be able to filter the dataframe with:
train_churn = train[train.CON_CHURN_DECLARATION == 1]
Upvotes: 0