Reputation: 1042
I've got some data in an S3 bucket that I want to work with.
I've imported it using:
import boto3
import dask.dataframe as dd

def import_df(key):
    s3 = boto3.client('s3')
    df = dd.read_csv('s3://.../' + key, encoding='latin1')
    return df

key = 'Churn/CLEANED_data/file.csv'
train = import_df(key)
I can see that the data has been imported correctly using:
train.head()
but when I try a simple operation (taken from this Dask doc):
train_churn = train[train['CON_CHURN_DECLARATION'] == 1]
train_churn.compute()
I get this error:
AttributeError                            Traceback (most recent call last)
<ipython-input-...> in <module>()
      1 train_churn = train[train['CON_CHURN_DECLARATION'] == 1]
----> 2 train_churn.compute()

~/anaconda3/envs/python3/lib/python3.6/site-packages/dask/base.py in compute(self, **kwargs)
    152             dask.base.compute
    153         """
--> 154         (result,) = compute(self, traverse=False, **kwargs)
    155         return result
    156

AttributeError: 'DataFrame' object has no attribute '_getitem_array'
Full error here: Error Upload
Upvotes: 2
Views: 4031
Reputation: 2280
I had the same issue with Dask (version 2.14.0). Reinstalling Dask solved my problem; I believe there was some problem with the previously installed version.
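A minimal sketch of such a reinstall, assuming a pip-based environment (conda users would use the conda equivalents):
# remove the existing dask, then pull a fresh copy from PyPI
python -m pip uninstall -y dask
python -m pip install dask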
Upvotes: 0
Reputation: 3961
You potentially have an old version of Dask. Installing version 2.13.0 fixed this issue for me.
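For example, pinning that version with pip (assuming a pip-based environment):
python -m pip install dask==2.13.0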
Upvotes: 0
Reputation: 1069
I was facing a similar issue when trying to read from S3 files; I ultimately solved it by updating Dask to the most recent version (I think the one SageMaker instances start with by default is outdated):
! python -m pip install --upgrade dask
! python -m pip install fsspec
! python -m pip install --upgrade s3fs
Hope this helps!
Upvotes: 1
Reputation: 23
If it's a row-wise selection on 'CON_CHURN_DECLARATION', you should be able to filter the dataframe with:
train_churn = train[train.CON_CHURN_DECLARATION == 1]
Upvotes: 0