MemoryError: Unable to allocate 43.5 GiB for an array with shape (5844379795,) and data type int64

I have a large dataframe and I am trying to update one column:

Dataframe:

(screenshot of the dataframe)

I would like to update the last column IsFraudsterStatus.

My Code:

    import pandas as pd

    # Rebuild the full dataframe from the chunks read earlier.
    df = pd.concat(chunk_list)

    def expand_fraud(no_fraud, fraud, col_name):
        # Match non-fraud rows against fraud rows on a shared attribute
        # (device, email, or phone number) and flag every match as fraud.
        t = pd.merge(no_fraud, fraud, on=col_name)
        if len(t):
            df.loc[df.ID.isin(t.ID_x), "IsFraudsterStatus"] = 1
            return True
        return False

    # Keep propagating the fraud flag until a full pass adds nothing new.
    while True:
        added_fraud = False
        fraud = df[df.IsFraudsterStatus == 1]
        no_fraud = df[df.IsFraudsterStatus == 0]
        added_fraud |= expand_fraud(no_fraud, fraud, "DeviceId")
        added_fraud |= expand_fraud(no_fraud, fraud, "Email")
        added_fraud |= expand_fraud(no_fraud, fraud, "MobileNo")
        if not added_fraud:
            break

Error:

    MemoryError: Unable to allocate 43.5 GiB for an array with shape (5844379795,) and data type int64

Upvotes: 0

Views: 3250

Answers (1)

Using Dask solved all of the MemoryError problems.
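
For reference, a minimal sketch of what the Dask-based version of the problematic merge might look like; the file name and blocksize are assumptions, and the column names come from the question's code:

    import dask.dataframe as dd

    # Read the data lazily in partitions instead of one giant in-memory frame
    # ("transactions.csv" and the blocksize are placeholders).
    ddf = dd.read_csv("transactions.csv", blocksize="256MB")

    fraud = ddf[ddf.IsFraudsterStatus == 1]
    no_fraud = ddf[ddf.IsFraudsterStatus == 0]

    # The merge runs out-of-core, partition by partition, so the intermediate
    # result no longer has to be allocated as one huge in-memory array.
    matches = dd.merge(no_fraud, fraud, on="DeviceId")
    flagged_ids = matches["ID_x"].unique().compute()

The flagged IDs can then be written back with a masked assignment, much like the `df.loc[...]` update in the question's loop.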

Upvotes: 1
