Pandas: Using .isin() returns the error: "AttributeError: float' object has no attribute 'isin'"

I am using Pandas and Python to import a CSV, and the data in the imported dataframe is manipulated so that a new column is made.

Each row in the new column is made based on the values in each corresponding row of both column A and column B. There are more columns with data in the dataframe, however these are irrelevant for code below.

The imported dataframe have several thousand rows.

Both column A and column B contain numerical values between and including 0 and 99.

import pandas as pd

import csv

df = pd.read_csv("import.csv", names=["Id", "Month", "Name", "ColA", "ColB" ])

def f(row):
    if row['colA'].isin([10, 11, 12, 13, 14, 15, 20, 21, 22, 23, 24, 48]) and  row['colB'].isin([30, 31, 32, 33, 34, 35, 57, 58]):
        val = row['ColA']
    elif row['ColB'].isin([10, 11, 12, 13, 14, 15, 20, 21, 22, 23, 24, 48]) and  row['ColA'].isin([30, 31, 32, 33, 34, 35, 57, 58]):
        val = row['ColB']
    elif row['ColA'] > row['ColB']:
        val = row['ColA']
    elif row['ColA'] < row['ColB']:
        val = row['ColB']
    else: 
        val = row['ColA']
    return val            

df['NewColumnName'] = df.apply(f, axis=1)   

df.to_csv("export.csv", encoding='utf-8')

Running the above code returns the error:

AttributeError: ("'float' object has no attribute 'isin'", 'occurred at index 0')

So obviously .isin() can't be used in that manner. Any suggestions to how this could be solved?

EDIT Adding a column where the same conditions apply using Jezrael's approach the code would look as follows I guess:

m1 = (df['colA'].isin(L1) & df['colB'].isin(L2)) | (df['ColA'] > df['ColB'])
m2 = (df['colB'].isin(L1) & df['colA'].isin(L2)) | (df['ColA'] < df['ColB'])
m3 = (df['colC'].isin(L1) & df['colB'].isin(L2)) | (df['ColC'] > df['ColB'])
m4 = (df['colB'].isin(L1) & df['colC'].isin(L2)) | (df['ColC'] < df['ColB'])
m5 = (df['colC'].isin(L1) & df['colA'].isin(L2)) | (df['ColC'] > df['ColA'])
m6 = (df['colA'].isin(L1) & df['colC'].isin(L2)) | (df['ColC'] < df['ColA'])



df['NewColumnName'] = np.select([m1, m2, m3, m4, m5, m6], [df['ColA'], df['ColB'], df['ColC'], df['ColA'], df['ColB'], df['ColC'],], default=df['ColA'])

Upvotes: 1

Views: 37608

Answers (2)

jezrael
jezrael

Reputation: 862741

In pandas the best is avoid loops, so better is use numpy.select and chain condition by & for AND and | for OR:

L1 = [10, 11, 12, 13, 14, 15, 20, 21, 22, 23, 24, 48]
L2 = [30, 31, 32, 33, 34, 35, 57, 58]

m1 = (df['colA'].isin(L1) & df['colB'].isin(L2)) | (df['ColA'] > df['ColB'])
m2 = (df['colB'].isin(L1) & df['colA'].isin(L2)) | (df['ColA'] < df['ColB'])

df['NewColumnName'] = np.select([m1, m2], [df['ColA'], df['ColB']], default=df['ColA'])

Upvotes: 4

relay
relay

Reputation: 199

You need to use it like this:

df[df['ColA'].isin([10, 11, 12, 13, 14, 15, 20, 21, 22, 23, 24, 48])]

This will give you the rows where ColA value is in the list that is indicated above. You are trying to do it per value, however this method applies on whole column. If you want to see if one value is in this list then you can write something like this in your function using numpy:

if np.any(row['colA'] == [10, 11, 12, 13, 14, 15, 20, 21, 22, 23, 24, 48]):
   val = row['ColA']

Upvotes: 3

Related Questions