Reputation: 309
I am using Pandas and Python to import a CSV, and the data in the imported dataframe is manipulated so that a new column is made.
Each row in the new column is made based on the values in each corresponding row of both column A and column B. There are more columns with data in the dataframe, however these are irrelevant for code below.
The imported dataframe have several thousand rows.
Both column A and column B contain numerical values between and including 0 and 99.
import pandas as pd
import csv
df = pd.read_csv("import.csv", names=["Id", "Month", "Name", "ColA", "ColB" ])
def f(row):
if row['colA'].isin([10, 11, 12, 13, 14, 15, 20, 21, 22, 23, 24, 48]) and row['colB'].isin([30, 31, 32, 33, 34, 35, 57, 58]):
val = row['ColA']
elif row['ColB'].isin([10, 11, 12, 13, 14, 15, 20, 21, 22, 23, 24, 48]) and row['ColA'].isin([30, 31, 32, 33, 34, 35, 57, 58]):
val = row['ColB']
elif row['ColA'] > row['ColB']:
val = row['ColA']
elif row['ColA'] < row['ColB']:
val = row['ColB']
else:
val = row['ColA']
return val
df['NewColumnName'] = df.apply(f, axis=1)
df.to_csv("export.csv", encoding='utf-8')
Running the above code returns the error:
AttributeError: ("'float' object has no attribute 'isin'", 'occurred at index 0')
So obviously .isin() can't be used in that manner. Any suggestions to how this could be solved?
EDIT Adding a column where the same conditions apply using Jezrael's approach the code would look as follows I guess:
m1 = (df['colA'].isin(L1) & df['colB'].isin(L2)) | (df['ColA'] > df['ColB'])
m2 = (df['colB'].isin(L1) & df['colA'].isin(L2)) | (df['ColA'] < df['ColB'])
m3 = (df['colC'].isin(L1) & df['colB'].isin(L2)) | (df['ColC'] > df['ColB'])
m4 = (df['colB'].isin(L1) & df['colC'].isin(L2)) | (df['ColC'] < df['ColB'])
m5 = (df['colC'].isin(L1) & df['colA'].isin(L2)) | (df['ColC'] > df['ColA'])
m6 = (df['colA'].isin(L1) & df['colC'].isin(L2)) | (df['ColC'] < df['ColA'])
df['NewColumnName'] = np.select([m1, m2, m3, m4, m5, m6], [df['ColA'], df['ColB'], df['ColC'], df['ColA'], df['ColB'], df['ColC'],], default=df['ColA'])
Upvotes: 1
Views: 37608
Reputation: 862741
In pandas the best is avoid loops, so better is use numpy.select
and chain condition by &
for AND
and |
for OR
:
L1 = [10, 11, 12, 13, 14, 15, 20, 21, 22, 23, 24, 48]
L2 = [30, 31, 32, 33, 34, 35, 57, 58]
m1 = (df['colA'].isin(L1) & df['colB'].isin(L2)) | (df['ColA'] > df['ColB'])
m2 = (df['colB'].isin(L1) & df['colA'].isin(L2)) | (df['ColA'] < df['ColB'])
df['NewColumnName'] = np.select([m1, m2], [df['ColA'], df['ColB']], default=df['ColA'])
Upvotes: 4
Reputation: 199
You need to use it like this:
df[df['ColA'].isin([10, 11, 12, 13, 14, 15, 20, 21, 22, 23, 24, 48])]
This will give you the rows where ColA
value is in the list that is indicated above. You are trying to do it per value, however this method applies on whole column. If you want to see if one value is in this list then you can write something like this in your function using numpy:
if np.any(row['colA'] == [10, 11, 12, 13, 14, 15, 20, 21, 22, 23, 24, 48]):
val = row['ColA']
Upvotes: 3