Reputation: 229
I am trying to write a function that takes my dataframe; its purpose is to categorize account postings as either "accept" or "ignore".
The problem is that for some accounts I need to match only part of a text string. Doing that outside a function works, but inside a function I get an error.
So this works fine:
ekstrakt.query("Account== 'Car_sales'").Tekst.str.contains("Til|Fra", na=False)
But this doesn't:
def cleansing(df):
    if df['Account'] == 'Car_sales':
        if df.Tekst.str.contains("Til|Fra", na=False): return 'Ignore'

ekstrakt['Ignore'] = ekstrakt.apply(cleansing, axis = 1)
It results in an error: "AttributeError: 'str' object has no attribute 'str'"
I need the "cleansing" function to take more arguments later, but I am struggling to get past this first part.
Upvotes: 1
Views: 948
Reputation: 863731
A function applied with axis=1 processes each row separately, so each value it sees is a scalar and you cannot use pandas methods that work on whole columns, such as str.contains.
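To illustrate where the AttributeError comes from, here is a minimal sketch using a one-row frame made up for this example. It assumes nothing beyond pandas itself and only inspects the types that a row-wise apply hands to the function.

```python
import pandas as pd

# Illustrative one-row frame (not the asker's real data)
df = pd.DataFrame({'Account': ['Car_sales'], 'Tekst': ['Til']})

# With axis=1, apply passes each row to the function as a Series
row = df.iloc[0]
print(type(row))        # the row itself is a pandas Series
print(type(row.Tekst))  # a single cell is a plain Python str

# A plain str has no .str accessor, which is what the error message reports
has_str_accessor = hasattr(row.Tekst, 'str')
print(has_str_accessor)
```

The .str accessor exists on a whole column (a Series of strings), not on the individual value found in one row.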
A possible solution is to create the new column by chaining two boolean masks with & (bitwise AND) and passing the result to numpy.where:
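import numpy as np
import pandas as pd

df = pd.DataFrame({'Account':['car','Car_sales','Car_sales','Car_sales'],
                   'Tekst':['Til','Franz','Text','Tilled']})
# Mask 1: rows that belong to the Car_sales account
m1 = df['Account'] == 'Car_sales'
# Mask 2: rows whose text contains "Til" or "Fra" (missing values count as no match)
m2 = df.Tekst.str.contains("Til|Fra", na=False)
# 'Ignore' where both masks are True, otherwise 'Accept'
df['new'] = np.where(m1 & m2, 'Ignore', 'Accept')
print (df)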
     Account   Tekst     new
0        car     Til  Accept
1  Car_sales   Franz  Ignore
2  Car_sales    Text  Accept
3  Car_sales  Tilled  Ignore
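Since the question mentions that the function should take more arguments later, the mask approach could be wrapped in a function along these lines. This is only a sketch: the name categorize and its parameters account and pattern are hypothetical and were chosen for this example.

```python
import numpy as np
import pandas as pd

def categorize(df, account, pattern):
    """Hypothetical helper: 'Ignore' where Account equals `account` and
    Tekst matches the regex `pattern`, otherwise 'Accept'."""
    is_account = df['Account'] == account
    has_text = df['Tekst'].str.contains(pattern, na=False)
    return np.where(is_account & has_text, 'Ignore', 'Accept')

df = pd.DataFrame({'Account': ['car', 'Car_sales', 'Car_sales', 'Car_sales'],
                   'Tekst': ['Til', 'Franz', 'Text', 'Tilled']})

# The whole frame is passed once; no row-wise apply is involved
df['new'] = categorize(df, 'Car_sales', 'Til|Fra')
print(df)
```

Because the function receives the whole dataframe, column methods such as str.contains remain available inside it.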
If you need to do the processing inside a function, you can use the in operator combined with or, because there you are working with scalars:
def cleansing(x):
    if x['Account'] == 'Car_sales':
        if pd.notna(x.Tekst):
            if ('Til' in x.Tekst) or ('Fra' in x.Tekst):
                return 'Ignore'

df['Ignore'] = df.apply(cleansing, axis = 1)
print (df)
     Account   Tekst     new  Ignore
0        car     Til  Accept    None
1  Car_sales   Franz  Ignore  Ignore
2  Car_sales    Text  Accept    None
3  Car_sales  Tilled  Ignore  Ignore
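If the row-wise function needs extra arguments, DataFrame.apply can forward them through its args parameter. The following is a sketch under assumptions: the parameters account and words are hypothetical names, and unlike the function above it returns 'Accept' instead of None for rows that do not match.

```python
import pandas as pd

def cleansing(row, account, words):
    # row is a single row (a Series), so row['Tekst'] is a scalar value
    if row['Account'] == account and pd.notna(row['Tekst']):
        # Plain substring test for each word, suitable for scalar strings
        if any(word in row['Tekst'] for word in words):
            return 'Ignore'
    return 'Accept'

df = pd.DataFrame({'Account': ['car', 'Car_sales', 'Car_sales', 'Car_sales'],
                   'Tekst': ['Til', 'Franz', 'Text', 'Tilled']})

# Values in args are passed to cleansing after the row itself
df['Ignore'] = df.apply(cleansing, axis=1, args=('Car_sales', ('Til', 'Fra')))
print(df)
```

For large frames the mask-based version is usually faster, since apply with axis=1 calls the Python function once per row.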
Upvotes: 1