I have a pipeline that performs analysis on a table and adds extra features to classify each row of data. In this toy case I have a table with features [id, x, y, z] and I'm adding has_adj. I can't figure out how to determine the logical truth value across N columns (i.e. the number of columns in the adjustment hunt could be N):
   id   x     y     z     n   has_adj_0  has_adj_1  has_adj_n
0  AX1  10.0  Adj   <NA>  ..  True       False      ...
1  V0D  3.5   <NA>  <NA>  ..  False      False      ...
2  G7L  8.0   <NA>  Adj   ..  False      True       ...
Finally, I set the feature df['has_adj'] = True where the row contains any True values, else False.
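Independent of how the flag columns are built, the row-wise OR over N boolean columns can be sketched as below. The column names and values are made up for illustration, and the sketch assumes the flags are already plain bool columns. It compares DataFrame.any(axis=1) with folding the columns together using the | operator via functools.reduce.

```python
import functools
import operator

import pandas as pd

# Hypothetical flag columns; the real pipeline may produce any number N of them.
flags = pd.DataFrame({
    "has_adj_0": [True, False, False],
    "has_adj_1": [False, False, True],
    "has_adj_2": [False, False, False],
})

# Option 1: any(axis=1) is True for a row if at least one column in that row is True.
has_adj_any = flags.any(axis=1)

# Option 2: fold the N columns together with element-wise OR (the | operator).
has_adj_reduce = functools.reduce(operator.or_, (flags[c] for c in flags.columns))

print(has_adj_any.tolist())     # [True, False, True]
print(has_adj_reduce.tolist())  # [True, False, True]
```

any(axis=1) is usually the simpler choice when the flags already live in one frame; the reduce form can be handy when they are separate Series.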
Here is the toy example to produce the above table:
import re

import pandas as pd


def hf_txn_has_adj(text, regex_dict):
    """Return True if text matches the adjustment regex at its start; False otherwise, including for NA."""
    if pd.isna(text):
        return False
    rx = re.compile(regex_dict['regex_value'])
    # re.match anchors at the start of the string and returns None when there is no match
    return rx.match(text) is not None


regex_dict = {'regex_value': '(Adj)'}

df = pd.DataFrame([['AX1', 10, 'Adj', pd.NA],
                   ['V0D', 3.5, pd.NA, pd.NA],
                   ['G7L', 8, pd.NA, 'Adj']],
                  columns=['id', 'x', 'y', 'z'])

# One boolean feature column per candidate column: has_adj_0, has_adj_1, ...
for i, adj_feat in enumerate(['y', 'z']):
    df['has_adj_' + str(i)] = df[adj_feat].apply(hf_txn_has_adj, regex_dict=regex_dict)
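If the per-cell apply becomes slow on larger tables, the same flag columns can likely be built with pandas' vectorized string methods. This is a sketch rather than the question's code: it swaps hf_txn_has_adj for Series.str.match, assumes the candidate columns hold only strings or missing values, and uses a hypothetical adj_cols list to stand in for the N columns.

```python
import pandas as pd

ADJ_PATTERN = r"Adj"  # same pattern as regex_dict['regex_value'], minus the unused capture group

df = pd.DataFrame([['AX1', 10, 'Adj', pd.NA],
                   ['V0D', 3.5, pd.NA, pd.NA],
                   ['G7L', 8, pd.NA, 'Adj']],
                  columns=['id', 'x', 'y', 'z'])

adj_cols = ['y', 'z']  # hypothetical: any list of N text columns

for i, col in enumerate(adj_cols):
    # str.match anchors at the start of each string, like re.match;
    # na=False makes missing values come out as False instead of NA.
    df[f'has_adj_{i}'] = df[col].str.match(ADJ_PATTERN, na=False)

print(df[['id', 'has_adj_0', 'has_adj_1']])
```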
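Select the has_adj_ columns with DataFrame.filter(like='has_adj_') and reduce each row with any(axis=1), which is True when at least one of the N columns is True:
df['has_adj'] = df.filter(like='has_adj_').any(axis=1)
print(df)
Output: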
   id   x     y     z     has_adj_0  has_adj_1  has_adj
0  AX1  10.0  Adj   <NA>  True       False      True
1  V0D  3.5   <NA>  <NA>  False      False      False
2  G7L  8.0   <NA>  Adj   False      True       True
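One caveat with filter(like='has_adj_'): like is a plain substring test on the column labels, so any other column whose name happens to contain has_adj_ would also be ORed in. A sketch, assuming a hypothetical extra column named old_has_adj_note, showing filter(regex=...) with an anchored pattern as a stricter selector:

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["AX1", "V0D", "G7L"],
    "has_adj_0": [True, False, False],
    "has_adj_1": [False, False, True],
    "old_has_adj_note": [True, True, True],  # hypothetical column that like= would also select
})

loose = df.filter(like="has_adj_").columns.tolist()
strict = df.filter(regex=r"^has_adj_\d+$").columns.tolist()
print(loose)   # ['has_adj_0', 'has_adj_1', 'old_has_adj_note']
print(strict)  # ['has_adj_0', 'has_adj_1']

# The anchored regex keeps only has_adj_<digits>, so the result ignores the extra column.
df["has_adj"] = df.filter(regex=r"^has_adj_\d+$").any(axis=1)
print(df["has_adj"].tolist())  # [True, False, True]
```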