Reputation: 159
I have a dataframe and a filter I want to apply to the frame in the form of a series. The filtered dataframe should include all rows that match the filter. Where the filter has a "wildcard-value", defined in this case as NaN, everything is considered a match.
Below is my implementation of such a filter:
df: pandas.DataFrame
f: pandas.Series

def match(row: pandas.Series, f: pandas.Series):
    return all(isinstance(value, float) and math.isnan(value) or value == row[idx]
               for idx, value in zip(f.index, f))

filtered_df = df[[match(row, f) for _, row in df.iterrows()]]
It does the job, but it's not as elegant as I would like, and it might be too slow for a large df. I have heard that iterating over pandas frames is frowned upon and am therefore looking for a better solution.
How can one write this code in a better way?
Update with runnable code:
import math

import pandas

if __name__ == '__main__':
    data = {'Name': ['Ankit', 'Amit', 'Aishwarya', 'Priyanka'],
            'Age': [21, 19, 19, 19],
            'Stream': ['Math', 'Commerce', 'Arts', 'Biology'],
            'Percentage': [88, 88, 88, 70]}
    df = pandas.DataFrame(data, columns=['Name', 'Age', 'Stream', 'Percentage'])
    f = pandas.Series([math.nan, 19, math.nan, 88],
                      index=['Name', 'Age', 'Stream', 'Percentage'])

    def match(row: pandas.Series, f: pandas.Series):
        return all(isinstance(value, float) and math.isnan(value) or value == row[idx]
                   for idx, value in zip(f.index, f))

    filtered_df = df[[match(row, f) for _, row in df.iterrows()]]
    print(filtered_df)
Upvotes: 0
Views: 150
Reputation: 381
You could use an inner join to keep only the matching rows, for example:
# Drop the NaN "wildcard" entries from the filter
f = f.dropna()
# Turn the series into a one-row DataFrame (.T transposes it)
f_frame = f.to_frame().T
# Inner join on the remaining columns keeps only rows that match all of them
filtered_df = df.merge(f_frame, how='inner', on=list(f_frame.columns))
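Applied to the question's sample data, this could look like the sketch below. One caveat worth hedging: because `f` mixes NaN with integers, its dtype is `object`, and pandas will not merge keys of incompatible dtypes, so the sketch calls `infer_objects()` to restore numeric dtypes on the merge keys before joining.

```python
import math

import pandas

data = {'Name': ['Ankit', 'Amit', 'Aishwarya', 'Priyanka'],
        'Age': [21, 19, 19, 19],
        'Stream': ['Math', 'Commerce', 'Arts', 'Biology'],
        'Percentage': [88, 88, 88, 70]}
df = pandas.DataFrame(data)
f = pandas.Series([math.nan, 19, math.nan, 88],
                  index=['Name', 'Age', 'Stream', 'Percentage'])

# Drop the NaN wildcards, transpose into a one-row frame,
# and let pandas re-infer numeric dtypes for the merge keys
f_frame = f.dropna().to_frame().T.infer_objects()

# Inner join on the remaining columns keeps only the matching rows
filtered_df = df.merge(f_frame, how='inner', on=list(f_frame.columns))
print(filtered_df)
```

On this data, only the rows with Age 19 and Percentage 88 (Amit and Aishwarya) survive the join.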
Upvotes: 2