fuzzy-logic

Reputation: 385

Pandas applymap function deletes rows when applied to too many columns?

I have a dataframe in which I am looking to backfill all NaN values using the first upcoming row with a value. My code right now is this:

df[df.applymap(np.isfinite).all(1)]

When I reduce my dataframe to 7 columns or fewer, this works. However, when I run it on a dataframe with more columns, I get back an empty dataframe containing only the column headers.

What is going on here? My dataframe has 800 rows.

Upvotes: 1

Views: 99

Answers (2)

jpp

Reputation: 164773

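This is exactly what you should expect with an all condition: every extra column you test is one more chance for a row to fail, so adding columns can only remove rows, never bring them back. Consider this minimal example: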

import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, np.inf],
                   [4, np.inf, 6]])

res1 = df[df.iloc[:, :2].applymap(np.isfinite).all(1)]  # test first 2 columns only
res2 = df[df.applymap(np.isfinite).all(1)]              # test all columns

print(len(res1.index))  # 1
print(len(res2.index))  # 0

df.iloc[:, :2] selects only the first 2 columns, so the first row passes the test and is kept. In the second case every column is checked, and both rows are excluded because each one contains an np.inf somewhere.
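To see the effect of widening the mask one column at a time, here is a small sketch with hypothetical data: a 4x4 frame of ones where each column holds exactly one non-finite value (NaN or inf), each in a different row. The data is made up purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical 4x4 frame: column i has a single non-finite value, placed in row i.
df = pd.DataFrame(np.ones((4, 4)))
for i in range(4):
    df.iloc[i, i] = np.nan if i % 2 == 0 else np.inf

# Count the rows that pass the "all finite" test when only the first k columns are checked.
survivors = [int(np.isfinite(df.iloc[:, :k]).all(axis=1).sum()) for k in range(1, 5)]
print(survivors)  # [3, 2, 1, 0]
```

Each added column knocks out one more row, until none survive. With 800 rows and scattered NaNs, the same thing can happen once enough columns are included.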

As an aside, np.isfinite(df).all(1) is more idiomatic in this case: you don't have to apply the operation for each value individually via applymap.
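A sketch of that vectorized form applied to the same two-row example: np.isfinite is a NumPy ufunc, so calling it on a DataFrame returns a boolean DataFrame of the same shape in a single call. (Separately, pandas 2.1 deprecated DataFrame.applymap in favour of DataFrame.map, which is another reason to prefer the ufunc here.)

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, np.inf],
                   [4, np.inf, 6]])

# One ufunc call over the (sub)frame, then reduce across columns with all(axis=1).
mask_first_two = np.isfinite(df.iloc[:, :2]).all(axis=1)
mask_all = np.isfinite(df).all(axis=1)

print(len(df[mask_first_two]))  # 1
print(len(df[mask_all]))        # 0
```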

Upvotes: 1

Statistic Dean

Reputation: 5280

Let's take a look at your code. df.applymap(np.isfinite).all(1) is a boolean series with the same index as your original dataframe; each element is True or False depending on whether every value in that row is finite. You use this series as a mask to filter your original dataframe. If the resulting dataframe is empty, it means the series is all False. In other words, every row has at least one value that is not finite (NaN and ±inf both count as non-finite).
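Note that this mask only drops rows; it does not fill anything. If the goal is the backfill described in the question, pandas has DataFrame.bfill, which replaces each NaN with the next non-missing value further down the same column. A minimal sketch with made-up data, contrasting the two operations:

```python
import numpy as np
import pandas as pd

# Made-up frame: every row has a NaN in at least one column.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.nan, 5.0, 6.0],
    "c": [7.0, np.nan, np.nan],
})

# Row filter: keeps only rows with no NaN/inf at all, so nothing survives here.
filtered = df[np.isfinite(df).all(axis=1)]

# Backfill: each NaN takes the next valid value below it in the same column.
filled = df.bfill()

print(filtered.empty)  # True
print(filled)
```

NaNs at the bottom of a column (like the last two values of c) have no later value to copy, so bfill leaves them as NaN.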

Upvotes: 0
