KillerSnail

Reputation: 3591

pandas dropna on series

I have a pandas DataFrame df:

Item    | Category | Price
SKU123  | CatA     | 4.5
SKU124  | CatB     | 4.7
SKU124  | CatB     | 4.7
SKU125  | CatA     | NaN
SKU126  | CatB     | NaN
SKU127  | CatC     | 4.5

Here is code to generate it:

df = pd.DataFrame({'sku': ('SKU123', 'SKU124', 'SKU124', 'SKU125', 'SKU126', 'SKU127'), 'Cat':('CatA', 'CatB', 'CatB', 'CatA', 'CatB', 'CatC'), 'Price':(4.5, 4.7, 4.7, '', '', 4.5)})

I am trying to drop every row that has NaN in it.

So I entered:

filtered_df = df.drop_duplicates
filtered_df['Price'].dropna(inplace=True)

I get this error:

TypeError: 'instancemethod' object has no attribute '__getitem__'

The result I want is:

Item    | Category | Price
SKU123  | CatA     | 4.5
SKU124  | CatB     | 4.7
SKU127  | CatC     | 4.5

Upvotes: 0

Views: 892

Answers (1)

Anand S Kumar

Reputation: 90889

The basic issue with your code is in this line:

filtered_df = df.drop_duplicates

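DataFrame.drop_duplicates is a method, so you need to call it (with parentheses). Without the call, filtered_df is the method object itself rather than a DataFrame, which is why indexing it with ['Price'] raises the TypeError.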
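
To illustrate the difference between referencing the method and calling it, here is a minimal sketch. The small DataFrame and the variable names not_called, raised and deduped are made up for this illustration and are not from the question.

```python
import pandas as pd

# Hypothetical three-row frame, just for illustration.
df = pd.DataFrame({
    'sku': ['SKU123', 'SKU124', 'SKU124'],
    'Price': [4.5, 4.7, 4.7],
})

# Without parentheses this is only a reference to the bound method.
not_called = df.drop_duplicates
is_method = callable(not_called) and not isinstance(not_called, pd.DataFrame)

# Subscripting a method raises TypeError (the exact message differs
# between Python 2 and Python 3).
try:
    not_called['Price']
    raised = False
except TypeError:
    raised = True

# With parentheses the method runs and returns a new DataFrame.
deduped = df.drop_duplicates()
```

Here deduped is a regular DataFrame with the repeated SKU124 row removed, so deduped['Price'] works as expected.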

Another issue is that filtered_df['Price'].dropna(inplace=True) would not do what you want. Even if the values are dropped from the Series, the corresponding index labels still exist in the DataFrame, so those rows stay in place and show up with a NaN Price again.
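
A rough sketch of that behaviour, assuming a small made-up frame with a real NaN in Price. It uses the non-inplace form of dropna to keep the illustration simple, and it also shows DataFrame.dropna with the subset argument, which is an alternative not mentioned above.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame with one missing price.
df = pd.DataFrame({
    'sku': ['SKU123', 'SKU125', 'SKU127'],
    'Price': [4.5, np.nan, 4.5],
})

# dropna() on the column alone gives a shorter Series...
prices = df['Price'].dropna()

# ...but the DataFrame itself still has every row, including the NaN one.
rows_in_df = len(df)

# Dropping on the DataFrame, restricted to the Price column,
# is what actually removes the row.
cleaned = df.dropna(subset=['Price'])
```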

You can instead use boolean indexing based on the non-null values of the filtered_df['Price'] Series. For example:

filtered_df = df.drop_duplicates()
filtered_df = filtered_df[filtered_df['Price'].notnull()]

But please note that in the example you gave to create the DataFrame, the missing values are empty strings ('') rather than NaN, so notnull() will not filter them out. If you control how the DataFrame is created, you should consider using None instead of ''.
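
As a small sketch of the None suggestion (the shortened data here is made up for illustration): when the missing prices are given as None, pandas stores them as NaN in a float column, and notnull() can then filter them.

```python
import pandas as pd

# Hypothetical data: None marks the missing price instead of ''.
df = pd.DataFrame({
    'sku': ['SKU123', 'SKU125', 'SKU127'],
    'Price': [4.5, None, 4.5],
})

# The column is numeric, with the None stored as NaN.
price_dtype = df['Price'].dtype

# Boolean indexing on notnull() now removes the missing row.
filtered = df[df['Price'].notnull()]
```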

But if the empty strings come from somewhere else, you can use the Series.convert_objects method to convert them to NaN while indexing. For example:

filtered_df = filtered_df[filtered_df['Price'].convert_objects(convert_numeric=True).notnull()]
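
On newer pandas versions, where convert_objects is no longer available, a comparable sketch can be written with pd.to_numeric and errors='coerce'. This is a swapped-in alternative to the method used in this answer, and the name numeric_price is introduced only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'sku': ('SKU123', 'SKU124', 'SKU124', 'SKU125', 'SKU126', 'SKU127'),
    'Cat': ('CatA', 'CatB', 'CatB', 'CatA', 'CatB', 'CatC'),
    'Price': (4.5, 4.7, 4.7, '', '', 4.5),
})

filtered_df = df.drop_duplicates()

# errors='coerce' turns anything that cannot be parsed as a number,
# including the empty strings, into NaN.
numeric_price = pd.to_numeric(filtered_df['Price'], errors='coerce')

# Keep only the rows whose price could be parsed.
filtered_df = filtered_df[numeric_price.notnull()]
```

Note that this only uses the converted values as a mask; the Price column in filtered_df keeps its original dtype unless you assign numeric_price back to it.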

Demo:

In [42]: df = pd.DataFrame({'sku': ('SKU123', 'SKU124', 'SKU124', 'SKU125', 'SKU126', 'SKU127'), 'Cat':('CatA', 'CatB', 'CatB', 'CatA', 'CatB', 'CatC'), 'Price':(4.5, 4.7, 4.7, '', '', 4.5)})

In [43]: filtered_df = df.drop_duplicates()

In [44]: filtered_df = filtered_df[filtered_df['Price'].convert_objects(convert_numeric=True).notnull()]

In [45]: filtered_df
Out[45]:
    Cat Price     sku
0  CatA   4.5  SKU123
1  CatB   4.7  SKU124
5  CatC   4.5  SKU127

Upvotes: 2
