Reputation: 3591
I have a pandas DataFrame df:
Item | Category | Price
SKU123 | CatA | 4.5
SKU124 | CatB | 4.7
SKU124 | CatB | 4.7
SKU125 | CatA | NaN
SKU126 | CatB | NaN
SKU127 | CatC | 4.5
Here is the code to generate it:
df = pd.DataFrame({'sku': ('SKU123', 'SKU124', 'SKU124', 'SKU125', 'SKU126', 'SKU127'), 'Cat':('CatA', 'CatB', 'CatB', 'CatA', 'CatB', 'CatC'), 'Price':(4.5, 4.7, 4.7, '', '', 4.5)})
I am trying to drop every row that has NaN in it.
So I entered:
filtered_df = df.drop_duplicates
filtered_df['Price'].dropna(inplace=True)
I get this error:
TypeError: 'instancemethod' object has no attribute '__getitem__'
The result I want is:
Item | Category | Price
SKU123 | CatA | 4.5
SKU124 | CatB | 4.7
SKU127 | CatC | 4.5
Upvotes: 0
Views: 892
Reputation: 90889
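The basic issue with your code is in this line:
filtered_df = df.drop_duplicates
DataFrame.drop_duplicates is a method, so you need to call it with parentheses: df.drop_duplicates(). Without the call, filtered_df is the method object itself, which is why indexing it with ['Price'] raises the TypeError.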
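To make the difference concrete, here is a minimal sketch with a small made-up frame (the variable names `not_called` and `called` are only for illustration). It shows that the attribute without parentheses is a callable method, while calling it returns a DataFrame:

```python
import pandas as pd

# Small illustrative frame; the last two rows are exact duplicates.
df = pd.DataFrame({'sku': ['SKU123', 'SKU124', 'SKU124'],
                   'Price': [4.5, 4.7, 4.7]})

not_called = df.drop_duplicates    # bound method, NOT a DataFrame
called = df.drop_duplicates()      # DataFrame with the duplicate row removed

print(callable(not_called))              # True
print(isinstance(called, pd.DataFrame))  # True
print(len(called))                       # 2
```

Indexing `not_called['Price']` is what fails; on Python 3 the message reads "'method' object is not subscriptable" instead of the Python 2 wording in the question.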
Another issue is that filtered_df['Price'].dropna(inplace=True) would not do what you want. Even if the values are dropped from the Series, the corresponding index labels still exist in the DataFrame, so those rows stay and Price shows up as NaN for them again.
Instead, you can use boolean indexing based on the non-null values of the filtered_df['Price'] Series. Example -
filtered_df = df.drop_duplicates()
filtered_df = filtered_df[filtered_df['Price'].notnull()]
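As a self-contained sketch of the two lines above, assuming the Price column really holds NaN (built here with `np.nan`, which is my assumption, not the question's generator), the boolean mask can be compared with `DataFrame.dropna(subset=...)`, which should give the same rows:

```python
import numpy as np
import pandas as pd

# Same data as the question, but with real NaN values in Price (assumption).
df = pd.DataFrame({
    'sku': ['SKU123', 'SKU124', 'SKU124', 'SKU125', 'SKU126', 'SKU127'],
    'Cat': ['CatA', 'CatB', 'CatB', 'CatA', 'CatB', 'CatC'],
    'Price': [4.5, 4.7, 4.7, np.nan, np.nan, 4.5],
})

deduped = df.drop_duplicates()

# Option 1: boolean indexing on the non-null prices.
by_mask = deduped[deduped['Price'].notnull()]

# Option 2: drop rows whose Price is missing, at the DataFrame level.
by_dropna = deduped.dropna(subset=['Price'])

print(by_mask.equals(by_dropna))   # True
print(list(by_mask['sku']))        # ['SKU123', 'SKU124', 'SKU127']
```

Both options return a new DataFrame and leave `df` untouched, which avoids the in-place pitfall described earlier.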
But please note that in the example you gave to create the DataFrame, the missing values are empty strings ('') instead of NaN. If you control how the DataFrame is created, you should consider using None instead of ''.
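But if the empty strings come from somewhere else, you can use the Series.convert_objects method to convert them to NaN while indexing. Note that convert_objects exists only in older pandas releases; it was deprecated and later removed. Example -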
filtered_df = filtered_df[filtered_df['Price'].convert_objects(convert_numeric=True).notnull()]
Demo -
In [42]: df = pd.DataFrame({'sku': ('SKU123', 'SKU124', 'SKU124', 'SKU125', 'SKU126', 'SKU127'), 'Cat':('CatA', 'CatB', 'CatB', 'CatA', 'CatB', 'CatC'), 'Price':(4.5, 4.7, 4.7, '', '', 4.5)})
In [43]: filtered_df = df.drop_duplicates()
In [44]: filtered_df = filtered_df[filtered_df['Price'].convert_objects(convert_numeric=True).notnull()]
In [45]: filtered_df
Out[45]:
Cat Price sku
0 CatA 4.5 SKU123
1 CatB 4.7 SKU124
5 CatC 4.5 SKU127
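On pandas versions where `convert_objects` is not available, a similar filter can be sketched with `pd.to_numeric(..., errors='coerce')`, which turns values it cannot parse (such as the empty strings here) into NaN. This is an alternative I am swapping in, not the call used in the demo above, and the intermediate name `price` is only for illustration:

```python
import pandas as pd

# The question's generator, with empty strings standing in for missing prices.
df = pd.DataFrame({
    'sku': ('SKU123', 'SKU124', 'SKU124', 'SKU125', 'SKU126', 'SKU127'),
    'Cat': ('CatA', 'CatB', 'CatB', 'CatA', 'CatB', 'CatC'),
    'Price': (4.5, 4.7, 4.7, '', '', 4.5),
})

filtered_df = df.drop_duplicates()

# Coerce Price to numbers; unparseable entries become NaN.
price = pd.to_numeric(filtered_df['Price'], errors='coerce')

# Keep only the rows whose coerced price is not NaN.
filtered_df = filtered_df[price.notnull()]

print(list(filtered_df['sku']))   # ['SKU123', 'SKU124', 'SKU127']
```

The Price column in `filtered_df` keeps its original object dtype here; assigning `price` back to the column first would make it numeric as well.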
Upvotes: 2