Reputation: 1395
I am looking to use pandas
to drop rows based on the column name (contains a space) and the cell value. I have tried various ways to achieve this (drop and query methods) but it seems I'm failing due to the space in the name. Is there a way to query the data using the name that has a space in it or do I need to clean all spaces first?
data in form of a csv file
Date,"price","Sale Item"
2012-06-11,1600.20,item1
2012-06-12,1610.02,item2
2012-06-13,1618.07,item3
2012-06-14,1624.40,item4
2012-06-15,1626.15,item5
2012-06-16,1626.15,item6
2012-06-17,1626.15,item7
Attempt Examples
df.drop(['Sale Item'] != 'Item1')
df.drop('Sale Item' != 'Item1')
df.drop("'Sale Item'] != 'Item1'")
df.query('Sale Item' != 'Item1')
df.query(['Sale Item'] != 'Item1')
df.query("'Sale Item'] != 'Item1'")
Error received in most cases
ImportError: 'numexpr' not found. Cannot use engine='numexpr' for query/eval if 'numexpr' is not installed
Upvotes: 7
Views: 12460
Reputation: 90979
As you can see from the documentation -
DataFrame.drop(labels, axis=0, level=None, inplace=False, errors='raise')
Return new object with labels in requested axis removed
DataFrame.drop()
takes the index
of the rows to drop, not the condition. Hence you would most probably need something like -
df.drop(df.ix[df['Sale Item'] != 'item1'].index)
Please note, this drops the rows that meet the condition, so the result would be the rows that don't meet the condition, if you want the opposite you can use ~
operator before your condition to negate it.
But this seems a bit too much, it would be easier to just use Boolean indexing to get the rows you want (as indicated in the other answer) .
Demo -
In [20]: df
Out[20]:
Date price Sale Item
0 2012-06-11 1600.20 item1
1 2012-06-12 1610.02 item2
2 2012-06-13 1618.07 item3
3 2012-06-14 1624.40 item4
4 2012-06-15 1626.15 item5
5 2012-06-16 1626.15 item6
6 2012-06-17 1626.15 item7
In [21]: df.drop(df.ix[df['Sale Item'] != 'item1'].index)
Out[21]:
Date price Sale Item
0 2012-06-11 1600.2 item1
Upvotes: 4
Reputation: 21574
If I understood correctly your issue, maybe you can just apply a filter like:
df = df[df['Sale Item'] != 'item1']
which returns:
Date price Sale Item
1 2012-06-12 1610.02 item2
2 2012-06-13 1618.07 item3
3 2012-06-14 1624.40 item4
4 2012-06-15 1626.15 item5
5 2012-06-16 1626.15 item6
6 2012-06-17 1626.15 item7
Upvotes: 8