Reputation: 972
I have a very large dataframe that I am trying to clean up. Here is a sample of the dataframe.
[in]: dftrain
[out]:
date store_nbr item_nbr units tavg preciptotal
2012-01-01 1 1 0 42 0.05
2012-01-01 1 2 2 42 0.05
2012-01-01 1 3 0 42 0.05
2012-01-01 1 4 0 42 0.05
2012-01-01 2 1 0 62 0.15
2012-01-01 2 2 0 62 0.15
2012-01-01 2 3 20 62 0.15
2012-01-01 2 4 1 62 0.15
2012-01-01 3 1 0 72 1.05
2012-01-01 3 2 0 72 1.05
2012-01-01 3 3 29 72 1.05
2012-01-01 3 3 0 72 1.05
...(New Day)
date store_nbr item_nbr units tavg preciptotal
2012-01-02 1 1 0 47 0.10
2012-01-02 1 2 0 47 0.10
2012-01-02 1 3 0 47 0.10
2012-01-02 1 4 0 47 0.10
2012-01-02 2 1 0 60 0.0
2012-01-02 2 2 0 60 0.0
2012-01-02 2 3 5 60 0.0
2012-01-02 2 4 0 60 0.0
2012-01-02 3 1 0 67 .23
2012-01-02 3 2 0 67 .23
2012-01-02 3 3 2 67 .23
2012-01-02 3 3 0 67 .23
...
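For reference, here is one way to rebuild the first day for stores 1 and 2 (I'm assuming date is a plain string and the other columns are ordinary ints/floats):
import pandas as pd

# hand-built copy of part of the first day (stores 1 and 2 only)
dftrain = pd.DataFrame({
    'date':        ['2012-01-01'] * 8,
    'store_nbr':   [1, 1, 1, 1, 2, 2, 2, 2],
    'item_nbr':    [1, 2, 3, 4, 1, 2, 3, 4],
    'units':       [0, 2, 0, 0, 0, 0, 20, 1],
    'tavg':        [42, 42, 42, 42, 62, 62, 62, 62],
    'preciptotal': [0.05, 0.05, 0.05, 0.05, 0.15, 0.15, 0.15, 0.15],
})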
I have a dictionary whose keys are the store_nbr values and whose values are lists of the item_nbr values I want to keep for that store. Here is a sample dictionary: {'1': [2], '2': [3,4], '3': [3]}
What I want to do is go store by store and delete all the rows whose item_nbr is NOT in that store's list. So in this example, after dropping the undesired rows, I would have this...
[in]: dftrain_drop
[out]:
date store_nbr item_nbr units tavg preciptotal
2012-01-01 1 2 2 42 0.05
2012-01-01 2 3 20 62 0.15
2012-01-01 2 4 1 62 0.15
2012-01-01 3 3 29 72 1.05
...(New Day)
date store_nbr item_nbr units tavg preciptotal
2012-01-02 1 2 0 47 0.10
2012-01-02 2 3 5 60 0.0
2012-01-02 2 4 0 60 0.0
2012-01-02 3 3 0 67 .23
Upvotes: 0
Views: 324
Reputation: 690
This is the fastest solution I could come up with:
is_good = lambda group: group.isin(keep[group.name])
result = df[df.groupby('store_nbr')['item_nbr'].apply(is_good)]
It splits up the DataFrame by store_nbr, calls is_good on each group (apply) to determine the rows you want to keep, puts everything back together in the right order, and then takes that subset of rows from the original frame.
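A self-contained sketch of the same idea, using part of the question's first day. I'm assuming the keep dict is keyed by ints so it matches the store_nbr column (the question shows string keys), and I've added group_keys=False so the mask stays aligned with the original index on newer pandas versions:
import pandas as pd

# part of the question's first day (stores 1 and 2)
df = pd.DataFrame({
    'date':      ['2012-01-01'] * 8,
    'store_nbr': [1, 1, 1, 1, 2, 2, 2, 2],
    'item_nbr':  [1, 2, 3, 4, 1, 2, 3, 4],
    'units':     [0, 2, 0, 0, 0, 0, 20, 1],
})

# keys must have the same dtype as store_nbr, so ints rather than '1', '2', ...
keep = {1: [2], 2: [3, 4]}

# group.name is the store_nbr of the group being processed
is_good = lambda group: group.isin(keep[group.name])

# group_keys=False keeps the boolean mask indexed like df, so it can be used
# directly to select rows from the original frame
mask = df.groupby('store_nbr', group_keys=False)['item_nbr'].apply(is_good)
result = df[mask]
print(result)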
Upvotes: 1
Reputation: 16249
Using the first day of your sample dataset:
keep = {'1': [2], '2': [3, 4], '3': [3]}
# build one piece per store, keeping only that store's wanted item_nbr values
parts = []
for i in keep:
    parts.append(dftrain.query(
        "item_nbr in %s and store_nbr == %d" % (keep[i], int(i))
    ))
pd.concat(parts)
          date  store_nbr  item_nbr  units  tavg  preciptotal
1   2012-01-01          1         2      2    42         0.05
10  2012-01-01          3         3     29    72         1.05
11  2012-01-01          3         3      0    72         1.05
6   2012-01-01          2         3     20    62         0.15
7   2012-01-01          2         4      1    62         0.15
This is the result, all in one tidied-up dataset. (I believe your sample output is missing an item from store 3.)
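If the original row order matters, one option is to sort the concatenated frame back by its index afterward:
# restore the original row order of dftrain
result = pd.concat(parts).sort_index()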
Upvotes: 0