JoeM05

Reputation: 922

Filter data frame with a boolean vector based on one column

I'm somewhat new to pandas. I have a data frame, words_df, with two columns. The first column (which I select with [[0]]) holds words; the second holds values that I want to use in processes further downstream. It looks something like this:

                   a:State-word  occurrences
0                      FIRE         1535
1                       BRR         1189
2                     GREEN          521
3                    ORANGE          504
4                    PURPLE          503
5                      BLUE          482
6                    VIOLET          480
7                    YELLOW          445
8                    INDIGO          434
9                     BLACK          392
10                    WHITE          381
11                     PINK          322
...

I have a second list of words that I want to filter against. If a word is in my filter_list, remove the row in my words_df.

So far I've managed to build a boolean mask: filter_list = words_df[[0]].isin(filter_list)

which I then try to apply with: words_df[~filter_list]

It sort of works, but mostly doesn't. The result looks like this:

               a:State-word  occurrences
0                       NaN          NaN
1                       NaN          NaN
2                       NaN          NaN
3                       NaN          NaN
4                    PURPLE          NaN
5                       NaN          NaN
6                       NaN          NaN
7                    YELLOW          NaN
8                    INDIGO          NaN
9                       NaN          NaN
10                      NaN          NaN
11                      NaN          NaN

I'd like it to look like this:

               a:State-word  occurrences
1                    PURPLE          503
2                    YELLOW          445
3                    INDIGO          434

What am I doing wrong?

Upvotes: 1

Views: 591

Answers (1)

Romain

Reputation: 21878

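You were close to the answer. words_df[[0]] selects a one-column DataFrame rather than a Series, so isin() returns a boolean DataFrame, and indexing with a boolean DataFrame replaces the non-matching cells with NaN instead of dropping rows. Select the column as a Series and the same kind of mask filters whole rows: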

from pandas import DataFrame

# Test data
df = DataFrame({'a:State-word': ['FIRE','BRR', 'GREEN', 'ORANGE', 'PURPLE', 'BLUE', 'VIOLET'],
                'occurrences': [1535, 1189, 521, 504, 503, 482, 480]})
filter_list = ['FIRE', 'BRR', 'GREEN', 'ORANGE', 'BLUE']
df

#   a:State-word  occurrences
# 0         FIRE         1535
# 1          BRR         1189
# 2        GREEN          521
# 3       ORANGE          504
# 4       PURPLE          503
# 5         BLUE          482
# 6       VIOLET          480

# Keep only the rows whose word is NOT in filter_list
df[~df['a:State-word'].isin(filter_list)]

#   a:State-word  occurrences
# 4       PURPLE          503
# 6       VIOLET          480
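
To make the difference concrete, here is a minimal sketch that reuses the test data above and puts the two kinds of mask side by side. The variable names (frame_mask, series_mask, masked, kept, renumbered) are only illustrative, and the reset_index step is an assumption about what you want: your desired output shows a renumbered index, and reset_index(drop=True) renumbers from 0.

```python
import pandas as pd

df = pd.DataFrame({
    'a:State-word': ['FIRE', 'BRR', 'GREEN', 'ORANGE', 'PURPLE', 'BLUE', 'VIOLET'],
    'occurrences': [1535, 1189, 521, 504, 503, 482, 480],
})
filter_list = ['FIRE', 'BRR', 'GREEN', 'ORANGE', 'BLUE']

# A list of labels selects a one-column DataFrame, so isin() returns a
# boolean DataFrame. Indexing with it masks cell by cell: every row is
# kept, and cells without a True in the mask become NaN.
frame_mask = df[['a:State-word']].isin(filter_list)
masked = df[~frame_mask]

# A single label selects a Series, so isin() returns a boolean Series.
# Indexing with it keeps only the rows where the mask is True.
series_mask = df['a:State-word'].isin(filter_list)
kept = df[~series_mask]

# Optional (assumption): renumber the surviving rows starting from 0.
renumbered = kept.reset_index(drop=True)

print(masked)
print(kept)
print(renumbered)
```

The first print reproduces the NaN-filled frame from the question, while the second and third show only the PURPLE and VIOLET rows.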

Upvotes: 2
