Reputation: 2617
I have a Series like this after doing groupby('name') and taking the mean() of another column:
name
383 3.000000
663 1.000000
726 1.000000
737 9.000000
833 8.166667
Could anyone please show me how to filter out the rows whose mean value is 1.000000? Thank you; I greatly appreciate your help.
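For reference, a minimal sketch of how such a Series might arise; the column names ('name', 'value') and the raw data here are assumptions:
import pandas as pd

# hypothetical input data chosen to reproduce the means shown above
df = pd.DataFrame({
    'name':  [383, 663, 726, 737, 833, 833, 833],
    'value': [3.0, 1.0, 1.0, 9.0, 8.0, 8.0, 8.5],
})
s = df.groupby('name')['value'].mean()  # the Series shown above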
Upvotes: 171
Views: 249544
Reputation: 21074
In my case I had a pandas Series where the values are tuples of characters:
Out[67]:
0 (H, H, H, H)
1 (H, H, H, T)
2 (H, H, T, H)
3 (H, H, T, T)
4 (H, T, H, H)
Therefore I could use indexing to filter the series, but to create the index I needed apply. My condition is "find all tuples which have exactly one 'H'".
series_of_tuples[series_of_tuples.apply(lambda x: x.count('H')==1)]
I admit it is not "chainable" (i.e. notice I repeat series_of_tuples twice; you must store any temporary series in a variable so you can call apply(...) on it).
There may also be other methods (besides .apply(...)) which can operate elementwise to produce a Boolean index.
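For instance, .map takes the same elementwise callable, and a plain list comprehension also yields a usable Boolean index (a sketch on the same tuple data):
series_of_tuples[series_of_tuples.map(lambda x: x.count('H') == 1)]
series_of_tuples[[t.count('H') == 1 for t in series_of_tuples]]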
Many other answers (including the accepted answer) use chainable functions like:
.compress()
.where()
.loc[]
[]
These accept callables (lambdas) which are applied to the Series as a whole, not to the individual values in the series!
Therefore my Series of tuples behaved strangely when I tried to use my above condition / callable / lambda with any of the chainable functions, like .loc[]:
series_of_tuples.loc[lambda x: x.count('H')==1]
Produces the error:
KeyError: 'Level H must be same as name (None)'
I was very confused, but it seems to be dispatching to the Series.count method (i.e. series_of_tuples.count(...)), which is not what I wanted.
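Since the callable receives the whole Series, a chainable version of the same filter has to do the elementwise work inside the callable, e.g. (a sketch):
series_of_tuples.loc[lambda s: s.apply(lambda t: t.count('H') == 1)]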
I admit that an alternative data structure may be better. This creates a series of strings (i.e. by joining the characters of each tuple into a single string):
series_of_tuples.apply(''.join)
So I can then use the chainable Series.str.count:
series_of_tuples.apply(''.join).str.count('H')==1
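Putting it together, the whole filter becomes chainable (a sketch; note that it returns the joined strings rather than the original tuples):
series_of_tuples.apply(''.join).loc[lambda s: s.str.count('H') == 1]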
Upvotes: 2
Reputation: 2821
From pandas version 0.18 onward, filtering a series can also be done as below:
test = {
383: 3.000000,
663: 1.000000,
726: 1.000000,
737: 9.000000,
833: 8.166667
}
pd.Series(test).where(lambda x : x!=1).dropna()
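.where marks the values where the condition is False as NaN, and .dropna then removes those rows, leaving:
383    3.000000
737    9.000000
833    8.166667
dtype: float64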
Check out: http://pandas.pydata.org/pandas-docs/version/0.18.1/whatsnew.html#method-chaininng-improvements
Upvotes: 113
Reputation: 4602
As DACW pointed out, there are method-chaining improvements in pandas 0.18.1 that do what you are looking for very nicely.
Rather than using .where, you can pass your function to either the .loc indexer or the Series indexer [] and avoid the call to .dropna:
test = pd.Series({
383: 3.000000,
663: 1.000000,
726: 1.000000,
737: 9.000000,
833: 8.166667
})
test.loc[lambda x : x!=1]
test[lambda x: x!=1]
Similar behavior is supported on the DataFrame and NDFrame classes.
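For example, the same callable style works on a DataFrame (a sketch; the column name 'x' is an assumption):
test.to_frame(name='x').loc[lambda d: d['x'] != 1]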
Upvotes: 88
Reputation: 294258
A fast way of doing this is to reconstruct a new Series, using numpy to slice the underlying arrays. See the timing sketch below.
mask = s.values != 1
pd.Series(s.values[mask], s.index[mask])
383 3.000000
737 9.000000
833 8.166667
dtype: float64
[naive timing plot not reproduced]
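A minimal sketch of how such a timing comparison could be run with the standard-library timeit (the exact numbers will vary by machine and data size):
from timeit import timeit

setup = ("import pandas as pd; import numpy as np; "
         "s = pd.Series(np.random.rand(10000))")

# numpy reconstruction vs. plain boolean indexing
print(timeit("m = s.values != 1; pd.Series(s.values[m], s.index[m])",
             setup=setup, number=1000))
print(timeit("s[s != 1]", setup=setup, number=1000))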
Upvotes: 34
Reputation: 214957
If you like a chained operation, you can also use the compress function (note that Series.compress was deprecated in pandas 0.24 and removed in 1.0; on modern versions the equivalent chain is test[lambda x: x != 1]):
test = pd.Series({
383: 3.000000,
663: 1.000000,
726: 1.000000,
737: 9.000000,
833: 8.166667
})
test.compress(lambda x: x != 1)
# 383 3.000000
# 737 9.000000
# 833 8.166667
# dtype: float64
Upvotes: 4
Reputation: 3871
In [5]:
import pandas as pd
test = {
383: 3.000000,
663: 1.000000,
726: 1.000000,
737: 9.000000,
833: 8.166667
}
s = pd.Series(test)
s = s[s != 1]
s
Out[5]:
383 3.000000
737 9.000000
833 8.166667
dtype: float64
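For reference, the mask that drives the selection is itself a Boolean Series; on the unfiltered series it looks like:
pd.Series(test) != 1
# 383     True
# 663    False
# 726    False
# 737     True
# 833     True
# dtype: bool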
Upvotes: 204
Reputation: 22832
Another way is to first convert to a DataFrame and use the query method (assuming you have numexpr installed):
import pandas as pd
test = {
383: 3.000000,
663: 1.000000,
726: 1.000000,
737: 9.000000,
833: 8.166667
}
s = pd.Series(test)
s.to_frame(name='x').query("x != 1")
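Note that query returns a DataFrame; to get a Series back, select the column again (a sketch):
s.to_frame(name='x').query("x != 1")['x']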
Upvotes: 7