Reputation: 2617
I have a Series like this after doing groupby('name') and taking the mean() of another column:
name
383 3.000000
663 1.000000
726 1.000000
737 9.000000
833 8.166667
Could anyone please show me how to filter out the rows whose mean value is 1.000000? Thank you; I greatly appreciate your help.
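For reference, a minimal sketch of how such a Series might arise; the column names ('name', 'value') and the raw data here are assumptions:
import pandas as pd

# hypothetical input data chosen to reproduce the means shown above
df = pd.DataFrame({
    'name':  [383, 663, 726, 737, 833, 833, 833],
    'value': [3.0, 1.0, 1.0, 9.0, 8.0, 8.0, 8.5],
})
s = df.groupby('name')['value'].mean()  # the Series shown above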
Upvotes: 171
Views: 249544
Reputation: 21074
In my case I had a pandas Series where the values are tuples of characters:
Out[67]:
0 (H, H, H, H)
1 (H, H, H, T)
2 (H, H, T, H)
3 (H, H, T, T)
4 (H, T, H, H)
Therefore I could use indexing to filter the series, but to create the index I needed apply. My condition is "find all tuples which have exactly one 'H'".
series_of_tuples[series_of_tuples.apply(lambda x: x.count('H')==1)]
I admit it is not "chainable" (i.e. notice I repeat series_of_tuples twice; you must store any temporary series in a variable so you can call apply(...) on it).
There may also be other methods (besides .apply(...)) which can operate elementwise to produce a Boolean index.
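For instance, .map takes the same elementwise callable, and a plain list comprehension also yields a usable Boolean index (a sketch on the same tuple data):
series_of_tuples[series_of_tuples.map(lambda x: x.count('H') == 1)]
series_of_tuples[[t.count('H') == 1 for t in series_of_tuples]]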
Many other answers (including the accepted answer) use chainable functions like:
.compress()
.where()
.loc[]
[]
These accept callables (lambdas) which are applied to the Series as a whole, not to the individual values in the series!
Therefore my Series of tuples behaved strangely when I tried to use my above condition / callable / lambda with any of the chainable functions, like .loc[]:
series_of_tuples.loc[lambda x: x.count('H')==1]
Produces the error:
KeyError: 'Level H must be same as name (None)'
I was very confused, but it seems to be dispatching to the Series.count method (i.e. series_of_tuples.count(...)), which is not what I wanted.
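Since the callable receives the whole Series, a chainable version of the same filter has to do the elementwise work inside the callable, e.g. (a sketch):
series_of_tuples.loc[lambda s: s.apply(lambda t: t.count('H') == 1)]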
I admit that an alternative data structure may be better. This creates a series of strings (i.e. by joining the characters of each tuple into a single string):
series_of_tuples.apply(''.join)
So I can then use the chainable Series.str.count:
series_of_tuples.apply(''.join).str.count('H')==1
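Putting it together, the whole filter becomes chainable (a sketch; note that it returns the joined strings rather than the original tuples):
series_of_tuples.apply(''.join).loc[lambda s: s.str.count('H') == 1]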
Upvotes: 2
Reputation: 2821
From pandas version 0.18 onward, filtering a series can also be done as below:
test = {
383: 3.000000,
663: 1.000000,
726: 1.000000,
737: 9.000000,
833: 8.166667
}
pd.Series(test).where(lambda x : x!=1).dropna()
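.where marks the values where the condition is False as NaN, and .dropna then removes those rows, leaving:
383    3.000000
737    9.000000
833    8.166667
dtype: float64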
Check out: http://pandas.pydata.org/pandas-docs/version/0.18.1/whatsnew.html#method-chaininng-improvements
Upvotes: 113
Reputation: 4602
As DACW pointed out, there are method-chaining improvements in pandas 0.18.1 that do what you are looking for very nicely.
Rather than using .where, you can pass your function to either the .loc indexer or the Series indexer [] and avoid the call to .dropna:
test = pd.Series({
383: 3.000000,
663: 1.000000,
726: 1.000000,
737: 9.000000,
833: 8.166667
})
test.loc[lambda x : x!=1]
test[lambda x: x!=1]
Similar behavior is supported on the DataFrame and NDFrame classes.
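For example, the same callable style works on a DataFrame (a sketch; the column name 'x' is an assumption):
test.to_frame(name='x').loc[lambda d: d['x'] != 1]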
Upvotes: 88
Reputation: 294258
A fast way of doing this is to reconstruct a new Series, using numpy to slice the underlying arrays. See the timing sketch below.
mask = s.values != 1
pd.Series(s.values[mask], s.index[mask])
383 3.000000
737 9.000000
833 8.166667
dtype: float64
[naive timing plot not reproduced]
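A minimal sketch of how such a timing comparison could be run with the standard-library timeit (the exact numbers will vary by machine and data size):
from timeit import timeit

setup = ("import pandas as pd; import numpy as np; "
         "s = pd.Series(np.random.rand(10000))")

# numpy reconstruction vs. plain boolean indexing
print(timeit("m = s.values != 1; pd.Series(s.values[m], s.index[m])",
             setup=setup, number=1000))
print(timeit("s[s != 1]", setup=setup, number=1000))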
Upvotes: 34
Reputation: 214957
If you like a chained operation, you can also use the compress function (note that Series.compress was deprecated in pandas 0.24 and removed in 1.0; on modern versions the equivalent chain is test[lambda x: x != 1]):
test = pd.Series({
383: 3.000000,
663: 1.000000,
726: 1.000000,
737: 9.000000,
833: 8.166667
})
test.compress(lambda x: x != 1)
# 383 3.000000
# 737 9.000000
# 833 8.166667
# dtype: float64
Upvotes: 4
Reputation: 3871
In [5]:
import pandas as pd
test = {
383: 3.000000,
663: 1.000000,
726: 1.000000,
737: 9.000000,
833: 8.166667
}
s = pd.Series(test)
s = s[s != 1]
s
Out[5]:
383 3.000000
737 9.000000
833 8.166667
dtype: float64
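For reference, the mask that drives the selection is itself a Boolean Series; on the unfiltered series it looks like:
pd.Series(test) != 1
# 383     True
# 663    False
# 726    False
# 737     True
# 833     True
# dtype: bool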
Upvotes: 204
Reputation: 22832
Another way is to first convert to a DataFrame and use the query method (assuming you have numexpr installed):
import pandas as pd
test = {
383: 3.000000,
663: 1.000000,
726: 1.000000,
737: 9.000000,
833: 8.166667
}
s = pd.Series(test)
s.to_frame(name='x').query("x != 1")
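Note that query returns a DataFrame; to get a Series back, select the column again (a sketch):
s.to_frame(name='x').query("x != 1")['x']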
Upvotes: 7