duckworthd

Reputation: 15207

pandas: iteratively filtering a DataFrame's rows

Suppose I have a DataFrame like so,

df = pd.DataFrame([['x', 1, 2], ['x', 1, 3], ['y', 2, 2]], 
                  columns=['a', 'b', 'c'])

To select all rows where c == 2 and a == 'x', I could do something like,

df[(df['a'] == 'x') & (df['c'] == 2)]

Or I could iteratively refine by making temporary variables,

df1 = df[df['a'] == 'x']
df2 = df1[df1['c'] == 2]

Is there a way to iteratively refine the rows in a single chained expression?

(
  df
  .refine(lambda row: row['a'] == 'x')     # this method doesn't exist
  .refine(lambda row: row['c'] == 2)
)

Upvotes: 3

Views: 1680

Answers (2)

Noel Evans

Reputation: 8516

If the number of terms is not known until runtime, you can build the query string as below. I am not saying this is a beautiful way to achieve the goal, but I can't see an alternative with pandas 0.14.1:

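import pandas as pd

df = pd.DataFrame([['x', 1, 2], ['x', 1, 3], ['y', 2, 2]],
                  columns=['a', 'b', 'c'])

# Column name -> required value; this dict can be assembled at runtime.
conditions = {'a': 'x', 'c': 2}

def esc(term):
    # Quote strings so they read as literals inside the query expression.
    # Values that themselves contain a double quote are not handled here.
    if isinstance(term, str):
        return '"%s"' % term
    return str(term)

# One "column == value" clause per condition, joined with "and",
# e.g. a == "x" and c == 2
q_parts = ["%s == %s" % (k, esc(v)) for k, v in conditions.items()]
q = ' and '.join(q_parts)

print(df.query(q))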

Of course, the esc function, or the wider snippet, would need to be extended to handle logical NOT, membership tests such as x in (x, y, z), and so on.
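For runtime-built conditions like these, a possible alternative that avoids string escaping altogether is to build one boolean mask per condition and AND them together. This is only a sketch under the assumption that every condition is a simple equality test; the names masks, combined and result are mine, not part of the answer above.

```python
from functools import reduce
import operator

import pandas as pd

df = pd.DataFrame([['x', 1, 2], ['x', 1, 3], ['y', 2, 2]],
                  columns=['a', 'b', 'c'])

# Column name -> required value; this dict can be assembled at runtime.
conditions = {'a': 'x', 'c': 2}

# One boolean Series per condition.
masks = [df[col] == value for col, value in conditions.items()]

# Element-wise AND across all masks (assumes at least one condition;
# reduce raises TypeError on an empty list without an initial value).
combined = reduce(operator.and_, masks)

result = df[combined]
print(result)
```

Because the values never pass through a query string, nothing needs quoting, but other operators (NOT, membership) would still have to be added by hand.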

Upvotes: 0

Phillip Cloud

Reputation: 25662

While this isn't a solution for now, in pandas version 0.13 you'll be able to do

df.query('a == "x"').query('c == 2')

to achieve what you want.
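As a small runnable sketch of the chained form, assuming a pandas release that provides DataFrame.query and using the frame from the question (the variable names step1, step2 and chained are just for illustration):

```python
import pandas as pd

df = pd.DataFrame([['x', 1, 2], ['x', 1, 3], ['y', 2, 2]],
                  columns=['a', 'b', 'c'])

# Step-by-step refinement with temporaries, as in the question.
step1 = df.query('a == "x"')
step2 = step1.query('c == 2')

# The same refinement as one chained expression; each query() call
# returns a new DataFrame, so the next call filters the previous result.
chained = df.query('a == "x"').query('c == 2')

print(chained)
```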

You'll also be able to do

df['a == "x"']['c == 2']

and

df['a == "x" and c == 2']

What's wrong with

df[(df.a == 'x') & (df.c == 2)]

until 0.13?
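For readers on a more recent pandas than the versions discussed here, the .refine chain from the question can be approximated in two ways. This is a sketch, assuming a release in which .loc accepts a callable and DataFrame.pipe is available; refine is a hypothetical helper, not a pandas method, and its predicate receives the whole frame rather than a single row.

```python
import pandas as pd

df = pd.DataFrame([['x', 1, 2], ['x', 1, 3], ['y', 2, 2]],
                  columns=['a', 'b', 'c'])

# Option 1: .loc with a callable. The callable is given the frame as it
# stands at that point in the chain and returns a boolean mask.
refined = (
    df
    .loc[lambda d: d['a'] == 'x']
    .loc[lambda d: d['c'] == 2]
)

# Option 2: a hypothetical refine() helper applied through .pipe.
def refine(frame, predicate):
    """Keep the rows of frame for which predicate(frame) is True."""
    return frame[predicate(frame)]

piped = (
    df
    .pipe(refine, lambda d: d['a'] == 'x')
    .pipe(refine, lambda d: d['c'] == 2)
)

print(refined)
```

Both forms evaluate each predicate on the already-filtered frame, which is what distinguishes them from a single combined mask on df.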

Upvotes: 1
