carl.hiass

Reputation: 1774

Understanding bracket filter syntax in pandas

How does the following filter results in pandas? For example, with this statement:

df[['name', 'id', 'group']][df.id.notnull()]

I get 426 rows (it keeps only the rows where df.id IS NOT NULL). However, if I just use the bracketed expression by itself, it returns a bool for each row, {index: bool}:

[df.group.notnull()]

How does the bracket notation work in pandas? Another example would be:

df.id[df.id==458514]            # filters out rows
# vs 
[df.id==458514]                 # returns a bool

Upvotes: 0

Views: 1188

Answers (1)

RichieV

Reputation: 5183

Not a full answer, just a breakdown of df.id[df.id==458514]:

  • df.id returns a series with the contents of column id
  • df.id[...] indexes that series with either 1) a boolean mask, 2) a single index label or a list of them, or 3) a slice in the form start:end:step. A boolean mask must have the same length as the series being indexed (and, if it is itself a series, the same index). Index label(s) return those specific rows. Slicing works much as with Python lists, but start and end can be integer locations or index labels; unlike list slicing, a label slice includes its end label (e.g. ['a':'e'] returns all rows from 'a' through 'e').
  • df.id[df.id==458514] returns a filtered series with your boolean mask, i.e. only the items where df.id equals 458514. It also works with other boolean masks as in df.id[df.name == 'Carl'] or df.id[df.name.isin(['Tom', 'Jerry'])].
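The breakdown above can be sketched on a toy frame. The data and the index labels 'a' to 'e' are made up for illustration; only the column names id and name come from the post. The last line is an aside on the bare-bracket form from the question: without a series or frame in front, the brackets are just a Python list literal wrapping the mask.

```python
import pandas as pd

# Hypothetical data; only the column names come from the post.
df = pd.DataFrame(
    {
        "id": [458514, 100001, 458514, 200002, 300003],
        "name": ["Carl", "Tom", "Jerry", "Carl", "Tom"],
    },
    index=["a", "b", "c", "d", "e"],
)

ids = df.id                       # a Series holding the id column

mask = df.id == 458514            # boolean Series, one value per row
by_mask = ids[mask]               # keeps only rows where the mask is True

by_label = ids["b"]               # single label -> one scalar value
by_labels = ids[["a", "d"]]       # list of labels -> those rows
by_slice = ids["b":"d"]           # label slice; the end label is included

# Masks built from another column work because they share df's index.
carl_ids = ids[df.name == "Carl"]
tom_or_jerry = ids[df.name.isin(["Tom", "Jerry"])]

# Bare brackets are a plain Python list literal holding the mask,
# which is why this form shows booleans instead of filtered rows.
wrapped = [df.id == 458514]
```

So the filtering comes from the object in front of the brackets: `ids[mask]` calls the series' indexing logic, while `[mask]` on its own never touches it.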

Read more in pandas' intro to data structures
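The same reading seems to apply to the DataFrame statement in the question: the first pair of brackets receives a list of column labels, the second a boolean mask. A minimal sketch, assuming made-up data and a hypothetical extra column to show the column selection doing something:

```python
import numpy as np
import pandas as pd

# Hypothetical data; 'extra' is an invented column for illustration.
df = pd.DataFrame(
    {
        "name": ["Carl", "Tom", "Jerry", "Ann"],
        "id": [1.0, np.nan, 3.0, np.nan],
        "group": ["x", "y", None, "z"],
        "extra": [10, 20, 30, 40],
    }
)

cols = df[["name", "id", "group"]]   # list of labels -> column subset
mask = df.id.notnull()               # boolean Series aligned on df's index
chained = cols[mask]                 # boolean mask -> row filter

# One-step equivalent with .loc[rows, columns]
direct = df.loc[mask, ["name", "id", "group"]]
```

Both forms give the same rows here; `.loc` does it in a single indexing step, which is generally the less surprising choice if the result is going to be modified later.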

Upvotes: 1
