Reputation: 1774
How does the following filter out the results in pandas? For example, with this statement:
df[['name', 'id', 'group']][df.id.notnull()]
I get 426 rows (it keeps only the rows where df.id is not null). However, if I use that bracketed expression by itself, it returns a bool for each row, {index: bool}:
[df.group.notnull()]
How does the bracket notation work with pandas? Another example would be:
df.id[df.id==458514] # filters out rows
# vs
[df.id==458514] # returns a bool
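The two forms above differ only in what sits before the brackets. Brackets written directly after an object call that object's indexing method (__getitem__), so df[...] or df.id[...] selects rows. Brackets with nothing in front are an ordinary Python list literal, so [df.id == 458514] is just a one-element list holding the boolean Series. A minimal sketch on a small made-up frame (only the column names come from the question; the data are invented):

```python
import pandas as pd

# Invented data; only the column names follow the question.
df = pd.DataFrame({
    "name": ["Tom", "Jerry", "Carl", "Ann"],
    "id": [458514, None, 101, 458514],
    "group": ["a", "b", None, "a"],
})

# A comparison or notnull() on a column gives a boolean Series, one value per row.
mask = df.id.notnull()

# Brackets after an object index into it: keep the rows where mask is True.
filtered = df[["name", "id", "group"]][mask]
print(len(filtered))   # 3 (the row with a missing id is dropped)

# The same mask applied to a single column instead of the whole frame.
matches = df.id[df.id == 458514]
print(list(matches.index))   # [0, 3]

# Brackets with nothing in front build a plain Python list containing the mask.
wrapped = [df.id == 458514]
print(type(wrapped).__name__, len(wrapped))   # list 1
```

In other words, the mask is the same object in both cases; it only filters when it is passed as the key to a DataFrame or Series.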
Upvotes: 0
Views: 1188
Reputation: 5183
Not a full answer, just a breakdown of df.id[df.id==458514]
df.id returns a Series with the contents of column id.
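A quick check of that first step, as a sketch on a hypothetical two-row frame: attribute access and bracket access give the same Series, and that Series keeps the frame's row index, which is what later lets a mask line up with it. Attribute access only works when the column name is a valid Python identifier that does not clash with an existing DataFrame attribute or method; df['id'] always works.

```python
import pandas as pd

# Hypothetical frame; any DataFrame with an "id" column behaves the same way.
df = pd.DataFrame({"name": ["Tom", "Carl"], "id": [458514, 101]})

col = df.id       # attribute access to the column named "id"
same = df["id"]   # bracket access to the same column

print(type(col).__name__)            # Series
print(col.equals(same))              # True
print(col.index.equals(df.index))    # True: the Series keeps the frame's row index
```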
df.id[...] slices that Series with one of: 1) a boolean mask, 2) a single index label or a list of labels, or 3) a slice in the form start:end:step. If it receives a boolean mask, the mask must line up with the Series being sliced (same length; a boolean Series is matched up by index label). If it receives a single label, it returns that element's value; a list of labels returns those specific rows. Slicing works just as with Python lists, except that start and end can be either integer positions or index labels, and a label slice includes its end point (e.g. df.id['a':'e'] returns all rows from 'a' through 'e', including 'e').
df.id[df.id==458514] returns a Series filtered by your boolean mask, i.e. only the items where df.id equals 458514. It also works with other boolean masks, as in df.id[df.name == 'Carl'] or df.id[df.name.isin(['Tom', 'Jerry'])].
Read more in pandas' intro to data structures.
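The three kinds of key described above can be tried on a small Series with a string index. This is a minimal sketch with invented values and labels; the last two lines mirror the answer's df.name examples on a hypothetical frame (written as df["name"] here, which is equivalent).

```python
import pandas as pd

# Invented Series with a string index, to show each kind of key.
s = pd.Series([10, 20, 30, 40, 50], index=["a", "b", "c", "d", "e"])

# 1) Boolean mask: keeps the entries where the mask is True.
big = s[s > 25]          # labels c, d, e -> 30, 40, 50

# 2) Labels: a single label gives the value, a list of labels gives a Series.
one = s["b"]             # 20
some = s[["a", "e"]]     # labels a, e -> 10, 50

# 3) Label slice: unlike a Python list slice, the end label is included.
span = s["b":"d"]        # labels b, c, d -> 20, 30, 40

# Masks built from another column filter this one, as in the answer's examples.
df = pd.DataFrame({"name": ["Tom", "Carl", "Jerry"], "id": [1, 458514, 3]})
carl_ids = df.id[df["name"] == "Carl"]                 # index 1 -> 458514
tom_jerry = df.id[df["name"].isin(["Tom", "Jerry"])]   # index 0, 2 -> 1, 3
```

The mask is built from one column and applied to another; this works because both share the DataFrame's row index.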
Upvotes: 1