add-semi-colons

Reputation: 18810

Pandas data frame filter by list values - most efficient

I have built the following pandas data frame:

      dark  Mystery  adult  crime  action  comedy  cartoon  winter  snow  skiing
0001  0.00    0.000  0.000   0.00    0.00   0.000     0.00    0.56  0.65   0.789
0004  0.89    0.678 -0.423   0.12    0.00   0.000     0.00    0.00  0.00   0.000
0005  0.00    0.000  0.000   0.00    0.12   0.678    -0.89    0.00  0.00   0.000

I also have a list containing some of the data frame's row index values. After filtering, I want a new data frame containing only the rows whose index matches a value in the list.

l = [001,005]

This is a large data frame, so I am trying to do this without iterating in a loop.

[df.index[idx] for idx in l]

This is wrong, but I feel I am close to the answer, or maybe not.

Result should be:

      dark  Mystery  adult  crime  action  comedy  cartoon  winter  snow  skiing
0001  0.00    0.000  0.000   0.00    0.00   0.000     0.00    0.56  0.65   0.789
0005  0.00    0.000  0.000   0.00    0.12   0.678    -0.89    0.00  0.00   0.000

Upvotes: 0

Views: 255

Answers (2)

ASGM

Reputation: 11381

How about using .loc:

df.loc[l]

Note that in your actual example, your indices are probably strings rather than integers. In Python 2, l = [0001, 0005] is parsed as octal literals and evaluates to [1, 5]; Python 3 rejects integer literals with leading zeros with a SyntaxError. So you might want to use l = ["0001", "0005"], or use string formatting to convert the integers (as Jonathan Eunice shows in his answer).

As an aside, you should also avoid using lowercase l as a variable name, since it looks similar to 1 in many monospace typefaces.
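A minimal sketch of the .loc approach, assuming the index really does hold zero-padded strings. The frame below is a cut-down, hypothetical stand-in for the one in the question, and the name wanted is only illustrative:

```python
import pandas as pd

# Cut-down stand-in for the question's frame (assumption: string index labels).
df = pd.DataFrame(
    {
        "dark": [0.00, 0.89, 0.00],
        "winter": [0.56, 0.00, 0.00],
        "snow": [0.65, 0.00, 0.00],
    },
    index=["0001", "0004", "0005"],
)

# Labels to keep, written as strings so they match the index dtype.
wanted = ["0001", "0005"]

# Label-based selection; rows come back in the order listed in `wanted`.
result = df.loc[wanted]
print(result)
```

One thing to watch: on recent pandas versions, passing .loc a list that contains a label missing from the index raises a KeyError, so this form suits the case where every label is known to exist.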

Upvotes: 3

Jonathan Eunice

Reputation: 22463

If your DataFrame is in df:

newdf = df[df.index.isin(l)]

Of course, you have to be careful here. None of the items in l are actually in the index. l = [001,005] is the same as l = [1,5] (in Python 2, where leading zeros are accepted), whereas your index really consists of strings a la ['0001', '0002', ...]. Given that, you may want to "upgrade" your selection list l to match your index first:

l = ["{:04d}".format(i) for i in l]
newdf = df[df.index.isin(l)]
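Put together, the conversion and the boolean mask might look like the sketch below. The frame is a cut-down, hypothetical version of the one in the question, and ids, labels and mask are illustrative names; the extra value 9 is included only to show what happens with an id that has no matching row:

```python
import pandas as pd

# Cut-down stand-in for the question's frame (assumption: string index labels).
df = pd.DataFrame(
    {
        "dark": [0.00, 0.89, 0.00],
        "comedy": [0.000, 0.000, 0.678],
    },
    index=["0001", "0004", "0005"],
)

# Integer ids to select; 9 deliberately has no matching row.
ids = [1, 5, 9]

# Zero-pad each integer to four digits so it matches the string index.
labels = ["{:04d}".format(i) for i in ids]

# Boolean mask over the index: True where the label is in `labels`.
mask = df.index.isin(labels)
newdf = df[mask]
print(newdf)
```

Because isin only builds a mask, labels that are absent from the index are ignored rather than raising an error, and the selected rows keep the frame's original order.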

Upvotes: 1
