dragon951
dragon951

Reputation: 396

How to filter a Pandas series while writing to list

For a given column 'A' in a pandas dataframe 'tbl', I have been trying to write the column .tolist() and filter out the items that are '.'. This works, but doesn't seem very readable:

list_of_A = tbl['A'][~tbl['A'].isin(['.'])].tolist()

Also, checking a list of one seems needlessly slow, though str.contains('.') seems like it would be slower due to pattern matching. Am I missing a better method?

Update: @jpp, @piRSquared and @Scott-Boston all had good methods, so it came down to a test:

>>> tbl = pd.DataFrame(np.random.randn(50000, 3), columns=list('ABC'))
>>> tbl.loc[tbl.sample(10000).index, 'A'] = '.'
>>> min(timeit.repeat("list_of_A = tbl.loc[tbl['A'].ne('.'), 'A'].tolist()", repeat=1000, number=100, globals=globals()))
0.37328900000102294
>>> min(timeit.repeat("list_of_A = tbl.A.values[tbl.A.values != '.'].tolist()", repeat=1000, number=100, globals=globals()))
0.1470019999997021
>>> min(timeit.repeat("tbl.query('A != \".\"')['A'].tolist()", repeat=1000, number=100, globals=globals()))
0.45748099999946135

Discovering .loc has opened up a world of possibilities for me, but to smash 'n' grab a filtered column list, looks like converting to an ndarray is fastest.

Upvotes: 2

Views: 889

Answers (4)

piRSquared
piRSquared

Reputation: 294258

Use a list comprehension

list_of_A = [x for x in df['A'] if x != '.']

Probably more performant

list_of_A = [x for x in df['A'].values.tolist() if x != '.']

Faster still

tbl.A.values[tbl.A.values != '.'].tolist()

Upvotes: 2

Scott Boston
Scott Boston

Reputation: 153460

You can use query for more readability:

tbl.query('A != "."')['A'].tolist()

Upvotes: 1

jpp
jpp

Reputation: 164673

Two things to note:

  • Chained indexing is explicitly discouraged; avoid as it is never necessary.
  • isin is only recommended if you are comparing to a list / set / pd.Series. loc accessor with == operator should be sufficient.

Try this instead:

list_of_A = tbl.loc[~(tbl['A'] == '.'), 'A'].tolist()

As @BradSolomon points out, there is an alternative:

list_of_A = tbl.loc[tbl['A'].ne('.'), 'A'].tolist()

Upvotes: 2

Richard
Richard

Reputation: 61289

Using a temporary variable and two lines of code makes this more readable:

exclude   = tbl['A'].isin(['.'])
list_of_A = tbl['A'][~exclude].tolist()

Upvotes: 0

Related Questions