Reputation: 396
For a given column 'A'
in a pandas dataframe 'tbl'
, I have been trying to write the column .tolist()
and filter out the items that are '.'
. This works, but doesn't seem very readable:
list_of_A = tbl['A'][~tbl['A'].isin(['.'])].tolist()
Also, checking a list of one seems needlessly slow, though str.contains('.')
seems like it would be slower due to pattern matching. Am I missing a better method?
Update: @jpp, @piRSquared and @Scott-Boston all had good methods, so it came down to a test:
>>> tbl = pd.DataFrame(np.random.randn(50000, 3), columns=list('ABC'))
>>> tbl.loc[tbl.sample(10000).index, 'A'] = '.'
>>> min(timeit.repeat("list_of_A = tbl.loc[tbl['A'].ne('.'), 'A'].tolist()", repeat=1000, number=100, globals=globals()))
0.37328900000102294
>>> min(timeit.repeat("list_of_A = tbl.A.values[tbl.A.values != '.'].tolist()", repeat=1000, number=100, globals=globals()))
0.1470019999997021
>>> min(timeit.repeat("tbl.query('A != \".\"')['A'].tolist()", repeat=1000, number=100, globals=globals()))
0.45748099999946135
Discovering .loc
has opened up a world of possibilities for me, but to smash 'n' grab a filtered column list, looks like converting to an ndarray is fastest.
Upvotes: 2
Views: 889
Reputation: 294258
Use a list comprehension
list_of_A = [x for x in df['A'] if x != '.']
Probably more performant
list_of_A = [x for x in df['A'].values.tolist() if x != '.']
Faster still
tbl.A.values[tbl.A.values != '.'].tolist()
Upvotes: 2
Reputation: 153460
You can use query
for more readability:
tbl.query('A != "."')['A'].tolist()
Upvotes: 1
Reputation: 164673
Two things to note:
isin
is only recommended if you are comparing to a list
/ set
/ pd.Series
. loc
accessor with ==
operator should be sufficient.Try this instead:
list_of_A = tbl.loc[~(tbl['A'] == '.'), 'A'].tolist()
As @BradSolomon points out, there is an alternative:
list_of_A = tbl.loc[tbl['A'].ne('.'), 'A'].tolist()
Upvotes: 2
Reputation: 61289
Using a temporary variable and two lines of code makes this more readable:
exclude = tbl['A'].isin(['.'])
list_of_A = tbl['A'][~exclude].tolist()
Upvotes: 0