Reputation: 576
What I'm trying to do is:
options = ['abc', 'def']
df[any(df['a'].str.startswith(start) for start in options)]
I want to filter the DataFrame so that I only keep rows whose value in column 'a' starts with one of the given prefixes.
The following code works for a single prefix, but I need it to work with several prefixes:
start = 'abc'
df[df['a'].str.startswith(start)]
The error message is
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I read "Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()", but I haven't understood how to apply it here.
Upvotes: 5
Views: 6320
Reputation: 576
One more solution:
# extract all possible values for 'a' column
all_a_values = df['a'].unique()
# filter 'a' column values by my criteria
accepted_a_values = [x for x in all_a_values if any([str(x).startswith(prefix) for prefix in options])]
# apply filter
df = df[df['a'].isin(accepted_a_values)]
Adapted from the question "remove rows and ValueError Arrays were different lengths".
The solution provided by @Vaishali is the simplest and most logical, but I also needed the accepted_a_values list to iterate through. This was not mentioned in the question, so I marked her answer as accepted.
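If you need both the filtered rows and the list of accepted values, one option is to build a single vectorized mask and derive the list from the filtered result, instead of looping over the unique values in Python. This is a minimal sketch; the sample data is made up for illustration, and astype(str) mirrors the str(x) call in the code above.

```python
import pandas as pd

# Sample data and prefixes are assumptions for illustration.
df = pd.DataFrame({'a': ['abcd', 'def5', 'xabc', 'abcd', 'defabcb']})
options = ['abc', 'def']

# One boolean mask: True where the value in 'a' starts with any of the prefixes.
mask = df['a'].astype(str).str.startswith(tuple(options))

# The filtered rows.
filtered = df[mask]

# The distinct accepted values, in order of first appearance, to iterate over later.
accepted_a_values = filtered['a'].unique().tolist()

print(accepted_a_values)
```

Here accepted_a_values is ['abcd', 'def5', 'defabcb'], and filtered keeps rows 0, 1, 3 and 4.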
Upvotes: 0
Reputation: 38415
You can pass a tuple of prefixes to str.startswith:
import pandas as pd
df = pd.DataFrame({'a': ['abcd', 'def5', 'xabc', '5abc1', '9def', 'defabcb']})
options = ['abc', 'def']
df[df.a.str.startswith(tuple(options))]
You get
a
0 abcd
1 def5
5 defabcb
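As an alternative that is not part of the answer above, the same filter can be written with a regular expression. str.match anchors each pattern at the start of the string, so an alternation of the prefixes behaves like startswith. This is a sketch using the same sample data. re.escape is there in case a prefix contains regex metacharacters such as '.' or '+'.

```python
import re
import pandas as pd

df = pd.DataFrame({'a': ['abcd', 'def5', 'xabc', '5abc1', '9def', 'defabcb']})
options = ['abc', 'def']

# Build an alternation such as 'abc|def', escaping each prefix literally.
pattern = '|'.join(re.escape(p) for p in options)

# str.match only matches at the beginning of each string, like startswith.
result = df[df['a'].str.match(pattern)]
print(result)
```

This keeps the same rows (0, 1 and 5). The tuple version is simpler for plain prefixes. The regex form is mainly useful if you later need more general patterns.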
Upvotes: 6
Reputation: 6915
You can try this:
mask = np.array([df['a'].str.startswith(start) for start in options]).any(axis=0)
df[mask]
It creates a boolean Series for each start option and then applies any across them, so each row of df is kept if at least one prefix matches.
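A self-contained sketch of the NumPy approach on the sample data from the tuple answer. Each prefix becomes one row of the stacked array, so the array has shape (len(options), len(df)), and the reduction has to run over axis 0 to get one boolean per DataFrame row. Reducing over axis 1 would instead give one value per prefix, which cannot be used as a row mask.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': ['abcd', 'def5', 'xabc', '5abc1', '9def', 'defabcb']})
options = ['abc', 'def']

# Stack one boolean Series per prefix: shape is (len(options), len(df)).
stacked = np.array([df['a'].str.startswith(start) for start in options])

# Collapse the prefix axis (axis 0) so each row of df gets a single bool.
mask = stacked.any(axis=0)

print(stacked.shape)
print(df[mask])
```

This prints (2, 6) and then the rows at index 0, 1 and 5.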
You were getting the error because the built-in any expects an iterable of single bools, but here each item is a whole Series. As the error message says, the truth value of a multi-valued object is ambiguous, so you need an array-aware any instead.
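To make the explanation concrete, here is a small sketch (the three-row DataFrame is made up for illustration). It first shows that the built-in any fails because it calls bool() on a whole Series. It then shows a pandas-only array-aware alternative: the per-prefix masks go side by side as columns and DataFrame.any(axis=1) checks each row.

```python
import pandas as pd

df = pd.DataFrame({'a': ['abcd', 'def5', 'xabc']})
options = ['abc', 'def']

masks = [df['a'].str.startswith(start) for start in options]

# Built-in any() calls bool() on each element, and bool() of a Series raises ValueError.
try:
    any(masks)
    raised = False
except ValueError:
    raised = True

# Array-aware alternative: one column per prefix, then "any column True" per row.
mask = pd.concat(masks, axis=1, keys=options).any(axis=1)

print(raised)
print(df[mask])
```

This prints True (the built-in any raised) followed by the rows 'abcd' and 'def5'.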
Upvotes: 2