Reputation: 12529
I have a pandas column whose rows contain words that are surrounded by quotes, by brackets, or by nothing, like this:
"cxxx"
[asdfasd]
asdfasdf
[asdf]
"asdf"
My problem is that the code below strips the first and last characters from the elements that don't have quotes or brackets, and I'm not sure why.
def keyword_cleanup(x):
    if "\"" or "[" in x:
        return x[1:-1]
    else:
        return x

csv["Keyword"] = csv["Keyword"].apply(keyword_cleanup)
Upvotes: 1
Views: 574
Reputation: 880547
if "\"" or "[" in x:
should be
if "\"" in x or "[" in x: # x must contain a left bracket or double-quote.
or
if x.startswith(('"', '[')): # x must start with a left bracket or double-quote
since Python parses the former as
if ("\"") or ("[" in x):
because the in operator binds more tightly than the or operator. (See Python operator precedence.)
Since any non-empty string such as "\"" has boolean truth value True, the if-statement's condition is always True, and that is why keyword_cleanup was always returning x[1:-1].
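The precedence issue can be checked directly in plain Python. The following is only a small sketch: the sample value and the name keyword_cleanup_fixed are made up for illustration and are not from the question.

```python
x = "asdfasdf"  # sample value with no quotes or brackets

# `in` binds more tightly than `or`, so this is ("\"") or ("[" in x).
# The left operand is a non-empty string, so `or` returns it as-is (truthy).
buggy = "\"" or "[" in x
print(repr(buggy))

# Testing membership on both sides gives the intended boolean.
fixed = "\"" in x or "[" in x
print(fixed)


def keyword_cleanup_fixed(value):
    """Drop the first and last character only if value starts with '"' or '['."""
    if value.startswith(('"', '[')):
        return value[1:-1]
    return value


cleaned = [keyword_cleanup_fixed(v) for v in ['"cxxx"', '[asdfasd]', 'asdfasdf']]
print(cleaned)
```

With the corrected condition, the unquoted element passes through unchanged, while the quoted and bracketed ones lose their delimiters.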
However, also note that pandas has built-in string methods, available through the .str accessor. Using them is more concise, and often faster, than using apply to call a custom Python function for each item in the Series.
In [136]: s = pd.Series(['"cxxx"', '[asdfasd]', 'asdfasdf', '[asdf]', '"asdf"'])
In [137]: s.str.replace(r'^["[](.*)[]"]$', r'\1', regex=True)  # regex=True is required in pandas 2.0+, where the default is False
Out[137]:
0        cxxx
1     asdfasd
2    asdfasdf
3        asdf
4        asdf
dtype: object
If you want to strip all brackets or double quotes from both ends of each string, you could instead use
In [144]: s.str.strip('["]')
Out[144]:
0        cxxx
1     asdfasd
2    asdfasdf
3        asdf
4        asdf
dtype: object
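The two approaches give the same result on the sample data but differ on repeated or unbalanced delimiters. The sketch below uses plain Python re.sub and str.strip on single strings, on the assumption that the pandas .str.replace (with regex=True) and .str.strip methods behave the same way element-wise. The sample inputs are made up for illustration.

```python
import re

# Same pattern as above: one leading '"' or '[' and one trailing ']' or '"'.
PATTERN = r'^["[](.*)[]"]$'

samples = ['[[asdf]]', '"asdf', 'asdf]']

# The regex removes exactly one delimiter from each end, and only if both ends have one.
by_regex = [re.sub(PATTERN, r'\1', s) for s in samples]

# strip removes every leading and trailing character found in the given set.
by_strip = [s.strip('["]') for s in samples]

print(by_regex)
print(by_strip)
```

So the regex version is the closer match to the original x[1:-1] logic, while strip is the more aggressive cleanup.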
Upvotes: 3