Reputation: 13309
I am following Wes McKinney's wonderful book to get up to speed with pandas. However, I can't seem to work out why pandas.Series.str.get won't work. I've looked at a few GitHub issues and questions on here, but none of them seem to help.
Data
data = pd.Series({'Dave': '[email protected]', 'Steve': '[email protected]', 'Rob': '[email protected]', 'Wes': np.nan})
Code
import pandas as pd
import re
import numpy as np
pattern = '[a-zA-Z0-9]+@.*'
matches = data.str.match(pattern)
matches.str.get(1)
The above code should work and result in something like:
Dave NaN
Rob NaN
Steve NaN
I did use a different regex pattern from the one in the book, but I don't think that's the issue.
ERROR:
raise AttributeError("Can only use .str accessor with string " "values!")
AttributeError: Can only use .str accessor with string values
What am I missing? I am using PyCharm Community, Python 3.6.6, and pandas 0.24.2, if that makes a difference.
Here's a screenshot from the book:
Upvotes: 2
Views: 243
Reputation: 393963
The reason you get a Series containing NaNs is that matches is a boolean Series:
In[58]:
matches
Out[58]:
Dave True
Steve True
Rob True
Wes NaN
dtype: object
A boolean has no elements, so there is nothing to return at an ordinal position in this case, which is why you get a Series of NaNs.
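To make the point above concrete, here is a minimal sketch that checks what str.match actually hands back. The email addresses are hypothetical stand-ins (the ones in the question were obscured), and the try/except is there because the behaviour seems to depend on the pandas version: some releases return all NaN, as described here, while others raise the AttributeError shown in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical addresses standing in for the obscured ones in the question.
data = pd.Series({"Dave": "dave@example.com", "Wes": np.nan})

# str.match only reports whether each string matches the pattern.
matches = data.str.match(r"[a-zA-Z0-9]+@.*")

# The non-missing results are plain booleans, not strings or sequences.
all_bool = all(isinstance(v, (bool, np.bool_)) for v in matches.dropna())

try:
    result = matches.str.get(1)  # some versions return all NaN here
    outcome = "all NaN" if result.isna().all() else "unexpected values"
except AttributeError:           # others refuse .str on non-string data
    outcome = "AttributeError"

print(all_bool, outcome)
```

Either way, the underlying cause is the same: there is nothing inside a boolean for get to index into.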
Compare this with the example in the docs: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.get.html#pandas.Series.str.get
In[61]:
s = pd.Series(["String",
... (1, 2, 3),
... ["a", "b", "c"],
... 123,
... -456,
... {1: "Hello", "2": "World"}])
s
Out[61]:
0 String
1 (1, 2, 3)
2 [a, b, c]
3 123
4 -456
5 {1: 'Hello', '2': 'World'}
dtype: object
In[62]:
s.str.get(1)
Out[62]:
0 t
1 2
2 b
3 NaN
4 NaN
5 Hello
dtype: object
So here it returns the element at the given position for each row (for the dict in the last row, it looks up the key 1 instead). You can see that some rows, such as the two integers, have no second element, so NaN is returned for them.
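If the goal is to pull pieces out of each address, one possible approach (a sketch, not necessarily what the book intends) is to use a pattern with capture groups together with str.findall or str.extract, so that each row holds something get can index into. The addresses and the grouped pattern below are assumptions made for illustration; the question's real addresses were obscured and its pattern had no groups.

```python
import numpy as np
import pandas as pd

# Hypothetical addresses; the originals in the question were obscured.
data = pd.Series({
    "Dave": "dave@example.com",
    "Steve": "steve@example.com",
    "Rob": "rob@example.com",
    "Wes": np.nan,
})

# Assumed pattern with three capture groups: user name, domain, suffix.
pattern = r"([a-zA-Z0-9]+)@([a-zA-Z0-9]+)\.([a-zA-Z]+)"

# str.findall gives a list of tuples per row (NaN for missing values).
found = data.str.findall(pattern)

# First matched tuple per row, then the second group inside that tuple.
first = found.str.get(0)
domains = first.str.get(1)

# str.extract returns a DataFrame with one column per capture group.
parts = data.str.extract(pattern)

print(domains)
print(parts)
```

Here str.get is applied to lists and tuples rather than booleans, so it has elements to return; the missing value for Wes stays NaN throughout.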
Upvotes: 2