Using pandas.Series.str.get: what is the correct way?

Question

I am following Wes McKinney's wonderful book to get up to speed with pandas. However, I can't figure out why pandas.Series.str.get won't work. I've looked at a few GitHub issues and questions on here, but none of them seem to help.

Data

data = pd.Series({'Dave': 'dave@google.com', 'Steve': 'steve@gmail.com',
                  'Rob': 'rob@yahoo.com', 'Wes': np.nan})

Code

import pandas as pd
import re
import numpy as np
pattern = '[a-zA-Z0-9]+@.*'
matches = data.str.match(pattern)
matches.str.get(1)

The above code should work and result in something like:

Dave     NaN
Rob      NaN
Steve    NaN

I used a different regex pattern from the one in the book, but I don't think that's the issue.

ERROR:

    raise AttributeError("Can only use .str accessor with string "
                         "values!")
AttributeError: Can only use .str accessor with string values!

What am I missing? I am using PyCharm Community, Python 3.6.6 and pandas 0.24.2, if that makes a difference.

Here's a screenshot from the book:

EdChum · Accepted Answer

The reason the book's example gives a Series of NaNs is that matches is a boolean Series:

In[58]:
matches

Out[58]: 
Dave     True
Steve    True
Rob      True
Wes       NaN
dtype: object

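A boolean has no elements, so it makes no sense to ask for the element at an ordinal position; that is why the book's output is a Series of NaNs. With pandas 0.24.2 the .str accessor goes further and rejects the boolean Series outright, which is the AttributeError you are seeing.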
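If the goal is to pull parts of each address out with str.get, the Series has to hold strings or containers rather than booleans. A minimal sketch, assuming a pattern with capture groups (the grouped pattern here is an assumption, not the question's pattern, which has no groups): str.findall returns a list of group tuples per row, and chained .str.get calls then index into those lists and tuples.

```python
import re

import numpy as np
import pandas as pd

data = pd.Series({'Dave': 'dave@google.com', 'Steve': 'steve@gmail.com',
                  'Rob': 'rob@yahoo.com', 'Wes': np.nan})

# Assumed pattern with three capture groups: user, domain, suffix.
pattern = r'([A-Z0-9._%+-]+)@([A-Z0-9.-]+)\.([A-Z]{2,4})'

# One list of (user, domain, suffix) tuples per row; NaN stays NaN.
found = data.str.findall(pattern, flags=re.IGNORECASE)

# .str.get(0) takes the first match (a tuple) from each list,
# then .str.get(1) takes the second capture group, i.e. the domain.
domains = found.str.get(0).str.get(1)
print(domains)
```

The Wes row stays NaN at every step, because the .str methods propagate missing values instead of raising.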

Compare this with the example in the docs: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.get.html#pandas.Series.str.get

In[61]:
s = pd.Series(["String",
...               (1, 2, 3),
...               ["a", "b", "c"],
...               123,
...               -456,
...               {1: "Hello", "2": "World"}])
s

Out[61]: 
0                        String
1                     (1, 2, 3)
2                     [a, b, c]
3                           123
4                          -456
5    {1: 'Hello', '2': 'World'}
dtype: object

In[62]:
s.str.get(1)

Out[62]: 
0        t
1        2
2        b
3      NaN
4      NaN
5    Hello
dtype: object

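Here it returns the element at position 1 of each row: the second character of the string, the second item of the tuple and of the list, and, for the dict, the value stored under the key 1. The integers 123 and -456 have no elements to index, so those rows return NaN.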
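Applied to the question's data, the same rule makes str.get useful once each row holds a container, for example the lists produced by str.split. A small sketch (splitting on '@' is an illustration chosen here, not something from the question):

```python
import numpy as np
import pandas as pd

data = pd.Series({'Dave': 'dave@google.com', 'Steve': 'steve@gmail.com',
                  'Rob': 'rob@yahoo.com', 'Wes': np.nan})

# Each non-missing row becomes a two-element list: [user, domain].
parts = data.str.split('@')

users = parts.str.get(0)    # first element of each list
domains = parts.str.get(1)  # second element of each list
missing = parts.str.get(5)  # no list has a sixth element, so every row is NaN

print(domains)
```

As with the integers in the docs example, asking for a position that does not exist gives NaN rather than an IndexError.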
