dozyaustin

Reputation: 661

Python: String slice in pandas DataFrame is a series? I need it to be convertible to int

I have a problem that has kept me up for hours. I need to slice a string variable in a pandas DataFrame and extract its numerical value so that I can perform a merge. (For context, the variable is the result of .groupby ..., and I am now trying to merge in additional information.)

Getting the number out of a string should be easy.

Basically, I am doing the following:

string = 'x_1'
number = int(string[2:])
number == 1
Et voilà!

With that goal in mind, let's build up the code:

In [32]: import pandas as pd
    ...: d = {'id' : [1, 2, 3, 4],
    ...:     'str_id' : ['x_2', 'x_4', 'x_8', 'x_1']}
    ...: 

In [33]: df= pd.DataFrame(d)

In [34]: df.head()
Out[34]: 
   id str_id
0   1    x_2
1   2    x_4
2   3    x_8
3   4    x_1

In [35]: df['num_id']=df.str_id.str[2:]

In [36]: df.head()
Out[36]: 
   id str_id num_id
0   1    x_2      2
1   2    x_4      4
2   3    x_8      8
3   4    x_1      1

In [37]: df.dtypes
Out[37]: 
id         int64
str_id    object
num_id    object
dtype: object

The result LOOKS good -- we have an object, so we'll just convert to int and be golden, right? Sadly not so much.

In [38]: df['num_id3'] = int(df['num_id'])
Traceback (most recent call last):

  File "<ipython-input-38-50312cced30b>", line 1, in <module>
    df['num_id3'] = int(df['num_id'])

  File "/Users/igor/anaconda/lib/python2.7/site-packages/pandas/core/series.py", line 92, in wrapper
    "{0}".format(str(converter)))

TypeError: cannot convert the series to <type 'int'>
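The TypeError arises because int() converts a single scalar, whereas df['num_id'] is a whole column (a Series). A minimal sketch of two column-wise alternatives, rebuilding the frame from the session above; the column names num_id_apply and num_id_astype are just illustrative:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4],
                   'str_id': ['x_2', 'x_4', 'x_8', 'x_1']})
df['num_id'] = df.str_id.str[2:]  # still strings ('2', '4', ...), dtype object

# int() expects one scalar, so apply it element by element ...
df['num_id_apply'] = df['num_id'].apply(int)

# ... or, more idiomatically, convert the whole column's dtype at once.
df['num_id_astype'] = df['num_id'].astype(int)

print(df.dtypes)
```

Both columns end up with an integer dtype; astype(int) is the vectorized option and raises if any value is not a valid integer string.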

OK, let's try something simpler: stripping leading and trailing blanks.

In [39]: df['num_id3'] = (df['num_id']).strip()
Traceback (most recent call last):

  File "<ipython-input-39-0af6d5f8bb8c>", line 1, in <module>
    df['num_id3'] = (df['num_id']).strip()

  File "/Users/igor/anaconda/lib/python2.7/site-packages/pandas/core/generic.py", line 2744, in __getattr__
    return object.__getattribute__(self, name)

AttributeError: 'Series' object has no attribute 'strip'
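For reference, pandas exposes vectorized string methods through the .str accessor (the same accessor that makes df.str_id.str[2:] work). A minimal sketch, using hypothetical values padded with whitespace so the effect of strip is visible:

```python
import pandas as pd

# Hypothetical values with stray whitespace, to make strip() visible.
s = pd.Series([' 2', '4 ', ' 8 ', '1'])

# Series itself has no .strip(); the element-wise version lives under .str.
stripped = s.str.strip()

# The cleaned strings can then be converted column-wise.
nums = stripped.astype(int)
print(nums.tolist())  # [2, 4, 8, 1]
```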

So somehow I have a Series object rather than a plain string, and I have not been able to convert it to anything usable.

Please will you help?! Thanks!

Upvotes: 2

Views: 1358

Answers (1)

MaxU - stand with Ukraine

Reputation: 210872

You can't use the int(Series) construction: it's similar to int(['1','2','3']), which won't work either. Use Series.astype(int) or, better, pd.to_numeric(Series) instead. In the example below I've added a non-numeric row (x_AAA) to show how it is handled:

In [32]: df
Out[32]:
   id str_id
0   1    x_2
1   2    x_4
2   3    x_8
3   4    x_1
4   5  x_AAA

In [33]: df['num_id'] = pd.to_numeric(df.str_id.str.extract(r'_(\d+)', expand=False))

In [34]: df
Out[34]:
   id str_id  num_id
0   1    x_2     2.0
1   2    x_4     4.0
2   3    x_8     8.0
3   4    x_1     1.0
4   5  x_AAA     NaN
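Since the stated goal was a merge, here is a minimal sketch of that last step. The lookup table info and its label column are hypothetical. pd.to_numeric returns floats above because the unmatched x_AAA row becomes NaN; converting to pandas' nullable 'Int64' dtype keeps the keys integral, which is an optional choice, not something the approach above requires:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4, 5],
                   'str_id': ['x_2', 'x_4', 'x_8', 'x_1', 'x_AAA']})

# Hypothetical lookup table to merge in; its key column is already integer.
info = pd.DataFrame({'num_id': [1, 2, 4, 8],
                     'label': ['one', 'two', 'four', 'eight']})

# Extract the digits after the underscore; rows without digits become NaN.
digits = df.str_id.str.extract(r'_(\d+)', expand=False)

# Convert to numbers, then to the nullable integer dtype so the
# missing value does not force the column to float.
df['num_id'] = pd.to_numeric(digits).astype('Int64')

# A left merge keeps every row of df; unmatched rows get NaN labels.
merged = df.merge(info, on='num_id', how='left')
print(merged)
```

If you prefer plain int64 keys, drop the NaN rows first (for example with dropna(subset=['num_id'])) and then use astype(int).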

Upvotes: 2
