Deb

Reputation: 539

Efficiently extract numbers from a column in python

I have a pandas DataFrame as below:

     Manufacture_Id  Score  Rank
0             S1     93     1
1             S1     91     2
2             S1     86     3
3             S2     88     1
4            S25     73     2
5           S100     72     3
6           S100     34     1
7           S100     24     2

I want to extract the ending numbers from the 'Manufacture_Id' column into a new column as below:

   Manufacture_Id  Score  Rank   Id
0             S1     93     1    1
1             S1     91     2    1
2             S1     86     3    1
3             S2     88     1    2
4            S25     73     2   25
5           S100     72     3  100
6           S100     34     1  100
7           S100     24     2  100

I have written the code below, which gives the correct result, but it is slow when the data becomes large.

test['id'] = test.Manufacture_Id.str.extract(r'(\d+\.\d+|\d+)')

Is there a way to do it efficiently?

Data:

import pandas as pd

# Create the sample data: [Manufacture_Id, Score, Rank]
data = [
    ["S1", 93, 1],
    ["S1", 91, 2],
    ["S1", 86, 3],
    ["S2", 88, 1],
    ["S25", 73, 2],
    ["S100", 72, 3],
    ["S100", 34, 1],
    ["S100", 24, 2],
]

# Build the DataFrame
test = pd.DataFrame(data, columns = ['Manufacture_Id', 'Score', 'Rank'])

Upvotes: 1

Views: 339

Answers (1)

Sabil

Reputation: 4510

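The following code should be more efficient than a regex, because it only slices off the leading character and converts the rest to an integer. It assumes every ID has a single-character prefix.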

test["id"] = test['Manufacture_Id'].str[1:].astype(int)

But if the prefix is not always a single constant character such as S, try the following snippet instead:

test["id"] = test.Manufacture_Id.str.extract(r'(\d+)').astype(int)
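For reference, here is a minimal, self-contained sketch of the options on the sample IDs from the question. The variable names (`id_by_slice`, `id_by_regex`, `id_by_map`, `lookup`) are just illustrative. The third variant is not from the snippets above: it is an extra idea that may help when the column holds many repeated IDs, under the assumption that the number of distinct values is small compared to the number of rows. Actual speed depends on your data, so it is worth timing each one yourself.

```python
import pandas as pd

test = pd.DataFrame(
    {"Manufacture_Id": ["S1", "S1", "S1", "S2", "S25", "S100", "S100", "S100"]}
)

# Fixed one-character prefix: slice it off, then convert to int.
id_by_slice = test["Manufacture_Id"].str[1:].astype(int)

# Variable prefix: capture the trailing digits.
# expand=False makes str.extract return a Series instead of a DataFrame.
id_by_regex = (
    test["Manufacture_Id"].str.extract(r"(\d+)$", expand=False).astype(int)
)

# Many repeated IDs (assumption): parse each distinct value once,
# then map the results back onto the full column.
lookup = {value: int(value[1:]) for value in test["Manufacture_Id"].unique()}
id_by_map = test["Manufacture_Id"].map(lookup)

test["id"] = id_by_slice
print(test)
```

All three produce the same integer column for this data; the slice and map variants rely on the single-character prefix, while the regex variant does not.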

Upvotes: 1
