Reputation: 539
I have a pandas DataFrame as below:
Manufacture_Id Score Rank
0 S1 93 1
1 S1 91 2
2 S1 86 3
3 S2 88 1
4 S25 73 2
5 S100 72 3
6 S100 34 1
7 S100 24 2
I want to extract the ending numbers from the 'Manufacture_Id' column into a new column as below:
Manufacture_Id Score Rank Id
0 S1 93 1 1
1 S1 91 2 1
2 S1 86 3 1
3 S2 88 1 2
4 S25 73 2 25
5 S100 72 3 100
6 S100 34 1 100
7 S100 24 2 100
I have written the code below, which gives the right result, but it is not efficient when the data gets large.
test['id'] = test.Manufacture_Id.str.extract(r'(\d+\.\d+|\d+)')
Is there a way to do it efficiently?
Data:
import pandas as pd

# Create the sample data
data = [
["S1",93,1],
["S1",91,2],
["S1",86,3],
["S2",88,1],
["S25",73,2],
["S100",72,3],
["S100",34,1],
["S100",24,2],
]
# Build the dataframe
test = pd.DataFrame(data, columns=['Manufacture_Id', 'Score', 'Rank'])
Upvotes: 1
Views: 339
Reputation: 4510
The following code will be more efficient than a regex, because it only slices off the first character:
test["id"] = test['Manufacture_Id'].str[1:].astype(int)
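As a minimal, self-contained sketch of the slicing approach on the question's sample IDs (it assumes every value has exactly one leading non-digit character, which holds for the data shown but may not hold in general):

```python
import pandas as pd

# Sample IDs from the question
test = pd.DataFrame(
    {"Manufacture_Id": ["S1", "S1", "S1", "S2", "S25", "S100", "S100", "S100"]}
)

# Drop the first character, then convert the remaining digits to integers.
# Assumption: exactly one leading non-digit character in every value.
test["id"] = test["Manufacture_Id"].str[1:].astype(int)

print(test["id"].tolist())  # [1, 1, 1, 2, 25, 100, 100, 100]
```

Note that the result is an integer column, whereas the str.extract call in the question leaves the values as strings.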
But if the 'S' prefix is not constant, you can try the following snippet instead.
test["id"] = test.Manufacture_Id.str.extract(r'(\d+)', expand=False).astype(int)
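For illustration, here is a small sketch of the regex approach on IDs whose prefix length varies. The sample values are hypothetical (they are not from the question), and anchoring the pattern with `$` is my own assumption about what "ending numbers" should mean; the unanchored `(\d+)` would instead take the first run of digits.

```python
import pandas as pd

# Hypothetical IDs with prefixes of different lengths (not from the question)
ids = pd.Series(["S1", "AB25", "XYZ100"])

# r"(\d+)$" captures the run of digits at the end of each string.
# expand=False returns a Series rather than a one-column DataFrame.
extracted = ids.str.extract(r"(\d+)$", expand=False).astype(int)

print(extracted.tolist())  # [1, 25, 100]
```

If some values contain no trailing digits, str.extract yields missing values for them and the plain astype(int) call will raise, so those rows would need to be handled first.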
Upvotes: 1