Reputation: 71
I want to remove the string "(subject to approval)" that follows the integer 90 in a DataFrame column.
import pandas as pd
name_dict = {
'Name': ['a','b','c','d'],
'Score': ['90(subject to approval)',80,95,20]
}
df = pd.DataFrame(name_dict)
print(df)
df.set_index('Name').loc['a', 'Score']
Upvotes: 2
Views: 515
Reputation: 1314
You can use a regex to remove every non-digit character and then cast the column to int.
import pandas as pd
name_dict = {
'Name': ['a','b','c','d'],
'Score': ['90(subject to approval)',80,95,20]
}
df = pd.DataFrame(name_dict)
# Strip every non-digit character from string values, then cast the column to int
df['Score'] = df['Score'].replace(r'[^0-9]+', '', regex=True).astype(int)
print(df)
Output:
  Name  Score
0    a     90
1    b     80
2    c     95
3    d     20
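One caveat with this pattern: `[^0-9]+` deletes every non-digit character, so a note that itself contains digits would be merged into the score. The sketch below assumes the annotation always starts at the first opening parenthesis (an assumption, not something the question states) and removes everything from that parenthesis onward instead. The sample note containing a digit is hypothetical and is only there to show the difference.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['a', 'b', 'c', 'd'],
    # Hypothetical note that contains a digit, to show the difference
    'Score': ['90(subject to 2 approvals)', 80, 95, 20],
})

# Removing all non-digits glues the "2" from the note onto the score
all_digits = df['Score'].replace(r'[^0-9]+', '', regex=True)
print(all_digits.iloc[0])

# Assumption: the note always begins at the first '('.
# Drop everything from that parenthesis to the end, then cast to int.
cleaned = df['Score'].replace(r'\s*\(.*$', '', regex=True).astype(int)
print(cleaned.tolist())
```

If the notes in your data never contain digits, the simpler `[^0-9]+` pattern is enough.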
If you only want to remove the extra string for rows where Name == 'a', you can use:
import pandas as pd
name_dict = {
'Name': ['a','b','c','d'],
'Score': ['90(subject to approval)',80,95,20]
}
df = pd.DataFrame(name_dict)
df.loc[df['Name'] == 'a', 'Score'] = (
    df
    .loc[df['Name'] == 'a', 'Score']
    .replace(r'[^0-9]+', '', regex=True)
)
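Note that this row-limited replacement leaves the value for 'a' as the string '90', so the column keeps dtype object. The sketch below shows one possible follow-up: it stores the row filter in a variable (`mask` is a name introduced here for illustration) and casts the column once the stray text is gone.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['a', 'b', 'c', 'd'],
    'Score': ['90(subject to approval)', 80, 95, 20],
})

# Build the row filter once and reuse it on both sides of the assignment
mask = df['Name'] == 'a'
df.loc[mask, 'Score'] = df.loc[mask, 'Score'].replace(r'[^0-9]+', '', regex=True)

# The cleaned cell is still a string, so cast the whole column explicitly
df['Score'] = df['Score'].astype(int)
print(df['Score'].tolist())
```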
Upvotes: 2
Reputation: 30070
You can use .str.extract to extract the integer part:
# Raw string avoids an invalid-escape warning; expand=False returns a Series
df['Score'] = df['Score'].astype(str).str.extract(r'(\d+)', expand=False)
print(df)
  Name Score
0    a    90
1    b    80
2    c    95
3    d    20
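The extracted values are strings, and a row with no digits at all yields a missing value, which a plain `astype(int)` cannot represent. The sketch below shows one way to handle that, using `pd.to_numeric` followed by the nullable `Int64` dtype. The 'pending' row is a hypothetical addition to illustrate the missing case.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['a', 'b', 'c', 'd', 'e'],
    # 'pending' is a hypothetical value with no digits
    'Score': ['90(subject to approval)', 80, 95, 20, 'pending'],
})

# Pull out the first run of digits; rows without digits become missing
digits = df['Score'].astype(str).str.extract(r'(\d+)', expand=False)

# Convert to numbers, then to a nullable integer dtype that keeps the gap
df['Score'] = pd.to_numeric(digits).astype('Int64')
print(df)
```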
Upvotes: 2