I have a DataFrame with a column for the number of reviews. The values in that column are formatted like this:
816 ratings
1,139 ratings
5 ratings
22,3456 ratings
I'd like to convert these to integers so I can sort the DataFrame. My output should be:
816
1139
5
223456
I tried
df=df['num_reviews'].str.extract('(\d+)').astype(float)
df
However, this dropped everything after the comma (e.g. 22,3456 returns 22.0), and using .astype(int) gave me errors because some fields are NaN.
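The pattern (\d+) stops at the first non-digit, so the comma ends the match. A minimal sketch that stays close to this attempt: strip the commas first, extract the digits, convert with pd.to_numeric, then cast to pandas' nullable Int64 dtype so rows without a number become <NA> instead of breaking the integer conversion. The "no ratings" row is a hypothetical example of a value with no digits.

```python
import pandas as pd

# Sample data; "no ratings" is a hypothetical row containing no digits
df = pd.DataFrame({
    'num_reviews': ["816 ratings", "1,139 ratings", "5 ratings",
                    "no ratings", "22,3456 ratings"]
})

digits = (
    df['num_reviews']
    .str.replace(',', '', regex=False)    # "1,139 ratings" -> "1139 ratings"
    .str.extract(r'(\d+)', expand=False)  # first run of digits, NaN if none
)

# to_numeric gives float64 (NaN for missing); Int64 keeps integers plus <NA>
counts = pd.to_numeric(digits).astype('Int64')
print(counts)
```

Unlike plain int64, the nullable Int64 dtype can hold missing values, which is why the cast no longer fails on the NaN rows.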
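Remove every non-digit character, map rows that end up empty (no digits at all) to '0', then convert. Since pandas 2.0, str.replace treats the pattern as a literal string unless regex=True is passed, so pass it explicitly:
df['num_reviews'].str.replace(r'\D+', '', regex=True).replace('', '0').astype(float)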
Test case:
import pandas as pd

df = pd.DataFrame({
    'num_reviews': ["816 ratings", "1,139 ratings",
                    "5 ratings", "no ratings", "22,3456 ratings"]
})
print(df['num_reviews'].str.replace(r'\D+', '', regex=True).replace('', '0').astype(float))
Output:
0 816.0
1 1139.0
2 5.0
3 0.0
4 223456.0
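Since the goal is to sort the DataFrame, here is a sketch that keeps the cleaned counts as integers in a new column and sorts on it. The column name num_reviews_int is hypothetical. Casting with astype(int) works here only because rows without digits are already mapped to '0'; a column containing real NaN values would need extra handling (for example fillna, or the nullable Int64 dtype).

```python
import pandas as pd

df = pd.DataFrame({
    'num_reviews': ["816 ratings", "1,139 ratings",
                    "5 ratings", "no ratings", "22,3456 ratings"]
})

# Same cleaning as the one-liner above, but kept as integers
df['num_reviews_int'] = (
    df['num_reviews']
    .str.replace(r'\D+', '', regex=True)  # drop every non-digit character
    .replace('', '0')                     # rows with no digits become "0"
    .astype(int)
)

# Sort rows from most to fewest reviews
df_sorted = df.sort_values('num_reviews_int', ascending=False)
print(df_sorted)
```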