Reputation: 934
I have a CSV file with the following data:
Time Pressure
0 2.9852.988
10 2.9882.988
20 2.9902.990
30 2.9882.988
40 2.9852.985
50 2.9842.984
60 2.9852.985.....
For some reason, each value in the second column contains two decimal points. I'm trying to create a DataFrame with pandas but cannot proceed without removing the second decimal point. I cannot do this manually as there are thousands of data points in my file. Any ideas?
Upvotes: 1
Views: 1239
Reputation: 394051
You can call the vectorised str methods to split each string on the decimal point and discard the last element of the split. This produces, for example, the list ['2', '9852'], which you then join with a decimal point:
In [28]:
df['Pressure'].str.split('.').str[:-1].str.join('.')
Out[28]:
0 2.9852
1 2.9882
2 2.9902
3 2.9882
4 2.9852
5 2.9842
6 2.9852
Name: Pressure, dtype: object
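To make the chain easier to follow, here is a minimal sketch of the same three steps applied to a single value in plain Python. The variable names are only illustrative; the input string is the first Pressure value from the question.

```python
raw = "2.9852.988"        # one malformed value from the question

parts = raw.split(".")    # split on every decimal point
kept = parts[:-1]         # drop the last piece (the text after the second point)
fixed = ".".join(kept)    # put the remaining pieces back together

print(parts)  # ['2', '9852', '988']
print(kept)   # ['2', '9852']
print(fixed)  # 2.9852
```

Note that this relies on every value containing exactly two decimal points. A value with only one, such as "2.985", would be reduced to "2".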
If you want to convert the string to a float then call astype:
In [29]:
df['Pressure'].str.split('.').str[:-1].str.join('.').astype(np.float64)
Out[29]:
0 2.9852
1 2.9882
2 2.9902
3 2.9882
4 2.9852
5 2.9842
6 2.9852
Name: Pressure, dtype: float64
Just remember to assign the conversion back to the original df (this assumes numpy has been imported as np):
df['Pressure'] = df['Pressure'].str.split('.').str[:-1].str.join('.').astype(np.float64)
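Putting it together, the following is a rough end-to-end sketch rather than a definitive recipe. It assumes the file is whitespace-separated as shown in the question (adjust sep if yours uses commas), and it uses an in-memory string named sample as a stand-in for the real file. Reading Pressure with dtype=str is also an assumption, made so the .str methods are guaranteed to see strings.

```python
import io

import pandas as pd

# Stand-in for the real file: the first few rows from the question.
sample = """Time Pressure
0 2.9852.988
10 2.9882.988
20 2.9902.990
"""

# Read Pressure as text so the malformed values are kept as-is.
df = pd.read_csv(io.StringIO(sample), sep=r"\s+", dtype={"Pressure": str})

# Split on '.', drop the last piece, rejoin, then convert to float.
df["Pressure"] = (
    df["Pressure"].str.split(".").str[:-1].str.join(".").astype(float)
)

print(df)
```

For a real file, replace io.StringIO(sample) with the path to the CSV.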
Upvotes: 2