Reputation: 359
This seems like an inherently simple task but I am finding it very difficult to remove the *
from my entire data frame and return the numeric values in each column, including the numbers that did not have *
. The dataframe includes hundreds of more columns and looks like this in short:
Time A1 A2
2.0002546296 1499 1592
2.0006712963 1252 1459
2.0902546296 1731 2223
2.0906828704 1691 1904
2.1742245370 2364 3121
2.1764699074 2096 1942
2.7654050926 *7639* *8196*
2.7658564815 *7088* *7542*
2.9048958333 *8736* *8459*
2.9053125000 *7778* *7704*
2.9807175926 *6612* *6593*
3.0585763889 *8520* *9122*
I have not written it to iterate over every column in df yet but as far as the first column goes I have come up with this
df['A1'].str.replace('*','').astype(float)
which yields
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
6 NaN
7 NaN
8 NaN
9 NaN
10 NaN
11 NaN
12 NaN
13 NaN
14 NaN
15 NaN
16 NaN
17 NaN
18 NaN
19 7639.0
20 7088.0
21 8736.0
22 7778.0
23 6612.0
24 8520.0
Is there a very easy way to just remove the *
in the dataframe in Pandas?
Upvotes: 14
Views: 90982
Reputation: 1
Use the strip
function.
df['A1'] = df['A1'].str.strip("*")
If you want to remove only from the left side, use lstrip
, and to remove from the right side use rstrip
.
df['A1'] = df['A1'].str.lstrip("*")
df['A1'] = df['A1'].str.rstrip("*")
You can remove different characters at once. Eg:
df['A1'] = df['A1'].str.strip("*/&")
Upvotes: 0
Reputation: 491
I found this to be a simple approach - Use replace
to retain only the digits (and dot
and minus
sign).
This would remove characters, alphabets or anything that is not defined in to_replace
attribute.
So, the solution is:
df['A1'].replace(regex=True, inplace=True, to_replace=r'[^0-9.\-]', value=r'')
df['A1'] = df['A1'].astype(float64)
Upvotes: 4
Reputation: 41
I found the answer of CuriousCoder so brief and useful but there must be a ')'
instead of ']'
So it should be:
df['A1'].replace(regex=True, inplace=True, to_replace=r'[^0-9.\-]',
value=r''] df['A1'] = df['A1'].astype(float64)
Upvotes: 4
Reputation: 1441
There is another solution which uses map and strip functions. You can see the below link: Pandas DataFrame: remove unwanted parts from strings in a column.
df =
Time A1 A2
0 2.0 1258 *1364*
1 2.1 *1254* 2002
2 2.2 1520 3364
3 2.3 *300* *10056*
cols = ['A1', 'A2']
for col in cols:
df[col] = df[col].map(lambda x: str(x).lstrip('*').rstrip('*')).astype(float)
df =
Time A1 A2
0 2.0 1258 1364
1 2.1 1254 2002
2 2.2 1520 3364
3 2.3 300 10056
The parsing procedure only be applied on the desired columns.
Upvotes: 3
Reputation: 7828
use replace which applies on whole dataframe :
df
Out[14]:
Time A1 A2
0 2.000255 1499 1592
1 2.176470 2096 1942
2 2.765405 *7639* *8196*
3 2.765856 *7088* *7542*
4 2.904896 *8736* *8459*
5 2.905312 *7778* *7704*
6 2.980718 *6612* *6593*
7 3.058576 *8520* *9122*
df=df.replace('\*','',regex=True).astype(float)
df
Out[16]:
Time A1 A2
0 2.000255 1499 1592
1 2.176470 2096 1942
2 2.765405 7639 8196
3 2.765856 7088 7542
4 2.904896 8736 8459
5 2.905312 7778 7704
6 2.980718 6612 6593
7 3.058576 8520 9122
Upvotes: 20