Reputation: 398
df
date price vol
0 2010-01-04 34,57282657 2,600,000
1 2010-01-04 123,900 2,600,000
2 2010-01-04 353,6789738 2,600,000
Is there a way to remove commas from all columns except 1 or 2 (here, just date) in general code? (I have 20 columns in reality.)
Expected output:
date price vol
0 2010-01-04 3457282657 2600000
1 2010-01-04 123900 2600000
2 2010-01-04 3536789738 2600000
Upvotes: 3
Views: 1457
Reputation: 23099
The best solution imo is to use thousands=','
in your read statement i.e
pd.read_csv(...,thousands=',')
from io import StringIO
d = """ date price vol
0 2010-01-04 34,57282657 2,600,000
1 2010-01-04 123,900 2,600,000
2 2010-01-04 353,6789738 2,600,000"""
df = pd.read_csv(StringIO(d),sep='\s+',thousands=',')
print(df)
date price vol
0 2010-01-04 3457282657 2600000
1 2010-01-04 123900 2600000
2 2010-01-04 3536789738 2600000
print(df.dtypes)
date object
price int64
vol int64
dtype: object
we can use filter
and replace
hat tip to @Shubham Sharma for regex corrections.
df[df.filter(regex="^(?!date)").columns] = df.filter(regex="^(?!date)")\
.replace(",", "", regex=True)
^(?!date)
excludes the date column - you can add more by using the bitwise OR symbol |
^(?!date|vol)
print(df)
date price vol
0 2010-01-04 3457282657 2600000
1 2010-01-04 123900 2600000
2 2010-01-04 3536789738 2600000
Upvotes: 1
Reputation: 16683
You could replace by specific column positions using .iloc
with .replace
and passing regex=True
. For example, this is how it would look to replace everything after the first column with df.iloc[:,1:]
:
import pandas as pd
df = pd.read_clipboard()
df.iloc[:,1:] = df.iloc[:,1:].replace(',','', regex=True)
df
Out[19]:
date price vol
0 2010-01-04 3457282657 2600000
1 2010-01-04 123900 2600000
2 2010-01-04 3536789738 2600000
Upvotes: 1
Reputation: 120
For simplicity, you can convert the column into type int.
df.price.astype(int)
df.vol.astype(int)
Upvotes: 0
Reputation: 71689
Use DataFrame.replace
on columns of dataframe excluding the columns from exclude
list:
exclude = ['date']
c = df.columns.difference(exclude)
df[c] = df[c].replace(',', '', regex=True)
Result:
date price vol
0 2010-01-04 3457282657 2600000
1 2010-01-04 123900 2600000
2 2010-01-04 3536789738 2600000
Upvotes: 4