arv
arv

Reputation: 398

Remove commas from all columns except one

df
               
   date           price      vol    
0 2010-01-04  34,57282657    2,600,000
1 2010-01-04  123,900        2,600,000
2 2010-01-04  353,6789738    2,600,000

Is there a way to remove commas from all columns except 1 or 2 (here, just date) in general code? (I have 20 columns in reality.)

Expected output:

   date           price      vol    
0 2010-01-04  3457282657    2600000
1 2010-01-04  123900        2600000
2 2010-01-04  3536789738    2600000

Upvotes: 3

Views: 1457

Answers (4)

Umar.H
Umar.H

Reputation: 23099

The best solution imo is to use thousands=',' in your read statement i.e

pd.read_csv(...,thousands=',')

from io import StringIO


d = """   date           price      vol    
0 2010-01-04  34,57282657    2,600,000
1 2010-01-04  123,900        2,600,000
2 2010-01-04  353,6789738    2,600,000"""

df = pd.read_csv(StringIO(d),sep='\s+',thousands=',')

print(df)
         date       price      vol
0  2010-01-04  3457282657  2600000
1  2010-01-04      123900  2600000
2  2010-01-04  3536789738  2600000

print(df.dtypes)

date     object
price     int64
vol       int64
dtype: object

we can use filter and replace

hat tip to @Shubham Sharma for regex corrections.

df[df.filter(regex="^(?!date)").columns] = df.filter(regex="^(?!date)")\
                                                 .replace(",", "", regex=True)

^(?!date) excludes the date column - you can add more by using the bitwise OR symbol | ^(?!date|vol)

print(df)

         date       price      vol
0  2010-01-04  3457282657  2600000
1  2010-01-04      123900  2600000
2  2010-01-04  3536789738  2600000

Explanation

  1. ^ asserts position at start of a line
  2. Negative Lookahead (?!date|vol)

Upvotes: 1

David Erickson
David Erickson

Reputation: 16683

You could replace by specific column positions using .iloc with .replace and passing regex=True. For example, this is how it would look to replace everything after the first column with df.iloc[:,1:]:

import pandas as pd
df = pd.read_clipboard()
df.iloc[:,1:] = df.iloc[:,1:].replace(',','', regex=True)
df

Out[19]: 
         date       price      vol
0  2010-01-04  3457282657  2600000
1  2010-01-04      123900  2600000
2  2010-01-04  3536789738  2600000

Upvotes: 1

gunn gunner
gunn gunner

Reputation: 120

For simplicity, you can convert the column into type int.

df.price.astype(int)
df.vol.astype(int)

Upvotes: 0

Shubham Sharma
Shubham Sharma

Reputation: 71689

Use DataFrame.replace on columns of dataframe excluding the columns from exclude list:

exclude = ['date']

c = df.columns.difference(exclude)
df[c] = df[c].replace(',', '', regex=True)

Result:

         date       price      vol
0  2010-01-04  3457282657  2600000
1  2010-01-04      123900  2600000
2  2010-01-04  3536789738  2600000

Upvotes: 4

Related Questions