node_analyser

Reputation: 1612

Removing comma from values in column (csv file) using Python Pandas

I want to remove the commas from a column named size. The CSV looks like this:

number   name    size
   1      Car     9,32,123
   2      Bike    1,00,000
   3      Truck   10,32,111

I want the output to look like this:

number   name    size
   1      Car     932123
   2      Bike    100000
   3      Truck   1032111

I am using Python 3 and the Pandas module to handle this CSV. I am trying the replace method, but I don't get the desired output.

Snippet from my code:

import pandas as pd

df = pd.read_csv("file.csv")

# df.replace(",", "")
# df['size'] = df['size'].replace(to_replace=",", value="")
# df['size'] = df['size'].replace(",", "")

df['size'] = df['size'].replace({",", ""})
print(df['size'])  # expecting to see the 'size' column without commas

I don't see any error or exception. The last line, print(df['size']), simply displays the values as they are, i.e., with commas.

Upvotes: 2

Views: 11821

Answers (3)

Daweo

Reputation: 36370

I am using python3 and Pandas module for handling this csv

Note that the pandas.read_csv function has an optional argument, thousands. If , is used as the thousands separator, you can set thousands=",". Consider the following example:

import io
import pandas as pd
some_csv = io.StringIO('value\n"1"\n"1,000"\n"1,000,000"\n')
df = pd.read_csv(some_csv, thousands=",")
print(df)

output

     value
0        1
1     1000
2  1000000

For brevity I used io.StringIO; the same effect can be achieved by passing, as the first argument to read_csv, the name of a file whose content is the same as the string given to io.StringIO.
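Applied to the data from the question, a minimal sketch might look like the following. It assumes the file is genuinely comma-delimited, in which case the size values must be quoted because they contain the delimiter; the inline string is a stand-in for file.csv, not its actual content.

```python
import io

import pandas as pd

# Assumed file content: comma-delimited, with the size values quoted
# because they contain the delimiter character themselves.
csv_text = (
    "number,name,size\n"
    '1,Car,"9,32,123"\n'
    '2,Bike,"1,00,000"\n'
    '3,Truck,"10,32,111"\n'
)

# thousands="," makes the parser drop commas inside numeric fields,
# so the size column is read directly as integers.
df = pd.read_csv(io.StringIO(csv_text), thousands=",")
print(df)
print(df["size"].dtype)
```

This avoids a separate cleanup step, because the column never exists as strings in the first place.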

Upvotes: 4

Mustafa Aydın

Reputation: 18306

With replace, we need regex=True, because otherwise it looks for an exact match of the whole cell, i.e., it only replaces cells whose entire content is ",":

>>> df["size"] = df["size"].replace(",", "", regex=True)
>>> df

   number   name     size
0       1    Car   932123
1       2   Bike   100000
2       3  Truck  1032111
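To see the difference described above, here is a small sketch on a made-up Series (the values are illustrative, not from the question). It also shows that the question's {",", ""} is a set literal rather than a dict; the dict form {",": ""} works as well, provided regex=True is passed.

```python
import pandas as pd

s = pd.Series(["9,32,123", ","])

# Without regex=True, only cells that are exactly "," are replaced.
exact = s.replace(",", "")

# With regex=True, "," is treated as a pattern and removed inside each cell.
substring = s.replace(",", "", regex=True)

# {",", ""} in the question is a set; a dict maps pattern -> replacement.
print(type({",", ""}).__name__)  # set
from_dict = s.replace({",": ""}, regex=True)

print(exact.tolist())      # ['9,32,123', '']
print(substring.tolist())  # ['932123', '']
print(from_dict.tolist())  # ['932123', '']
```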

Upvotes: 4

Henry Ecker

Reputation: 35626

Try with str.replace instead:

df['size'] = df['size'].str.replace(',', '')

Optionally, convert to int with astype:

df['size'] = df['size'].str.replace(',', '').astype(int)

   number   name     size
0       1    Car   932123
1       2   Bike   100000
2       3  Truck  1032111

Sample Frame Used:

df = pd.DataFrame({'number': [1, 2, 3], 'name': ['Car', 'Bike', 'Truck'],
                   'size': ['9,32,123', '1,00,000', '10,32,111']})

   number   name       size
0       1    Car   9,32,123
1       2   Bike   1,00,000
2       3  Truck  10,32,111
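Putting the two steps of this answer together, a self-contained sketch on the same sample frame might look like this. Passing regex=False is an addition, not part of the original answer: it states explicitly that ',' is a literal character, since the default for regex in str.replace changed from True to False in pandas 2.0.

```python
import pandas as pd

df = pd.DataFrame({'number': [1, 2, 3], 'name': ['Car', 'Bike', 'Truck'],
                   'size': ['9,32,123', '1,00,000', '10,32,111']})

# Strip every comma as a literal character, then convert the cleaned
# strings to integers in one chained expression.
df['size'] = df['size'].str.replace(',', '', regex=False).astype(int)

print(df)
print(df['size'].dtype)
```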

Upvotes: 3
