Bill Hambone

Reputation: 167

Can't Remove White Spaces from CSV Headers with Pandas

I'm trying to rename CSV headers that contain whitespace, replacing the whitespace with underscores. The lines below, taken from the pandas API reference, don't work: the headers still contain whitespace instead of underscores.

import pandas as pd

df = pd.read_csv("my.csv",low_memory=False)
df.columns = df.columns.str.strip().str.lower().str.replace(' ', '_')

Upvotes: 2

Views: 3164

Answers (4)

Patrick Wu

Reputation: 140

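You can pass a regex as sep so that the whitespace around each comma becomes part of the separator. This strips leading and trailing spaces from the header names (spaces inside a name are kept):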

import pandas as pd
# raw string for the regex; the regex separator needs the python engine
df = pd.read_csv("example.csv", sep=r'\s*,\s*', engine='python')

Here \s matches any whitespace character, and * matches the preceding expression (\s here) zero or more times, so the separator is a comma with any amount of whitespace on either side.

I guess you are reading a file that looks like this:

Name,  Age,   City
John Smith, 30, New York
Jane Doe, 25, San Francisco
Bob Johnson, 45, Los Angeles

or like this

Name        , Age , City
John Smith  , 30  , New York
Jane Doe    , 25  , San Francisco
Bob Johnson , 45  , Los Angeles

Both layouts work with the code above. However, a regex separator may slow down reading: the default 'c' engine does not support regex separators, so pandas falls back to the slower 'python' engine. Be careful when reading very large files.
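A self-contained sketch of the regex-separator approach, using an in-memory io.StringIO buffer in place of example.csv (the sample rows are copied from the second layout above). engine="python" is passed explicitly so pandas does not have to fall back to it on its own.

```python
import io

import pandas as pd

# Sample data mirroring the padded layout above (stands in for example.csv)
data = """Name        , Age , City
John Smith  , 30  , New York
Jane Doe    , 25  , San Francisco
"""

# The separator swallows any whitespace on both sides of each comma,
# so neither the header names nor the values keep their padding.
df = pd.read_csv(io.StringIO(data), sep=r"\s*,\s*", engine="python")
print(list(df.columns))  # ['Name', 'Age', 'City']
```

Note that this only removes padding around the delimiters; a header such as "First Name" would still contain its inner space.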

Upvotes: 0

Bill Hambone

Reputation: 167

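I ditched pandas and used the csv module in Python 2.7 instead. The script reads the tab-delimited myFile.txt, replaces each run of whitespace in the header names with an underscore, writes the result to a temporary file that is then moved to newFile.txt, and finally converts that file to the comma-delimited myFile.csv.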

import csv
import re
import tempfile
import sys
import os

# os.replace() only exists on Python 3.3+
if sys.version_info >= (3, 3):
    from os import replace
elif sys.platform == "win32":
    from osreplace import replace
else:
    from os import rename as replace

new_text_filepath = 'newFile.txt'
newHeaderList = []

# Python 2.7: the csv module expects files opened in binary mode
with tempfile.NamedTemporaryFile(dir='.', delete=False) as tmp, \
        open('myFile.txt', 'rb') as f:
    r = csv.reader(f, delimiter='\t')
    w = csv.writer(tmp, delimiter='\t', quoting=csv.QUOTE_NONNUMERIC)
    header = next(r)
    for h in header:
        # collapse each run of whitespace into a single underscore
        headerNoSpace = re.sub(r"\s+", "_", h.strip())
        newHeaderList.append(headerNoSpace)
    w.writerow(newHeaderList)
    # copy the data rows unchanged
    for row in r:
        w.writerow(row)

# move the temporary file into place, overwriting any existing file
replace(tmp.name, new_text_filepath)

# convert the tab-delimited file into a comma-delimited CSV
with open(new_text_filepath, 'rb') as src, open('myFile.csv', 'wb') as dst:
    csv.writer(dst).writerows(csv.reader(src, delimiter='\t'))
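For Python 3, a minimal sketch of the same header rewrite follows. normalize_headers is a hypothetical helper name, and io.StringIO buffers stand in for the real files; in Python 3 the csv module expects files opened in text mode with newline='' rather than 'rb'/'wb'.

```python
import csv
import io
import re


def normalize_headers(src, dst, delimiter="\t"):
    """Copy delimited text from src to dst, replacing whitespace runs in the header row with '_'."""
    reader = csv.reader(src, delimiter=delimiter)
    writer = csv.writer(dst, delimiter=delimiter, lineterminator="\n")
    header = next(reader)
    writer.writerow([re.sub(r"\s+", "_", h.strip()) for h in header])
    writer.writerows(reader)  # data rows are copied unchanged


src = io.StringIO("First Name\tLast  Name\tAge\nJohn\tSmith\t30\n")
dst = io.StringIO()
normalize_headers(src, dst)
print(dst.getvalue())
```

With real files, the call would look like normalize_headers(open('myFile.txt', newline=''), open('newFile.txt', 'w', newline='')), ideally inside a with block so both files are closed.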

Upvotes: 0

pizza lover

Reputation: 523

Have you tried rename? It returns a new DataFrame, so assign the result back:

df = df.rename(columns={"A space": "a", "B space ": "c"})
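Besides a dict, the columns argument of DataFrame.rename also accepts a function that is applied to every column label, which avoids listing each header by hand. A small sketch with made-up column names:

```python
import pandas as pd

df = pd.DataFrame({" A space": [1], "B space ": [2]})

# The function is called once per column label; the result is assigned back
# because rename() does not modify df in place by default.
df = df.rename(columns=lambda c: c.strip().lower().replace(" ", "_"))
print(list(df.columns))  # ['a_space', 'b_space']
```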

Upvotes: 0

576i

Reputation: 8362

Try using a list comprehension.

df.columns = [c.strip().lower().replace(' ', '_') for c in df.columns]
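One possible reason some headers stay unchanged, with this list comprehension as well as with the str accessor version in the question, is that the "spaces" are other whitespace characters such as tabs or non-breaking spaces (\xa0); replace(' ', '_') only matches the ordinary space. A sketch using a \s+ regex instead, with made-up column names for illustration:

```python
import re

import pandas as pd

# Hypothetical headers containing a tab and a non-breaking space
df = pd.DataFrame(columns=[" First Name ", "Last\tName", "Home\xa0City"])

# Plain replace() leaves the tab and the non-breaking space in place
plain = [c.strip().lower().replace(" ", "_") for c in df.columns]

# \s matches any Unicode whitespace in Python 3 str patterns
df.columns = [re.sub(r"\s+", "_", c.strip().lower()) for c in df.columns]
print(plain)             # ['first_name', 'last\tname', 'home\xa0city']
print(list(df.columns))  # ['first_name', 'last_name', 'home_city']
```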

Upvotes: 6
