Reputation: 167
I'm trying to rename headers in a CSV file that contain whitespace. Using these lines from the Pandas API reference is not working: the headers still have spaces instead of underscores.
import pandas as pd
df = pd.read_csv("my.csv",low_memory=False)
df.columns = df.columns.str.strip().str.lower().str.replace(' ', '_')
Upvotes: 2
Views: 3164
Reputation: 140
You can read the file with a regex as sep to remove the spaces around the delimiters, including those in the header:
import pandas as pd
df = pd.read_csv("example.csv", sep=r'\s*,\s*', engine='python')
Here \s matches a whitespace character, and * matches the previous expression (\s here) zero or more times.
I guess you are reading a file that looks like this
Name, Age, City
John Smith, 30, New York
Jane Doe, 25, San Francisco
Bob Johnson, 45, Los Angeles
or like this
Name , Age , City
John Smith , 30 , New York
Jane Doe , 25 , San Francisco
Bob Johnson , 45 , Los Angeles
Both work with the code above. However, using a regex in sep may slow down reading, because the 'c' engine does not support regex separators, so the 'python' engine is used instead. Be careful when reading very large files.
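As a rough check of the regex-separator idea, here is a minimal sketch run against the second sample above. It assumes the delimiter is a comma and uses io.StringIO as a stand-in for a file on disk; the sample string is only for illustration.

```python
import io

import pandas as pd

# Inline stand-in for the file; the real data would come from disk.
sample = (
    "Name , Age , City\n"
    "John Smith , 30 , New York\n"
    "Jane Doe , 25 , San Francisco\n"
)

# A regex separator needs the 'python' engine; naming it explicitly
# avoids the fallback warning.
df = pd.read_csv(io.StringIO(sample), sep=r"\s*,\s*", engine="python")

columns = list(df.columns)
print(columns)
```

Note that this only trims whitespace around the delimiter. Spaces inside a value or header, such as "John Smith", are left alone, so headers with internal spaces still need renaming afterwards.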
Upvotes: 0
Reputation: 167
I ditched Pandas and just used the CSV module in Python 2.7.
import csv
import re
import tempfile
import sys

# Pick a file-replace function that exists on this Python and platform.
if sys.version_info >= (3, 3):
    from os import replace
elif sys.platform == "win32":
    from osreplace import replace  # third-party package
else:
    from os import rename as replace

newHeaderList = []
with tempfile.NamedTemporaryFile(dir='.', delete=False) as tmp, \
        open('myFile.txt', 'rb') as f:
    r = csv.reader(f, delimiter='\t')
    w = csv.writer(tmp, delimiter='\t', quoting=csv.QUOTE_NONNUMERIC)
    header = next(r)
    # Trim each header and turn runs of whitespace into one underscore.
    for h in header:
        headerNoSpace = re.sub(r"\s+", "_", h.strip())
        newHeaderList.append(headerNoSpace)
    w.writerow(newHeaderList)
    for row in r:
        w.writerow(row)

# Move the temporary file into place once both files are closed.
replace(tmp.name, 'newFile.txt')

# Convert the tab-delimited file to a comma-separated one.
with open('newFile.txt', 'rb') as src, open('myFile.csv', 'wb') as dst:
    new_txt = csv.reader(src, delimiter='\t')
    out_csv = csv.writer(dst)
    out_csv.writerows(new_txt)
Upvotes: 0
Reputation: 523
Have you tried using rename? It returns a new DataFrame, so assign the result back:
df = df.rename(columns={"A space": "a", "B space ": "c"})
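If you would rather not list every column by hand, rename also accepts a function for columns. This is only a sketch; the column names and the clean_header helper are made up for illustration.

```python
import pandas as pd

# Hypothetical frame whose headers contain internal and trailing spaces.
df = pd.DataFrame({"A space": [1, 2], "B space ": [3, 4]})


def clean_header(name):
    """Trim the header, lowercase it, and swap spaces for underscores."""
    return name.strip().lower().replace(" ", "_")


# rename returns a new DataFrame, so the result is assigned back.
df = df.rename(columns=clean_header)

renamed = list(df.columns)
print(renamed)
```

The function is applied to each column label in turn, so the same rule covers every header without a hand-written mapping.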
Upvotes: 0
Reputation: 8362
Try using a list comprehension.
df.columns = [c.strip().lower().replace(' ', '_') for c in df.columns]
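If the headers still keep their gaps after this, one possibility (an assumption, I can't tell from the question) is that the gaps are tabs, repeated spaces, or non-breaking spaces rather than single plain spaces. A sketch of the same list comprehension using re.sub to catch any run of whitespace, with made-up headers:

```python
import re

import pandas as pd

# Hypothetical headers: double space, a tab, and a non-breaking space.
df = pd.DataFrame(
    [[1, 2, 3, 4]],
    columns=[" First  Name ", "Home\tCity", "Zip\u00a0Code", "Age"],
)

# \s+ matches any run of whitespace, so each gap becomes one underscore.
df.columns = [re.sub(r"\s+", "_", c.strip().lower()) for c in df.columns]

cleaned = list(df.columns)
print(cleaned)
```

Printing repr(c) for each column is a quick way to see which whitespace characters are actually in the headers.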
Upvotes: 6