Reputation: 342
I imported a .csv file with this command:
import pandas as pd
mydata = pd.read_csv(file, sep='\t', engine='python', dtype={'Year': int}, parse_dates=['Day'])
But I noticed that some of the column names contain blank spaces, e.g. Account id instead of Account_id.
I got the list of my column names with this:
dwb_col = mydata.columns
I'd like to replace the blank spaces " " with the "_" sign in every column name (i.e. every dwb_col element), in order to rename the columns like this: mydata.columns = my_new_columns_list
How can I do the find-and-replace part?
Is there any workaround/shortcut during the import phase that lets me get the column names with "_" (underscore) instead of " " (space)?
Upvotes: 7
Views: 6346
Reputation: 23099
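Another way would be to use the regex \s+, which matches one or more whitespace characters, whereas ' ' only matches a single space (regex=True is needed on pandas 2.0+, where the default became regex=False):
dwb_col = df.columns.str.replace(r'\s+', '_', regex=True)
then just re-assign:
df.columns = dwb_col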
If you have leading or trailing whitespace you want to remove first, you can add str.strip:
df.columns.str.strip().str.replace(r'\s+', '_', regex=True)
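A small self-contained check of the strip + regex approach, using a toy DataFrame whose headers (made up for illustration) contain a double space, a tab, and leading/trailing blanks:

```python
import pandas as pd

# Toy frame with messy headers: a double space, a tab, and leading/trailing blanks.
df = pd.DataFrame(columns=["Account id", " Account  name ", "Day\tof week"])

# Strip the ends first, then collapse each run of whitespace (\s+) into one "_".
# regex=True is passed explicitly: pandas 2.0 changed the default to regex=False.
df.columns = df.columns.str.strip().str.replace(r"\s+", "_", regex=True)

print(list(df.columns))
# ['Account_id', 'Account_name', 'Day_of_week']
```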
Regarding your second question, you can read the file with the nrows argument so that only the first n rows are loaded, and gather the column names from that. Here the names that already contain an underscore are collected:
col_df = pd.read_csv(data, nrows=1)
cols = [col for col in col_df.columns.tolist() if '_' in col]
then read only those columns with usecols:
df = pd.read_csv(data, usecols=cols)
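If the goal is to get underscored names directly at read time, one possible route (a sketch offered as an alternative, not part of the answer above) is to read only the header with nrows=0, clean it, then re-read the file passing the cleaned names via names= together with header=0, which tells pandas to replace the file's own header line. The sample data and the variable names raw, header and clean are hypothetical; with a real file you would pass the path twice instead of io.StringIO.

```python
import io

import pandas as pd

# Hypothetical tab-separated sample standing in for the real file.
raw = "Account id\tAccount name\tYear\n1\tAlice\t2017\n2\tBob\t2018\n"

# Read only the header row (nrows=0 parses the header but no data rows).
header = pd.read_csv(io.StringIO(raw), sep="\t", nrows=0).columns

# Build the cleaned names: trim the ends, collapse any whitespace run into "_".
clean = header.str.strip().str.replace(r"\s+", "_", regex=True)

# Re-read the data; header=0 makes pandas treat line 0 as a header
# and replace it with the names we supply.
df = pd.read_csv(io.StringIO(raw), sep="\t", names=list(clean), header=0)

print(list(df.columns))
# ['Account_id', 'Account_name', 'Year']
```

This reads the header twice, which is cheap, and avoids a separate rename step after loading.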
Upvotes: 4
Reputation: 962
This will do it, using str.replace:
df.columns = df.columns.str.replace(" ", "_")
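A sketch showing that the same literal replacement can also be written with DataFrame.rename and a callable. Note that a literal " " replaces each space individually, so a double space (as in the made-up second column here) becomes a double underscore:

```python
import pandas as pd

df = pd.DataFrame(columns=["Account id", "Account  name"])  # second name has two spaces

# Literal replacement via the .str accessor: every single space becomes "_".
via_str = df.columns.str.replace(" ", "_")

# Equivalent alternative: DataFrame.rename accepts a function applied to each label.
renamed = df.rename(columns=lambda c: c.replace(" ", "_"))

print(list(via_str))          # ['Account_id', 'Account__name']
print(list(renamed.columns))  # ['Account_id', 'Account__name']
```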
Upvotes: 6
Reputation: 4130
Try this, assuming your column names look like this:
l = ["hello world","hello cat"]
cols = ['_'.join(i.split()) for i in l]
# output
['hello_world', 'hello_cat']
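Applied directly to a DataFrame's columns, the same split/join idea might look like this (the toy frame is made up for illustration). Because str.split() with no argument splits on any run of whitespace and drops leading/trailing whitespace, it normalises tabs, repeated spaces and padding in one pass:

```python
import pandas as pd

df = pd.DataFrame({"hello world": [1], " hello   cat ": [2]})

# split() with no separator discards leading/trailing whitespace and
# treats consecutive whitespace as one separator; join with "_".
df.columns = ["_".join(c.split()) for c in df.columns]

print(list(df.columns))  # ['hello_world', 'hello_cat']
```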
Upvotes: 3