Reputation: 1667
Scenario: I have a list of dataframes. I am trying to rename the columns and change their order, but the column names do not exactly match, for example: a column might be "iterationlist" or "iteration".
I tried a loop inside a loop to read all the columns and if the name contains what I need, change the name of that column, but I get the error:
TypeError: unhashable type: 'list'
Code:
import pandas as pd
import os
from Tkinter import Tk
from tkFileDialog import askdirectory
from os import listdir
from os.path import isfile, join
import glob
# Get content
mypath = "//DGMS/Desktop/uploaded"
all_files = glob.glob(os.path.join(mypath, "*.xls*"))
contentdataframes = [pd.read_excel(f).assign(Datanumber=os.path.basename(f).split('.')[0].split('_')[0], ApplyOn='')
for f in all_files]
#get list of dates and put to dfs
for dfs in contentdataframes:
dfs.rename(index=str, columns={[col for col in dfs.columns if 'iteration' in col]: "iterationlistfinal"})
Question: What is the proper way to do this?
Upvotes: 1
Views: 547
Reputation: 339
Old, but the solution is simply
dfs.rename(index=str, columns={col : "iterationlistfinal" for col in dfs.columns if 'iteration' in col} )
You can't pass a list in a dictionary
Upvotes: 0
Reputation: 862481
I think need str.contains
for get columns names by substring and then reorder columns by subset with join both lists:
contentdataframes = []
for f in all_files:
df = pd.read_excel(f)
df['Datanumber'] = os.path.basename(f).split('.')[0].split('_')[0]
df['ApplyOn']= ''
mask = df.columns.str.contains('iteration')
c1 = df.columns[mask].tolist()
c2 = df.columns[~mask].tolist()
df = df[c1 + c2]
contentdataframes.append(df)
Upvotes: 2