Reputation: 3577
I am reindexing files from multiple folders. A file initially looks like this:
Combined Percent
0101 50
0102 25
0104 25
I then use this code to create a new index which is the union of the indexes of all my files in a folder:
import os
from glob import glob

import pandas as pd

folders = r'C:\pathway_to_folders'

def rfile(fn):
    # Read every column as text and use the first column as the index
    return pd.read_csv(fn, dtype='str', index_col=0)

for folder in os.listdir(folders):
    path = os.path.join(folders, folder)
    filenames = glob(os.path.join(path, '*.csv'))
    dfs = [rfile(fn) for fn in filenames]
    # Build the union of the indexes of all files in this folder
    idx = dfs[0].index
    for i in range(1, len(dfs)):
        idx = idx.union(dfs[i].index)
    print(idx)
When I set the column Combined as the index column, each DataFrame in dfs now looks like this:
Combined Percent
101 50
102 25
104 25
Is there a way to keep the index formatted like the original column (with its leading zeros), or to change my code so that I don't have to set an index at all?
Upvotes: 3
Views: 53
Reputation: 394101
I believe this is still a long-standing bug: you can't set the dtype and also specify that same column as the index column in one read_csv call. You have to set the index as a secondary step:
def rfile(fn):
    return pd.read_csv(fn, dtype=str).set_index('Combined')
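To check that the two-step version keeps the leading zeros, here is a minimal sketch that can be run without touching the file system. The two in-memory CSV strings (csv_a, csv_b) are made-up stand-ins for the real files, and I am assuming the files are comma-delimited with a header row of Combined,Percent:

```python
import io

import pandas as pd

# Hypothetical stand-ins for two CSV files in one folder (assumed comma-delimited).
csv_a = "Combined,Percent\n0101,50\n0102,25\n0104,25\n"
csv_b = "Combined,Percent\n0102,60\n0103,40\n"

def rfile(source):
    # Read every column as text first, then promote 'Combined' to the index.
    return pd.read_csv(source, dtype=str).set_index('Combined')

dfs = [rfile(io.StringIO(text)) for text in (csv_a, csv_b)]

# Union of the indexes of all frames, as in the question.
idx = dfs[0].index
for df in dfs[1:]:
    idx = idx.union(df.index)

print(sorted(idx))  # ['0101', '0102', '0103', '0104']
```

Because the index values stay strings, the union compares '0101' with '0101' rather than with the integer 101, so the leading zeros survive the reindexing step as well.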
Upvotes: 2