Stefano Potter

Reputation: 3577

Dtype changing when setting a column as an index

I am reindexing files from multiple folders. A file initially looks like this:

Combined   Percent
0101       50
0102       25
0104       25

I then use this code to create a new index which is the union of the indexes of all my files in a folder:

import os
from glob import glob

import pandas as pd

folders = r'C:\pathway_to_folders'

def rfile(fn):
    # Read every column as a string and use the first column as the index
    return pd.read_csv(fn, dtype='str', index_col=0)

for folder in os.listdir(folders):
    path = os.path.join(folders, folder)
    filenames = glob(os.path.join(path, '*.csv'))
    dfs = [rfile(fn) for fn in filenames]
    # Build the union of the indexes of all files in this folder
    idx = dfs[0].index
    for i in range(1, len(dfs)):
        idx = idx.union(dfs[i].index)
    print(idx)

When I set the Combined column as the index, each DataFrame in dfs looks like this, with the leading zeros dropped:

Combined   Percent
101        50
102        25
104        25

Is there a way to keep the index formatted the same as the original column, or to change my code so that I don't have to set an index at all?

Upvotes: 3

Views: 53

Answers (1)

EdChum

Reputation: 394101

I believe this is a long-standing bug: the dtype you pass is not applied to a column that is also given as index_col, so the index values are still parsed as integers. You have to set the index as a second step:

def rfile(fn):
    return pd.read_csv(fn, dtype=str).set_index('Combined')
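
As a quick check of the two-step approach, here is a minimal sketch that uses in-memory CSV text instead of real files. The sample data, the comma delimiter, and the helper name read_with_string_index are assumptions made for illustration; substitute your own file paths.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data standing in for two CSV files in one folder.
csv_a = "Combined,Percent\n0101,50\n0102,25\n0104,25\n"
csv_b = "Combined,Percent\n0102,60\n0103,40\n"

def read_with_string_index(source):
    # Read everything as strings first, then promote 'Combined' to the index,
    # so the leading zeros are preserved.
    return pd.read_csv(source, dtype=str).set_index('Combined')

dfs = [read_with_string_index(StringIO(text)) for text in (csv_a, csv_b)]

# Union of the indexes of all frames, as in the question.
idx = dfs[0].index
for df in dfs[1:]:
    idx = idx.union(df.index)

print(list(idx))
```

The index values stay as strings such as '0101', so the union can be passed to reindex on each frame without the codes being altered.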

Upvotes: 2
