WolVes
WolVes

Reputation: 1336

Pandas Dataframe cutting off extra digits from excel import

I am reading in an excel sheet into the dataframe, which has worked great. However, one of the columns in the excel sheet has an ID number which is forced to 5 forward digits. Thus, rather than 32, the excel sheet has 00032 and for 500 it would be 00500. When I read this file into pandas however it converts these numbers to it's base value, thus 00500 becomes 500 in the dataframe. However, as I am trying to match the file names which use the 5 digit index value to the ID number this is actually causing an issue. I would just use a list system but I need pandas to easily access multiple columns of data when iterating through the rows. Here is the gist:

downloads = r'C:\Users\...'
filelist = os.listdir(downloads)
SDC = []
for file in filelist:
    if file.startswith('sdc'):
        SDC.append(file[3:8])
print SDC
if SDC == []:
    print "There are no downloaded files to grab at this time."

dir = r'C:\...'
os.chdir(dir)
for i, row in df.iterrows():
    if row['SDC ID'] in SDC:
        dir = r'C:\bleh\Temporary Folder' + row['Theme']
        if not os.path.exists(dir):
             os.makedirs(dir)

How do I force the original value in pandas to quickly iterate and check values against the list?

Upvotes: 0

Views: 289

Answers (1)

Mad Physicist
Mad Physicist

Reputation: 114230

The line if row['SDC ID'] in SDC: checks if an integer exists in a list of strings. There are two simple solutions that I can see:

  1. Compare as strings. Format your integer into a string and keep the test mostly as-is. Something like if '{:05d}'.format(row['SDC ID']) in SDC: should do it. It will format your integer to a zero padded five-digit number.

  2. Compare as integers. You can convert the numbers from the file names directly into integers as you find them since you don't appear to be using their string properties anywhere. Change the line SDC.append(file[3:8]) to SDC.append(int(file[3:8])).

Given the information shown, I would prefer the second option, but not by much and for no good reason beyond aesthetics.

Upvotes: 1

Related Questions