Hayden
Hayden

Reputation: 498

Pandas Removing Leading Zeros

I have a short script to pivot data. The first column is a 9 digit ID number, often beginning with zeros such as 000123456

Here is the script:

df = pd.read_csv('source')

new_df = df.pivot_table(index = 'id', columns = df.groupby('id').cumcount().add(1), values = ['prog_id', 'prog_type'], aggfunc='first').sort_index(axis=1,level=1)

new_df.columns = [f'{x}_{y}' for x,y in new_df.columns]

new_df.to_csv('destination')
print(new_df)

Although the CSV is being read with an id such as 000123456, the output only contains 123456

Even when setting an explicit dtype, Pandas removes the leading zeros. Is there a work around for telling Pandas to leave the leading zeros?

Upvotes: 4

Views: 7459

Answers (2)

Pedro Ferreira
Pedro Ferreira

Reputation: 21

You could use pandas' zfill() method right after reading your csv file "source". Basically, you would fill the values of your attribute "id", with as many zeros as you would like, in this particular case, making the number 9 digits long (3 zeros + 6 original digits). So, we would have:

df = pd.read_csv('source')

df.index = df.index.str.zfill(9)

# (...)

Upvotes: 2

Hayden
Hayden

Reputation: 498

Per comment on original post, set dtype as string:

df = pd.read_csv('source', dtype={'id':np.str})

Upvotes: 4

Related Questions