Ying Wang

Reputation: 57

Python to_csv: leading 0 missing from zipcode

I have a data frame

USER =

   zipcode  userCount
0   00601   5
1   00602   23
2   00603   53
3   00604   2
4   00605   6
5   00606   10
6   00610   8
7   00612   33
8   00613   2
9   00614   2
10  00616   1
11  00617   9
12  00622   6
13  00623   28
14  00624   10
15  00627   8
16  00631   1
17  00637   13
18  00638   9
19  00641   12
20  00646   13

When I save it

USER.to_csv('Total_user.csv',index = False)

The leading 0s of the zipcode are missing from the result: 00601 -> 601

zipcode userCount
601 5
602 23
603 53
604 2
605 6
606 10
610 8
612 33
613 2
614 2
616 1
617 9
622 6
623 28
624 10
627 8
631 1
637 13
638 9
641 12
646 13

Is there anything I missed in the to_csv line? I just want to keep the leading 0s in the csv, so that when I load it again with read_csv(low_memory = False) the zipcode has its normal format.

Upvotes: 0

Views: 2593

Answers (3)

Kunal

Reputation: 45

Please pass dtype=str as a parameter to the read_csv(file,sep,dtype=str) method.

That will fix the issue, because every column is then read as text and the leading zeros are kept.
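
As a minimal sketch of that round trip (the sample frame and the variable names are made up for illustration, and the CSV is kept in memory instead of a file), the difference between the default read and dtype=str looks roughly like this:

```python
import io
import pandas as pd

# Hypothetical sample standing in for the asker's USER frame.
users = pd.DataFrame({"zipcode": ["00601", "00602", "00603"],
                      "userCount": [5, 23, 53]})

# With no path argument, to_csv returns the CSV text; the zeros are written.
csv_text = users.to_csv(index=False)

# Default read: the zipcode column is inferred as integers, so the zeros are gone.
inferred = pd.read_csv(io.StringIO(csv_text))

# dtype=str reads every column as text, so the zeros survive.
as_text = pd.read_csv(io.StringIO(csv_text), dtype=str)

print(inferred["zipcode"].tolist())
print(as_text["zipcode"].tolist())
```

Note that dtype=str also turns userCount into strings, so it would need converting back if you want numbers.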

Upvotes: 0

Mabel Villalba

Reputation: 2598

Assuming that the column df['zipcode'] of the first dataframe is already a column of strings, save it as usual (to_csv has no dtype parameter; string values are written with their leading zeros):

>>> df.to_csv('zipcodes.csv')

Then, when reading, load every column as str and afterwards convert the columns that should not be strings:

>>> df = pd.read_csv('zipcodes.csv',dtype='str',index_col=0)
>>> df

   zipcode userCount
0    00601         5
1    00602        23
2    00603        53
3    00604         2
4    00605         6
5    00606        10
6    00610         8
7    00612        33
8    00613         2
9    00614         2
10   00616         1
11   00617         9
12   00622         6
13   00623        28
14   00624        10
15   00627         8
16   00631         1
17   00637        13
18   00638         9
19   00641        12
20   00646        13

>>> df['userCount'] = df['userCount'].astype(int)

>>> df.dtypes

zipcode      object
userCount     int64
dtype: object
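
A possible variation, sketched here with a made-up two-row frame and a throwaway temp path, is to give read_csv a per-column dtype mapping so that only zipcode is read as text and the astype step is not needed:

```python
import os
import tempfile
import pandas as pd

# Hypothetical sample; zipcode is already a column of strings.
df = pd.DataFrame({"zipcode": ["00601", "00602"], "userCount": [5, 23]})

# Write to a throwaway file (the path is only for illustration).
path = os.path.join(tempfile.mkdtemp(), "zipcodes.csv")
df.to_csv(path, index=False)

# Only zipcode is forced to str; userCount is still inferred as an integer.
restored = pd.read_csv(path, dtype={"zipcode": str})
print(restored.dtypes)
```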

Upvotes: 1

Tim

Reputation: 98

Your data is probably being stored as an object type in the data frame. You can confirm this by typing:

>>> df.dtypes
zipcode      object
userCount    object
dtype: object

Integers can't carry leading zeros, so values like 00601 can only be held as strings, hence the object dtype. You'll need to quote your data when you save it. You can do this via the quoting parameter of to_csv():

import csv
df.to_csv('tmp.csv', quoting=csv.QUOTE_NONNUMERIC)

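Be aware that read_csv infers types from the field contents, even when the fields are quoted, so also pass a dtype (for example dtype={'zipcode': str}) when you re-read the file. Otherwise pandas will convert the column to integers and strip the leading zeros.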
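
A small sketch of the quoted round trip, assuming a made-up two-row frame and an in-memory CSV; the explicit dtype on the read side is included as a precaution, since type inference looks at the field contents:

```python
import csv
import io
import pandas as pd

# Hypothetical sample; zipcode is stored as strings.
df = pd.DataFrame({"zipcode": ["00601", "00602"], "userCount": [5, 23]})

# QUOTE_NONNUMERIC wraps every non-numeric field (header included) in quotes.
quoted = df.to_csv(index=False, quoting=csv.QUOTE_NONNUMERIC)
print(quoted)

# Re-read with an explicit dtype so the zip codes stay strings.
back = pd.read_csv(io.StringIO(quoted), dtype={"zipcode": str})
print(back["zipcode"].tolist())
```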

Upvotes: 0
