olovholm

Reputation: 1392

Python makes four digit string into int implicitly

I'm currently working on a script that extracts data from two sources, one of which contains Norwegian post codes. Norwegian post codes consist of four digits, and some begin with a zero.

Here is the code:

#This section loads data on Norwegian post codes and places into a dictionary where postcode is key
f = open("postoversikt.txt", "r");
f1 = open("PCODES_USER_TRIM.txt","r") #load the file with all the users. 
fo = open("pcodes_out","w")
place = {}
times = {}
for line in f:
    words = line.rsplit("\t");
    place[str(words[0])] = words[1]; #Reverse these to change the key and value - Default key: postcode value: place

number = 0;
number_unique = 0;
number_alike = 0;

for line in f1:
    number = number + 1;
    words1 = line.rsplit(";");
    if not words1[1] in times:
        number_unique = number_unique + 1;
        times[words1[1]] = 1;
    else: 
        number_alike = number_alike + 1;
        times[words1[1]] = times[words1[1]] + 1;

for key, value in times.items():
     print key+";"+value+";"+words[key];
     fo.write(key+";"+value+";"+words[key]+"\n");


print "Totalt antall objekter behandlet er: "+ str(number);
print "Hvorav antall unike var: "+ str(number_unique);
print "Antall like nummer ble funnet: " + str(number_alike);

Some lines from PCODES_USER_TRIM:

75621;4517;45 - 65
35214;7650;25 - 45
55624;9015;25 - 45
09523;5306;45 - 65
09051;2742;25 - 45
88941;1661;18 - 25

Some lines from postoversikt.txt:

0001    OSLO    0301    OSLO    P
0010    OSLO    0301    OSLO    B
0015    OSLO    0301    OSLO    K
0016    OSLO    0301    OSLO    K
0017    OSLO    0301    OSLO    K
0018    OSLO    0301    OSLO    G
0021    OSLO    0301    OSLO    K
0022    OSLO    0301    OSLO    K

One of the problems is that post codes beginning with a zero are stripped of that initial zero. My guess is that this is due to an internal conversion to an int (I'm just a beginner in Python, so please forgive me if my problems are a bit mundane). I would like these to be in the standard four-digit format xxxx. My second problem, which I guess follows from the first, is that I want to add the name of the post code to the final printout. This doesn't work, as I can't use the key to look up the place in words.

I used to convert the objects I print to strings using str(), but I refrained from doing so in the current version because I want to fix the problem at its root.

Could someone please help me with my little problem? How can I use rsplit to put strings into the words dictionary without them being converted to integers?

Upvotes: 0

Views: 581

Answers (3)

Francis Avila

Reputation: 31621

Python is "strongly typed" and does not automatically coerce key types, or any types for that matter:

>>> d = {'01234':'value'}
>>> print d.items()
[('01234', 'value')]
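
The same holds for the pieces returned by split. As a minimal sketch (the sample line is copied from the postoversikt.txt excerpt in the question, assuming its columns are tab-separated; the variable names are only illustrative, and print() is used so it runs on both Python 2 and 3):

```python
# Splitting a line of text always yields strings; nothing is turned into an int.
line = "0001\tOSLO\t0301\tOSLO\tP\n"
post, place, _rest = line.rstrip("\n").split("\t", 2)

print(type(post).__name__)  # str
print(post)                 # 0001, leading zeros intact

# The zeros only disappear if the text is converted explicitly.
print(int(post))            # 1
```

So if zeros go missing, it is worth looking for an explicit int() call or for a key that does not match exactly.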

I don't see anything in your code that does conversion to int, but I'm pretty sure this is not the code you are using because it contains at least one syntax error:

 fo.write("key+";"+value+";"+words[key]\n")

Please paste the actual code you are using.

Additionally, give us a few lines from the input documents and their formats, so we don't have to guess.

EDIT:

This code will do what you want. Again, there's no sign of leading zeros being lost...

places = {}
for line in f:
    post, place, _rest = line.split('\t',2)
    places[post] = place
f.close()

times = {}
for line in f1:
    _id, post, _rest = line.split(';',2)
    times[post] = times.get(post, 0) + 1
f1.close()

for k,v in times.iteritems():
    fo.write("%s;%s;%s\n" % (k,v,places[k]))
fo.close()

number = sum(times.itervalues())
number_unique = len(times)
number_alike = number - number_unique

print number, number_unique, number_alike
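
For anyone who wants to try this without the input files, here is a self-contained sketch of the same approach. Several things in it are assumptions rather than part of the code above: the function name summarize is hypothetical, the user lines are made up so that they match the post codes in the question's excerpt, collections.Counter replaces the times.get idiom, and unknown post codes are reported as UNKNOWN instead of raising a KeyError.

```python
from collections import Counter


def summarize(place_lines, user_lines):
    """Return (rows, total, unique, repeats) for the given input lines."""
    # Map post code (kept as a string) to place name.
    places = {}
    for line in place_lines:
        post, place, _rest = line.rstrip("\n").split("\t", 2)
        places[post] = place

    # Count how often each post code occurs among the users.
    times = Counter()
    for line in user_lines:
        _id, post, _rest = line.rstrip("\n").split(";", 2)
        times[post] += 1

    rows = ["%s;%s;%s" % (post, count, places.get(post, "UNKNOWN"))
            for post, count in sorted(times.items())]
    total = sum(times.values())
    return rows, total, len(times), total - len(times)


# Place lines follow the question's excerpt; user lines are invented.
place_lines = ["0010\tOSLO\t0301\tOSLO\tB\n", "0015\tOSLO\t0301\tOSLO\tK\n"]
user_lines = ["11111;0010;25 - 45\n", "22222;0010;45 - 65\n",
              "33333;0015;18 - 25\n", "44444;9999;18 - 25\n"]

rows, total, unique, repeats = summarize(place_lines, user_lines)
for row in rows:
    print(row)
print("%d %d %d" % (total, unique, repeats))
```

With this sample data the post codes come out as 0010 and 0015, zeros included, because they are never anything but strings.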

Upvotes: 2

Alex

Reputation: 158

The fact that Python trims the four-digit number (e.g. 0004 -> 4) shouldn't be a problem as long as the counting operation yields the correct results.

What you then need is simply to format your output the way you want. For instance:

i=4
print "%04d" % i

Gives the result: 0004

i=1254
print "%04d" % i

Gives the result : 1254
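
Note that the 0 flag in the format specifier is what produces the zeros; a bare width pads with spaces. A quick sketch to check this (the variable names are only illustrative, and the expressions behave the same on Python 2 and 3):

```python
i = 4

spaces = "%4d" % i    # width 4, padded with spaces
zeros = "%04d" % i    # width 4, padded with zeros
full = "%04d" % 1254  # already four digits, left unchanged

print(repr(spaces))  # '   4'
print(repr(zeros))   # '0004'
print(repr(full))    # '1254'
```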

More details in there for string formatting in python: http://docs.python.org/release/2.4.4/lib/typesseq-strings.html

Upvotes: 0

orlp

Reputation: 117691

If you want to format an integer so that it is at least 4 digits long (left-padded with zeroes), you can do it like this:

integer = 5
s = "%04d" % integer
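
Two alternatives, shown as a sketch rather than a recommendation: str.format with the same 04d specifier, and str.zfill, which works directly on a string. The latter may be convenient here because split already returns strings. The sample value "301" is made up for illustration.

```python
integer = 5

via_format = "{0:04d}".format(integer)  # same specifier, newer syntax
via_zfill = str(integer).zfill(4)       # pad an existing string with zeros
from_text = "301".zfill(4)              # no int conversion needed

print(via_format)  # 0005
print(via_zfill)   # 0005
print(from_text)   # 0301
```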

Upvotes: 4
