Reputation: 1392
I'm currently working on a script that extracts data from two sources, one of which is a list of Norwegian post codes. Norwegian post codes consist of four digits, and some begin with a zero.
Here is the code:
#This section loads data on Norwegian post codes and places into a dictionary where postcode is key
f = open("postoversikt.txt", "r");
f1 = open("PCODES_USER_TRIM.txt","r") #load the file with all the users.
fo = open("pcodes_out","w")
place = {}
times = {}
for line in f:
    words = line.rsplit("\t");
    place[str(words[0])] = words[1]; #Reverse these to change the key and value - Default key: postcode value: place
number = 0;
number_unique = 0;
number_alike = 0;
for line in f1:
    number = number + 1;
    words1 = line.rsplit(";");
    if not words1[1] in times:
        number_unique = number_unique + 1;
        times[words1[1]] = 1;
    else:
        number_alike = number_alike + 1;
        times[words1[1]] = times[words1[1]] + 1;
for key, value in times.items():
    print key+";"+value+";"+words[key];
    fo.write(key+";"+value+";"+words[key]+"\n");
print "Totalt antall objekter behandlet er: "+ str(number);
print "Hvorav antall unike var: "+ str(number_unique);
print "Antall like nummer ble funnet: " + str(number_alike);
Some lines from PCODES_USER_TRIM:
75621;4517;45 - 65
35214;7650;25 - 45
55624;9015;25 - 45
09523;5306;45 - 65
09051;2742;25 - 45
88941;1661;18 - 25
Some lines from postoversikt.txt:
0001 OSLO 0301 OSLO P
0010 OSLO 0301 OSLO B
0015 OSLO 0301 OSLO K
0016 OSLO 0301 OSLO K
0017 OSLO 0301 OSLO K
0018 OSLO 0301 OSLO G
0021 OSLO 0301 OSLO K
0022 OSLO 0301 OSLO K
One problem is that post codes beginning with a zero are stripped of the leading zero. My guess is that this is due to an internal conversion to an int (I'm just a beginner in Python, so please forgive me if my problems are a bit mundane). I would like these to be in the standard four-digit format xxxx. My second problem, which I guess follows from the first, is that I want to add the place name for each post code to the final printout. This doesn't work, as I can't use the key to look up the place in words.
I used to convert the objects I print to strings using str(), but I refrained from doing so in the current version because I want to fix the problem at its root.
Could someone please help me with my little problem? How can I use rsplit to put strings into the dictionary without them being converted to integers?
Upvotes: 0
Views: 581
Reputation: 31621
Python is "strongly typed" and does not automatically coerce key types, or any types for that matter:
>>> d = {'01234':'value'}
>>> print d.items()
[('01234', 'value')]
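As a minimal sketch of where a leading zero can actually get lost (the values are illustrative, not read from your files): a string key is stored unchanged, while a round trip through int() drops the zero. It is written with print() so it also runs on Python 3.

```python
# Illustrative values only; not read from the asker's files.
postcode = "0301"

d = {postcode: "OSLO"}
keys = list(d.keys())      # string keys are stored exactly as given

as_int = int(postcode)     # this is the step that loses the leading zero
round_trip = str(as_int)

print(keys)        # ['0301']
print(round_trip)  # 301
```

So if a zero goes missing, the place to look is any int() call or numeric handling between reading the line and writing the output.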
I don't see anything in your code that does conversion to int, but I'm pretty sure this is not the code you are using because it contains at least one syntax error:
fo.write("key+";"+value+";"+words[key]\n")
Please paste the actual code you are using.
Additionally, give us a few lines from the input documents and their formats, so we don't have to guess.
This code will do what you want. Again, there's no sign of leading zeros being lost...
places = {}
for line in f:
    post, place, _rest = line.split('\t',2)
    places[post] = place
f.close()
times = {}
for line in f1:
    _id, post, _rest = line.split(';',2)
    times[post] = times.get(post, 0) + 1
f1.close()
for k,v in times.iteritems():
    fo.write("%s;%s;%s\n" % (k,v,places[k]))
fo.close()
number = sum(times.itervalues())
number_unique = len(times)
number_alike = number - number_unique
print number, number_unique, number_alike
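To try the approach without the files, here is a rough sketch that uses in-memory lists instead. The post code lines mirror the sample from the question, assuming the columns are tab-separated; the user lines are made up for illustration. The "UNKNOWN" placeholder is my own addition for post codes with no entry in places, a case the code above would answer with a KeyError.

```python
# Post code rows mirror the question's sample (tab separation is assumed).
post_lines = [
    "0001\tOSLO\t0301\tOSLO\tP\n",
    "0010\tOSLO\t0301\tOSLO\tB\n",
]
# Hypothetical user rows in the id;postcode;age-range layout.
user_lines = [
    "11111;0010;25 - 45\n",
    "22222;0010;45 - 65\n",
    "33333;4517;18 - 25\n",
]

# Map post code (kept as a string) to place name.
places = {}
for line in post_lines:
    post, place, _rest = line.split("\t", 2)
    places[post] = place

# Count how many users share each post code.
times = {}
for line in user_lines:
    _id, post, _rest = line.split(";", 2)
    times[post] = times.get(post, 0) + 1

out_rows = []
for post in sorted(times):
    # .get() with a placeholder avoids a KeyError for unlisted post codes.
    out_rows.append("%s;%s;%s" % (post, times[post], places.get(post, "UNKNOWN")))

for row in out_rows:
    print(row)
# 0010;2;OSLO
# 4517;1;UNKNOWN
```

The post code stays a string from the split all the way to the output, so "0010" keeps its leading zero without any extra formatting.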
Upvotes: 2
Reputation: 158
Even if the four-digit code ends up as an integer (e.g. 0004 -> 4), that shouldn't be a problem as long as the counting operation yields the correct results.
What you then need is simply to format your output the way you want. For instance:
i=4
print "%04d" % i
Gives the result: 0004
i=1254
print "%04d" % i
Gives the result: 1254
More details in there for string formatting in python: http://docs.python.org/release/2.4.4/lib/typesseq-strings.html
Upvotes: 0
Reputation: 117691
If you want to format an integer so that it is at least 4 digits long (left-padded with zeroes), you can do it like this:
integer = 5
s = "%04d" % integer
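A few equivalent ways to get the same padding, as a small sketch (the variable names are only for illustration). str.zfill is handy when the value is already a string, since it pads without going through an integer at all.

```python
integer = 5

from_percent = "%04d" % integer          # '0005'
from_format = "{:04d}".format(integer)   # '0005'

# For values that are already strings, zfill pads on the left with zeros.
from_str = "301".zfill(4)                # '0301'
unchanged = "4517".zfill(4)              # '4517' (already four characters)
```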
Upvotes: 4