Reputation: 67
I'm trying to solve the HackerRank problem Big Sorting:
Problem Statement:
Consider an array of numeric strings where each string is a positive number with anywhere from 1 to 1000000 digits. Sort the array's elements in non-decreasing, or ascending order of their integer values and print each element of the sorted array on a new line.
Function Description
Complete the bigSorting function in the editor below. It should return the sorted string array.
bigSorting has the following parameter(s):
unsorted
: an unsorted array of integers as stringsInput Format
The first line contains an integer
N
, denoting the number of strings inunsorted
.Each of the
N
subsequent lines contains an integer stringunsorted[i]
.Constraints
1 <= N <= 200000
- Each string is guaranteed to represent a positive integer without leading zeros.
- The total number of digits across all strings in
unsorted
is between1
and1000000
(inclusive).Output Format
Print each element of the sorted array on a new line.
My code:
n = int(raw_input())
unsorted = []
for _ in xrange(n):
unsorted_item = raw_input()
unsorted.append(unsorted_item)
result = sorted(unsorted)
print (result)
But it's giving a completely different result:
Input
6
31415926535897932384626433832795
1
3
10
3
5
Expected Output
1
3
3
5
10
31415926535897932384626433832795
Actual Output
1
10
3
3
31415926535897932384626433832795
5
Upvotes: 2
Views: 2215
Reputation: 14670
What you’re doing is not enough because you need to sort by text length - shorter strings come first - and in the lexicographical order if strings are of the same length. So, a simple solution could be this:
n = int(input())
unsorted = []
for _ in range(n):
unsorted_item = input()
unsorted.append(unsorted_item)
for element in sorted(unsorted, key=int):
print(element)
Upvotes: 4
Reputation: 51112
Anatolii's solution is certainly the simplest one, because these strings all represent numbers, and they should be sorted in numerical order.
Here's an alternative solution which doesn't require parsing the strings as ints: because they're all positive, then numbers with more digits are always larger, and numbers with equally many digits can be compared as strings. So, first we compare by string length, then strings of equal length are compared the way strings are usually compared.
We can use a tuple (len(s), s)
as the comparison key; they are compared by their first component (string length), then by their second component (the string itself) as a tie-breaker.
result = sorted(unsorted, key=lambda s: (len(s), s))
This is a much more efficient solution on average, because parsing ints is slow, but comparing strings is usually fast because they usually differ within the first few digits. In the worst case, you have to compare every digit, but parsing as an int always requires looking at every digit.
I tried with lists of 1,000 strings representing numbers in the range 0 to 10^(10^4), i.e. numbers up to 10^4 digits long; the solution without int
is over a hundred times faster:
>>> import random, timeit
>>> lst = [ str(random.randint(0, 10**10**4)) for i in range(1000) ]
>>> timeit.timeit(lambda: sorted(lst, key=lambda s: (len(s), s)), number=10)
0.01330024599974422
>>> timeit.timeit(lambda: sorted(lst, key=int), number=10)
4.78583431900006
Upvotes: 5