Rnj
Rnj

Reputation: 1189

Sorting large arrays of big numeric stings

I was solving bigSorting() problem from hackerrank:

Consider an array of numeric strings where each string is a positive number with anywhere from 1 to 10^6 digits. Sort the array's elements in non-decreasing, or ascending order of their integer values and return the sorted array.

I know it works as follows:

def bigSorting(unsorted):
    return sorted(unsorted, key=int)

But I didnt guess this approach earlier. Initially I tried below:

def bigSorting(unsorted):
    int_unsorted = [int(i) for i in unsorted]
    int_sorted = sorted(int_unsorted)
    return [str(i) for i in int_sorted]

However, for some of the test cases, it was showing time limit exceeded. Why is it so?

PS: I dont know exactly what those test cases were as hacker rank does not reveal all test cases.

Update

My code is following three steps:

  1. str to int conversion
  2. Sorting
  3. int to str convertsion

Why I felt that my code in OP should also run is that its also O(n log n). But seems that mere same time complexity is not enough here? But then I wanted to know which conversion is avoided by sorted(unsorted, key=int)?

Q1. The third one, int to str conversion?

Q2. If Yes is the answer to Q1, then is it the only reason why my code doesnt work and onliner-sorted-solution works? That is, is there anything else that my code is doing extra above onliner-sorted-solution?

Q3. Also wont onliner-sorted-solution do str to int conversion multiple times for the same number during sorting? That is, everytime it has to compare a specific number, does it have to convert it from str to int? Or sorted is implemented in such a way that it has to convert each element from str to int only once?

Upvotes: 2

Views: 661

Answers (2)

sahasrara62
sahasrara62

Reputation: 11238

Here you are converting list of numerical string to list of integer and vice a versa.

for each conversion of string to integer ie '123456' to 123456 time is consumed and for every 10x in strig size conversion is increased by 100x time . proofed by @kellybundy in his post

adding code and run time if in case link disappear

from timeit import timeit

print(timeit('int("9"*10**5)', number=1))
print(timeit('int("9"*10**6)', number=1))

has run time

0.0568113109911792
5.422084125995752

so we can say time complexity of your conversion (string to integer)is O(nnm), where n is size of string, m is size of list.

This is where your code is taking time (conversion 2 time apart from sorting).

what you can do apart from orignal solution (sorted(unsorted, key = int) use python sorting of string numerical, where it sort them in lexical order ie 0,1,2,3,4

def bigSorting(unsorted):
    return sorted(unsorted, key= lambda x:(len(x), x))

here code doesn't need to convert the element and then sort, but just sort only. so it will take NlogN time

Now a much much faster solution by @kellybundy (he should add this as solution of his own )

def bigSorting(unsorted):
    return sorted(unsorted, key= lambda x:int(x, 16))

Upvotes: 2

BeRT2me
BeRT2me

Reputation: 13242

In your code you:

  1. Make a new list of ints converting str -> int.
  2. Make a new sorted list of those ints.
  3. Make another new list of str, converting int --> str.

That's three completely new (possibly massive) lists with multiple conversions.

Using the key=int sorting argument performs the necessary conversions internally during the sort process itself, which means there is never the costly conversion from int -> str.

We can supply Hacker Rank with just this code, and it passes all tests:

import os

if __name__ == '__main__':
    fptr = open(os.environ['OUTPUT_PATH'], 'w')

    n = int(input())

    result = sorted((input() for _ in range(n)), key=int)

    fptr.write('\n'.join(result))
    fptr.write('\n')

    fptr.close()

Still not 'optimal' perhaps, but from reading through the comments, this does seem to be an expected option. Those performing bucket/merge sorts didn't see much performance gains.

Upvotes: 1

Related Questions