Reputation: 11
I am trying to solve the following problem:
Given: A positive integer k≤20k≤20, a positive integer n≤104n≤104, and k arrays of size n containing positive integers not exceeding 10^5.
Return: For each array, output an element of this array occurring strictly more than n/2 times if such element exists, and "-1" otherwise.
My attempt at it is the following:
myFile = open('rosalind_maj.txt', 'r')
outputf = open('output.txt', 'w')
TOTAL_ARRAYS, SIZE = myFile.readline().split()
TOTAL_ARRAYS = int(TOTAL_ARRAYS)
SIZE = int(SIZE)
lines=myFile.readlines()
for i in range(0, TOTAL_ARRAYS):
array = map(int, (lines[i].strip().split()))
occurance = []
output = []
considered = []
for element in array:
if element not in considered:
if (array.count(element)) > SIZE/2:
output.append(element)
considered.append(element)
if len(output) == 0:
print (-1)
outputf.write( str(-1) + " " )
else:
for item in output:
print item
outputf.write(str(item) + " " )
A sample of the input text file looks like the following:
4 8
5 5 5 5 5 5 5 5
8 7 7 7 1 7 3 7
7 1 6 5 10 100 1000 1
5 1 6 7 1 1 10 1
and the correct answer to the sample input is:
5 7 -1 -1
My code works but is too slow. It takes 20 secs for an input with 19 arrays, each of size 8000 elements.
Upvotes: 0
Views: 415
Reputation: 8184
I tried the following:
#!/usr/bin/env python
from itertools import imap
from collections import defaultdict
with open("rosalind_maj.txt", "r") as in_file:
with open("output.txt", "w") as out_file:
nb_arrays, ar_size = map(int, in_file.readline().split())
majority = ar_size / 2.0
# i is probably useless if the file has the correct number of lines
for i, line in enumerate(in_file, start=1):
counts = defaultdict(int)
found = False
if i > nb_arrays:
break
for element in line.strip().split():
counts[element] += 1
if counts[element] > majority:
out_file.write("%s " % element)
found = True
break
if not found:
out_file.write("-1 ")
On a dataset of a size about the one you said you tested (19 lines with 8000 elements), and on my old 2007 laptop, this runs in about 0.15 seconds whereas your code runs in about 7 seconds (I suppressed the print
from it).
The resulting output files are the same.
I'm not sure the reason, but the slowness could be in part due to the append()
and testing the presence of elements in a list in your code.
Upvotes: 0
Reputation: 11
You can use Majority Vote Algorithm. It runs in O(n) time and takes O(1) space!
Upvotes: 1