Reputation: 1396
I work in the office of one of the professors at my college, and he has assigned me to read through a whole class's papers to try to catch people who plagiarize. I decided to write a program in Python that looks at all of the six-word phrases in all the papers and compares them to see whether any two papers have over 200 matching phrases. For example, the six-word phrases in "I ate a potato and it was good." would be:
I ate a potato and it
ate a potato and it was
a potato and it was good.
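As a minimal sketch of that sliding window (the function name `word_phrases` and its `size` parameter are made up for illustration and are not part of my program), the phrases could be generated like this:

```python
def word_phrases(text, size=6):
    """Return every run of `size` consecutive words in `text`."""
    words = text.split()
    # Slide a window of `size` words across the list, one word at a time.
    # If the text has fewer than `size` words, the range is empty.
    return [" ".join(words[i:i + size]) for i in range(len(words) - size + 1)]


phrases = word_phrases("I ate a potato and it was good.")
for phrase in phrases:
    print(phrase)
```

This sketch keeps punctuation attached to the words, which matches the example above but not necessarily what a real comparison would want.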
My code is currently:
import re
import glob
import os
def ReadFile(Filename):
    try:
        F = open(Filename)
        F2=F.read()
    except IOError:
        print("Can't open file:",Filename)
        return []
    F3=re.sub("[^a-z ]","",F2.lower())
    return F3
def listEm(BigString):
    list1=[]
    list1.extend(BigString.split(' '))
    return list1
Name = input ('Name of folder? ')
Name2=[]
Name3=os.chdir("Documents")
for file in glob.glob("*txt"):
    Name2.append(file)
for file in Name2:
    index1=0
    index2=6
    new_list=[]
    Words = ReadFile(file)
    Words2= listEm(Words)
    while index2 <= len(Words2):
        new_list.append(Words2[index1:index2])
        index1 += 1
        index2 += 1
    del Name2[0] ##Deletes first file from list of files so program wont compare the same file to itself.
    for file2 in Name2:
        index=0
        index1=6
        new_list2=[]
        Words1= ReadFile(file2)
        Words3= listEm(Words)
        while index1 <= len(Words3):
            new_list2.append(Words3[index:index1]) ##memory error
            index+=1
            index2+=1
        results=[]
        for element in new_list:
            if element in new_list2:
                results.append(element)
        if len(results) >= 200:
            print("You may want to examine the following files:",file1,"and",file2)
I'm receiving a MemoryError on
new_list2.append(Words3[index:index1])
and I can't figure out what I'm doing wrong. I've never received a MemoryError in my short, one-semester programming career. Thanks for any and all help.
Upvotes: 0
Views: 134
Reputation: 16403
You probably want to increment index1 instead of index2 inside the while loop that raises the error. Change index2+=1 to index1+=1.
Currently you are in an infinite loop: index1 <= len(Words3) is always true because you never change index1, so you keep appending to new_list2 until you run out of memory.
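To make the fix concrete, here is the inner loop in isolation with the one-character change applied. The short `Words3` list is a made-up stand-in for the words read from a file, so this is only a sketch of the loop, not your whole program:

```python
# Hypothetical stand-in for the word list produced by listEm().
Words3 = "the quick brown fox jumps over the lazy dog".split(' ')

index = 0
index1 = 6
new_list2 = []
while index1 <= len(Words3):
    new_list2.append(Words3[index:index1])
    index += 1
    index1 += 1  # was index2 += 1, which never changed the loop condition

print(len(new_list2), "six-word windows collected")
```

Because index1 now grows on every pass, the condition eventually becomes false and the loop ends after collecting one window per starting position.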
The moral of this error is to use more descriptive variable names rather than appending numbers to the end of existing ones. That lowers the chance of typos like this one.
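As a rough illustration of what descriptive names could look like, here is a sketch of the comparison step. Every name in it (`phrase_set`, `suspicious_pairs`, `papers`, `threshold`) is my own invention, and it swaps your nested loops for sets and `itertools.combinations`, so it counts distinct shared phrases rather than repeated ones:

```python
from itertools import combinations


def phrase_set(text, phrase_length=6):
    """Return the set of distinct word windows of `phrase_length` in `text`."""
    words = text.lower().split()
    return {
        tuple(words[start:start + phrase_length])
        for start in range(len(words) - phrase_length + 1)
    }


def suspicious_pairs(papers, threshold=200):
    """Given {file name: text}, return (name, name, shared count) for flagged pairs."""
    phrases_by_paper = {name: phrase_set(text) for name, text in papers.items()}
    flagged = []
    # combinations() yields each unordered pair once, so no file is
    # compared with itself and no pair is checked twice.
    for first_name, second_name in combinations(sorted(phrases_by_paper), 2):
        shared = phrases_by_paper[first_name] & phrases_by_paper[second_name]
        if len(shared) >= threshold:
            flagged.append((first_name, second_name, len(shared)))
    return flagged
```

With names like `first_name` and `second_name` there is no numbered counter to mistype, and since the windows come from a comprehension there is no loop index to forget to increment.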
Upvotes: 2