Reputation: 83
I have an input file like the one below:
op.txt
user id query
4d67373f-ca45-4137-efd0-0da69c78123d , bookmy show
4d67373f-ca45-4137-efd0-0da69c78123d , book my show
4d67373f-ca45-4137-efd0-0da69c78123d , book my show
4d67373f-ca45-4137-efd0-0da69c78123d , book my show
7fda21a5-c432-4d95-f93d-6275b68bb396 , 8 gb pen drive
7fda21a5-c432-4d95-f93d-6275b68bb396 , 16 gb pen drive
dba91160-dec4-454c-f34a-c29d6d95c459 , DVD PLATERS
dba91160-dec4-454c-f34a-c29d6d95c459 , DVD PLAYERS
dba91160-dec4-454c-f34a-c29d6d95c459 , DVD PLAYERS
dba91160-dec4-454c-f34a-c29d6d95c459 , IPOD
dba91160-dec4-454c-f34a-c29d6d95c459 , IPOD
dba91160-dec4-454c-f34a-c29d6d95c459 , IPOD
dba91160-dec4-454c-f34a-c29d6d95c459 , IPAD
d900ec5f-bd71-4e2b-84d0-6a2105050923 , minoxidil
d900ec5f-bd71-4e2b-84d0-6a2105050923 , minoxidil 5
775f1159-e310-42b6-d3b0-5ea3fb959568 , printed backcase for xperia L
775f1159-e310-42b6-d3b0-5ea3fb959568 , printed backcase for xperia zr
775f1159-e310-42b6-d3b0-5ea3fb959568 , printed backcase for xperia zr
9b98a9be-bb63-4310-87d5-592a66ae602a , leggings
9b98a9be-bb63-4310-87d5-592a66ae602a , leggings
9b98a9be-bb63-4310-87d5-592a66ae602a , jeggings
83618338-70a0-4512-c763-0307fe5acef0 , woman jacket
83618338-70a0-4512-c763-0307fe5acef0 , woman jacket
83618338-70a0-4512-c763-0307fe5acef0 , man jacket
83618338-70a0-4512-c763-0307fe5acef0 , man jacket
From this I currently get the output below:
dvd platers > dvd players
ipod > ipad
bookmy show > book my show
leggings > jeggings
woman jacket > man jacket
minoxidil > minoxidil 5
printed backcase for xperia l > printed backcase for xperia zr
8 gb pen drive > 16 gb pen drive
The goal is to collect each user's queries into a list and then compute the edit distance between consecutive queries. If the edit distance is at most 2, I print the pair. My code does this correctly, but it should ignore changes in digits and compare only the words. For example, if a user types "8 gb pen drive" and later changes their mind and types "16 gb pen drive", I don't want that pair printed.
Below is my code:
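def min_edit_dist(s1, s2):
    # Levenshtein distance via a table keyed by (i, j)
    m = len(s1) + 1
    n = len(s2) + 1
    tbl = {}
    for i in range(m): tbl[i, 0] = i
    for j in range(n): tbl[0, j] = j
    for i in range(1, m):
        for j in range(1, n):
            cost = 0 if s1[i-1] == s2[j-1] else 1
            tbl[i, j] = min(tbl[i, j-1] + 1, tbl[i-1, j] + 1, tbl[i-1, j-1] + cost)
    # index the last cell explicitly instead of relying on leftover loop variables
    return tbl[m-1, n-1]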
with open("op.txt") as text:
    d = {}
    for line in text:
        line = line.strip("\n")
        for lines in line.split("\n"):
            try:
                key, val = lines.split(",")
                d.setdefault(key, []).append(val.lower())
            except ValueError:
                # skip lines that are not "user id , query" (e.g. the header)
                pass
values = d.values()
keys = d.keys()
for v in values:
    # compare each query with the next one from the same user
    for i in range(0, len(v) - 1):
        if v[i] != v[i+1]:
            if min_edit_dist(v[i], v[i+1]) <= 2:
                print(v[i] + " > " + v[i+1])
The output I actually need is:
dvd platers > dvd players
ipod > ipad
bookmy show > book my show
leggings > jeggings
woman jacket > man jacket
printed backcase for xperia l > printed backcase for xperia zr
Upvotes: 0
Views: 112
Reputation: 8572
You need to filter the value of val at this point:
key, val = lines.split(",")
d.setdefault(key,[]).append(val.lower())
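To filter the digits out of the string, try:
key, val = lines.split(",")
val = ''.join(letter for letter in val if not letter.isdigit()).strip()  # drop digit chars, then surrounding whitespace
d.setdefault(key,[]).append(val.lower())
This runs a generator expression over every val string extracted and joins the characters that remain. The strip() call matters because removing the digit from "minoxidil 5" leaves a trailing space, which would otherwise still count as an edit against "minoxidil". It is not a very efficient solution, but it should fit your needs.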
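For reference, here is a minimal self-contained sketch of the same idea, assuming that collapsing the whitespace left behind by removed digits is acceptable. The helper names strip_digits and near_duplicates are hypothetical (they are not from the question), and re.sub is swapped in for the character-by-character join as one possible way to do the filtering in a single pass.

```python
import re


def strip_digits(query):
    """Remove digit characters, collapse leftover whitespace, lowercase."""
    no_digits = re.sub(r"\d+", "", query)
    return " ".join(no_digits.split()).lower()


def min_edit_dist(s1, s2):
    """Levenshtein distance, computed one row at a time."""
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, 1):
        cur = [i]
        for j, c2 in enumerate(s2, 1):
            cost = 0 if c1 == c2 else 1
            cur.append(min(cur[j - 1] + 1,      # insertion
                           prev[j] + 1,         # deletion
                           prev[j - 1] + cost))  # substitution or match
        prev = cur
    return prev[-1]


def near_duplicates(queries, max_dist=2):
    """Pairs of consecutive queries that differ by a small edit
    once digits are ignored (hypothetical helper for illustration)."""
    cleaned = [strip_digits(q) for q in queries]
    pairs = []
    for a, b in zip(cleaned, cleaned[1:]):
        if a != b and min_edit_dist(a, b) <= max_dist:
            pairs.append((a, b))
    return pairs


print(near_duplicates([" 8 gb pen drive", " 16 gb pen drive"]))
print(near_duplicates([" DVD PLATERS", " DVD PLAYERS", " DVD PLAYERS"]))
```

The first call prints an empty list, because both queries reduce to "gb pen drive" once the digits are gone. The second prints [('dvd platers', 'dvd players')]. Note that the digits are also missing from whatever gets printed, which is fine for the sample data but may not be what you want if the original query text has to be shown.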
Upvotes: 1