s_m

Reputation: 83

Ignore digits (numbers) when comparing strings

I have an input file like the one below:

op.txt

          user id                        query
4d67373f-ca45-4137-efd0-0da69c78123d , bookmy show
4d67373f-ca45-4137-efd0-0da69c78123d , book my show
4d67373f-ca45-4137-efd0-0da69c78123d , book my show
4d67373f-ca45-4137-efd0-0da69c78123d , book my show
7fda21a5-c432-4d95-f93d-6275b68bb396 , 8 gb pen drive
7fda21a5-c432-4d95-f93d-6275b68bb396 , 16 gb pen drive
dba91160-dec4-454c-f34a-c29d6d95c459 , DVD PLATERS
dba91160-dec4-454c-f34a-c29d6d95c459 , DVD PLAYERS
dba91160-dec4-454c-f34a-c29d6d95c459 , DVD PLAYERS
dba91160-dec4-454c-f34a-c29d6d95c459 , IPOD
dba91160-dec4-454c-f34a-c29d6d95c459 , IPOD
dba91160-dec4-454c-f34a-c29d6d95c459 , IPOD
dba91160-dec4-454c-f34a-c29d6d95c459 , IPAD
d900ec5f-bd71-4e2b-84d0-6a2105050923 , minoxidil
d900ec5f-bd71-4e2b-84d0-6a2105050923 , minoxidil 5
775f1159-e310-42b6-d3b0-5ea3fb959568 , printed backcase for xperia L
775f1159-e310-42b6-d3b0-5ea3fb959568 , printed backcase for xperia zr
775f1159-e310-42b6-d3b0-5ea3fb959568 , printed backcase for xperia zr
9b98a9be-bb63-4310-87d5-592a66ae602a , leggings
9b98a9be-bb63-4310-87d5-592a66ae602a , leggings
9b98a9be-bb63-4310-87d5-592a66ae602a , jeggings
83618338-70a0-4512-c763-0307fe5acef0 , woman jacket
83618338-70a0-4512-c763-0307fe5acef0 , woman jacket
83618338-70a0-4512-c763-0307fe5acef0 , man jacket
83618338-70a0-4512-c763-0307fe5acef0 , man jacket

From this I currently get the following output:

dvd platers >  dvd players
ipod >  ipad
bookmy show >  book my show
leggings >  jeggings
woman jacket >  man jacket
minoxidil >  minoxidil 5
printed backcase for xperia l >  printed backcase for xperia zr
8 gb pen drive >  16 gb pen drive

The goal is to collect each user's queries in a list and then compute the edit distance between consecutive queries. If the edit distance is at most 2, I print the pair. My code does this correctly, but it should ignore changes in digits and compare only the words. For example, if a user types "8 gb pen drive" and later changes their mind and types "16 gb pen drive", I don't want that pair printed.

Below is my code:

def min_edit_dist(s1, s2):
    # Levenshtein distance; tbl[i, j] is the distance between s1[:i] and s2[:j]
    m = len(s1) + 1
    n = len(s2) + 1
    tbl = {}
    for i in range(m): tbl[i, 0] = i
    for j in range(n): tbl[0, j] = j
    for i in range(1, m):
        for j in range(1, n):
            cost = 0 if s1[i-1] == s2[j-1] else 1
            tbl[i, j] = min(tbl[i, j-1]+1, tbl[i-1, j]+1, tbl[i-1, j-1]+cost)
    return tbl[m-1, n-1]

# Group the queries by user id
d = {}
with open("op.txt") as text:
    for line in text:
        line = line.strip("\n")
        for lines in line.split("\n"):
            try:
                key, val = lines.split(",")
                d.setdefault(key,[]).append(val.lower())
            except ValueError:
                # skip lines that are not "user id , query" (e.g. the header)
                pass

# Compare each query with the next one from the same user
for v in d.values():
    for i in range(0, len(v)-1):
        if v[i] != v[i+1]:
            if min_edit_dist(v[i], v[i+1]) <= 2:
                print(v[i] + " > " + v[i+1])

The output I need is only this:

dvd platers >  dvd players
ipod >  ipad
bookmy show >  book my show
leggings >  jeggings
woman jacket >  man jacket
printed backcase for xperia l >  printed backcase for xperia zr

Upvotes: 0

Views: 112

Answers (1)

Slam

Reputation: 8572

You need to filter the value of val at these lines:

key, val = lines.split(",")
d.setdefault(key,[]).append(val.lower())

To filter the digits out of the string, try:

key, val = lines.split(",")
val = ''.join(letter for letter in val if not letter.isdigit())  # filter out digit chars
d.setdefault(key,[]).append(val.lower())

This runs a generator expression over every extracted val string and joins the characters that pass the filter. It is not a very efficient solution, but it should fit your needs.
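One case the character filter alone may not cover: removing the 5 from "minoxidil 5" leaves a trailing space, so that pair can still differ by one character and get printed. Below is a minimal sketch of the whole flow that also collapses leftover whitespace before comparing. The helper names strip_digits and near_duplicate_pairs are made up for illustration, the two-row edit distance is a stand-in for the question's table version, and the sample user ids u1 to u4 are placeholders rather than data from the question.

```python
def strip_digits(query):
    """Drop digit characters, then collapse the whitespace left behind."""
    no_digits = "".join(ch for ch in query if not ch.isdigit())
    return " ".join(no_digits.split())


def min_edit_dist(s1, s2):
    """Levenshtein distance, keeping only two rows of the DP table."""
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, 1):
        curr = [i]
        for j, c2 in enumerate(s2, 1):
            cost = 0 if c1 == c2 else 1
            curr.append(min(curr[j - 1] + 1, prev[j] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def near_duplicate_pairs(lines, max_dist=2):
    """Return (old, new) pairs of consecutive queries per user that
    differ in their words, not only in their digits."""
    by_user = {}
    for line in lines:
        key, sep, val = line.partition(",")
        if not sep:
            continue  # no comma: header or malformed line
        by_user.setdefault(key.strip(), []).append(val.strip().lower())

    pairs = []
    for queries in by_user.values():
        for old, new in zip(queries, queries[1:]):
            a, b = strip_digits(old), strip_digits(new)
            # compare the digit-free forms, but report the original text
            if a != b and min_edit_dist(a, b) <= max_dist:
                pairs.append((old, new))
    return pairs


# Hypothetical sample input in the same "user id , query" layout
sample = [
    "user id    query",
    "u1 , 8 gb pen drive",
    "u1 , 16 gb pen drive",
    "u2 , minoxidil",
    "u2 , minoxidil 5",
    "u3 , ipod",
    "u3 , ipad",
    "u4 , bookmy show",
    "u4 , book my show",
]

for old, new in near_duplicate_pairs(sample):
    print(old + " > " + new)
```

On this sample only the ipod/ipad and bookmy show/book my show pairs are reported, because the pen drive and minoxidil queries become identical once digits and stray spaces are removed. Note that partition(",") splits at the first comma only, which differs slightly from the split(",") unpacking in the question.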

Upvotes: 1
