Reputation: 129

How to skip the words not in the dictionary

I have a file that serves as a word dictionary:

water=45 
melon=8 
apple=35 
pineapple=67 
I=43 
to=90 
eat=12 
tastes=100 
sweet=21 
it=80 
watermelon=98 
want=70
juice=88

and I have another file with the following text:

I want to eat banana and watermelon 
I want drink juice purple and pineapple

I want the output to be:

43, 70, 90, 12, 98
43, 70, 88, 67

Every word that does not exist in the dictionary should be skipped.

This is what I have so far:

import re
f = open(r'C:\Users\dinesh_pundkar\Desktop\val.txt','r')
val_dict = {}
for line in f:
     k, v = line.strip().split('=')
     val_dict[k.strip()] = v.strip()
f.close()


h = open(r'C:\Users\dinesh_pundkar\Desktop\str_txt.txt','r')
str_list = []
for line in h:
     str_list.append(str(line).strip())



tmp_str = ''
for val in str_list:
    tmp_str = val 
    for k in val_dict.keys():
            if k in val:
                replace_str = str(val_dict[k]).strip() + ","
                tmp_str= re.sub(r'\b{0}\b'.format(k),replace_str,tmp_str,flags=re.IGNORECASE)

    tmp_str = tmp_str.strip(",")
    print val, " = ", tmp_str
    tmp_str = ''

Output:

43, 70, 90, 12, banana and 98
43, 70, drink 88, purple and 67

Upvotes: 0

Answers (3)

Harrison Grodin

Reputation: 2323

First, we can parse your "dictionary file" into an actual Python dictionary using a clever dict comprehension.

In [1]: dict_file = """water=45 
   ...: melon=8 
   ...: apple=35 
   ...: pineapple=67 
   ...: I=43 
   ...: to=90 
   ...: eat=12 
   ...: tastes=100 
   ...: sweet=21 
   ...: it=80 
   ...: watermelon=98 
   ...: want=70
   ...: juice=88"""

In [2]: conversion = {k: int(v) for line in dict_file.split('\n') for (k,v) in (line.split('='),)}

In [3]: conversion
Out[3]: 
{'I': 43,
 'apple': 35,
 'eat': 12,
 'it': 80,
 'juice': 88,
 'melon': 8,
 'pineapple': 67,
 'sweet': 21,
 'tastes': 100,
 'to': 90,
 'want': 70,
 'water': 45,
 'watermelon': 98}

We then set the phrase to a variable.

In [4]: text = "I want to eat banana and watermelon"

We can use str.split to change the single string into a list of the words.

In [5]: text.split()
Out[5]: ['I', 'want', 'to', 'eat', 'banana', 'and', 'watermelon']

To check if each word is in the conversion dictionary, we can simply use the in keyword, which checks dictionary keys.

In [6]: "banana" in conversion
Out[6]: False

In [7]: "watermelon" in conversion
Out[7]: True

We can put this check in a list comprehension to keep only the words that our conversion dictionary knows how to convert to a number. Looking up conversion[word] is then safe, because the if clause has already confirmed that the word is a key in the conversion dict.

In [9]: [str(conversion[word]) for word in text.split() if word in conversion]
Out[9]: ['43', '70', '90', '12', '98']

Finally, we can use str.join to combine this list back into a single string. (The square brackets are removed, which makes the expression a generator expression rather than a list comprehension, but it works either way.)

In [10]: ', '.join(str(conversion[word]) for word in text.split() if word in conversion)
Out[10]: '43, 70, 90, 12, 98'

Success! You can apply this method to any of the phrases in your file via a simple for loop to get the desired result.
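As a rough sketch of that loop, assuming the dictionary text and the phrases are already in memory: the helper names parse_dictionary and convert_line are made up for illustration and do not come from the question.

```python
def parse_dictionary(text):
    """Build a word -> number mapping from lines of the form 'word=number'."""
    conversion = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip blank or malformed lines
        word, number = line.split("=", 1)
        conversion[word.strip()] = int(number.strip())
    return conversion


def convert_line(line, conversion):
    """Return the numbers of the known words in `line`, joined by ', '."""
    return ", ".join(
        str(conversion[word]) for word in line.split() if word in conversion
    )


# Sample data copied from the question
dict_text = """water=45
melon=8
apple=35
pineapple=67
I=43
to=90
eat=12
tastes=100
sweet=21
it=80
watermelon=98
want=70
juice=88"""

phrases = [
    "I want to eat banana and watermelon",
    "I want drink juice purple and pineapple",
]

conversion = parse_dictionary(dict_text)
results = [convert_line(phrase, conversion) for phrase in phrases]
for result in results:
    print(result)
```

With the sample data from the question this prints 43, 70, 90, 12, 98 and then 43, 70, 88, 67. To read from real files instead, the same two helpers can be fed the contents of each file.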

There isn't much of a need for regex here; Python's string processing features are very powerful. :)
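The attempt in the question passed re.IGNORECASE, so case-insensitive matching may be wanted. One possible way to get it without regex, shown here only as a sketch, is to lowercase both the dictionary keys and each word before the lookup. The names lowered and numbers are illustrative, and the sample dictionary is a shortened copy of the one above.

```python
# Shortened copy of the conversion dict, for illustration only
conversion = {"I": 43, "want": 70, "to": 90, "eat": 12, "watermelon": 98}

# Normalise the keys once so that lookups ignore case
lowered = {word.lower(): number for word, number in conversion.items()}

text = "i WANT to Eat banana and Watermelon"

# Lowercase each word before checking and looking it up
numbers = [lowered[word.lower()] for word in text.split() if word.lower() in lowered]
print(", ".join(str(n) for n in numbers))
```

This still splits on whitespace only, so a word followed by punctuation (for example "watermelon,") would not match unless the punctuation is stripped first.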

Upvotes: 0

Adam Smith

Reputation: 54223

You can use dict.get which allows for a default value if you don't find the key.

>>> d = {'a': 1, 'b': 2}
>>> d['c']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'c'

>>> d.get('c', 'fallback value')
'fallback value'

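This will let you do something like the following for each line of text, where val_dict is the dictionary built in the question and line is one line of the text file:

nums = [val_dict.get(word, '') for word in line.split()]
# ['43', '70', '90', '12', '', '', '98']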

Then remove the empty strings with filter:

nums = filter(None, nums)
# with `None` as the first argument, filter drops every element that is falsy

Then map to string and join with commas:

print(", ".join(map(str, nums)))
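Put together, the three steps might look like the sketch below. It assumes val_dict maps words to strings, as in the question's code. The function name lookup_known is hypothetical, and the sample dictionary is a shortened copy of the one in the question.

```python
# Shortened sample of the question's dictionary (values are strings there)
val_dict = {"I": "43", "want": "70", "to": "90", "eat": "12", "watermelon": "98"}
line = "I want to eat banana and watermelon"


def lookup_known(line, mapping):
    """Look up every word with dict.get, drop the misses, join the rest."""
    # Unknown words come back as '' and are removed by filter(None, ...)
    found = filter(None, (mapping.get(word, "") for word in line.split()))
    return ", ".join(map(str, found))


output = lookup_known(line, val_dict)
print(output)
```

One caveat: filter(None, ...) drops every falsy value, so if the dictionary held integers and 0 were a legitimate value, it would be removed as well. In that case an explicit `if word in mapping` test is the safer choice.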

Upvotes: 1

Chiheb Nexus

Reputation: 9267

You can get your desired output with list comprehensions and a few nested loops:

I'm assuming your dictionary file is called file1 and your second file is called file2.

data1 = [k.rstrip().split("=") for k in open("file1", 'r')]
data2 = [k.rstrip().split() for k in open("file2", 'r')]

for k in data2:
    for j in k:
        for m in data1:
            if j == m[0]:
                print(m[1], end = ' ')
    print()

Output:

43 70 90 12 98 
43 70 88 67
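The nested loops above rescan the whole dictionary list for every word. A possible variation, sketched here rather than taken from the answer, loads file1 into a real dict so that each word is a single lookup, and uses with blocks so the files are closed. The function name convert_file is hypothetical, and the demo writes shortened throwaway copies of file1 and file2 to a temporary directory just to stay self-contained.

```python
import os
import tempfile


def convert_file(dict_path, text_path):
    """Return one list of values per line of text_path, skipping unknown words."""
    with open(dict_path) as f:
        # Each useful line looks like 'word=number'; strip trailing spaces first
        lookup = dict(line.strip().split("=", 1) for line in f if "=" in line)
    with open(text_path) as f:
        return [[lookup[w] for w in line.split() if w in lookup] for line in f]


# Demo with throwaway files standing in for file1 and file2
tmp_dir = tempfile.mkdtemp()
dict_path = os.path.join(tmp_dir, "file1")
text_path = os.path.join(tmp_dir, "file2")
with open(dict_path, "w") as f:
    f.write("I=43 \nwant=70\nto=90 \neat=12 \nwatermelon=98 \njuice=88\npineapple=67 \n")
with open(text_path, "w") as f:
    f.write("I want to eat banana and watermelon \n")
    f.write("I want drink juice purple and pineapple\n")

rows = convert_file(dict_path, text_path)
for row in rows:
    print(" ".join(row))
```

Joining each row with ", " instead of " " would give the comma-separated format asked for in the question.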

Upvotes: 0
