Reputation: 25
Load macOS.txt into a variable text. Then do the following: Find all the occurrences of macOS, Mac OS, and OS X in the text. Put the results in one list. Print the list of those words then print the following: There are {length of list} words mentioning macOS, Mac OS, or OS X in the text.
I think I should use a regular expression, such as re.findall or re.finditer. Can anyone correct my code below?
text = open("macOS.txt", "r")
import re
pattern = '[A-Za-z0-9-]+'
lines = "OS"
ls = re.findall(pattern,lines)
print(ls)
But how do I find all the occurrences of macOS, Mac OS, and OS X in the text?
Or should it be something like this?
import re
with open('macOS.txt', 'r') as f:
    content = f.read()
temp = re.findall(r'\b(?!\w*OS\b)\w*OS\b', content)
print(f'There are {len(temp)} words ended with OS (other than OS and macOS) in the text.')
Upvotes: 0
Views: 915
Reputation: 18631
Use
re.findall(r'\b(?:(?:Mac |mac)OS|OS X)\b', s)
See proof.
EXPLANATION
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  (?:                      group, but do not capture:
--------------------------------------------------------------------------------
    (?:                      group, but do not capture:
--------------------------------------------------------------------------------
      Mac                      'Mac '
--------------------------------------------------------------------------------
     |                        OR
--------------------------------------------------------------------------------
      mac                      'mac'
--------------------------------------------------------------------------------
    )                        end of grouping
--------------------------------------------------------------------------------
    OS                       'OS'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    OS X                     'OS X'
--------------------------------------------------------------------------------
  )                        end of grouping
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
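For completeness, here is a minimal sketch of how the pattern above could be applied to the task from the question. The helper name find_mac_mentions and the sample string are made up for illustration; for the real task you would read macOS.txt instead, as noted in the comments.

```python
import re

# Same pattern as in the answer above: "Mac OS" or "macOS", or "OS X".
MAC_PATTERN = re.compile(r'\b(?:(?:Mac |mac)OS|OS X)\b')

def find_mac_mentions(text):
    """Return every occurrence of macOS, Mac OS, or OS X, in order of appearance."""
    return MAC_PATTERN.findall(text)

# Hypothetical sample text. For the real task, load the file instead:
#     with open("macOS.txt", "r") as f:
#         text = f.read()
sample = "I upgraded from Mac OS to OS X, and later to macOS. OSX and macOSes do not match."

words = find_mac_mentions(sample)
print(words)
print(f"There are {len(words)} words mentioning macOS, Mac OS, or OS X in the text.")
```

One thing to be aware of: because matches do not overlap, the string "Mac OS X" is reported once, as 'Mac OS'. If that should count as 'OS X' (or as both), the pattern needs adjusting.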
Upvotes: 1
Reputation: 11
You can use the fuzzywuzzy library. Take a few letters before and after each 'OS' you find, then use fuzzywuzzy to compare them. https://www.geeksforgeeks.org/fuzzywuzzy-python-library/
Alternatively, if your output is limited to one word before and after 'OS', then you can just do this-
Upvotes: 1