ssssmaner

Reputation: 25

Python: Find all words ending in text (re.findall)

Load macOS.txt into a variable text. Then find all the occurrences of macOS, Mac OS, and OS X in the text and put the results in one list. Print that list, then print the following: There are {length of list} words mentioning macOS, Mac OS, or OS X in the text.

I think I should use a regular expression, with re.findall or re.finditer. Can anyone correct my code below?

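import re

# open() alone returns a file object; read() is needed to get the text itself
with open("macOS.txt", "r") as f:
    text = f.read()

pattern = r'[A-Za-z0-9-]+'  # every run of letters, digits, or hyphens
ls = re.findall(pattern, text)  # search the file contents, not a hard-coded string
print(ls)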

But how do I find only the occurrences of macOS, Mac OS, and OS X in the text?

Or should it be something like this?

import re

with open('macOS.txt', 'r') as f:
    content = f.read()

# Words ending in OS, excluding the bare words OS and macOS
temp = re.findall(r'\b(?!(?:mac)?OS\b)\w*OS\b', content)
print(f'There are {len(temp)} words ending in OS (other than OS and macOS) in the text.')

Upvotes: 0

Views: 915

Answers (2)

Ryszard Czech

Reputation: 18631

Use the following, where s holds the text read from the file:

re.findall(r'\b(?:(?:Mac |mac)OS|OS X)\b', s)

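The answer's one-liner can be dropped into the full exercise roughly like this. This is a minimal sketch: the helper names find_mac_mentions and report are hypothetical, the sample sentence stands in for macOS.txt, and reading the file as UTF-8 is an assumption.

```python
import re

# The pattern from the answer: "Mac OS" or "macOS", or "OS X", as whole words.
MAC_PATTERN = re.compile(r'\b(?:(?:Mac |mac)OS|OS X)\b')


def find_mac_mentions(text):
    """Return every mention of macOS, Mac OS, or OS X in text, in order of appearance."""
    return MAC_PATTERN.findall(text)


def report(text):
    """Print the list and the summary sentence the exercise asks for; return the list."""
    mentions = find_mac_mentions(text)
    print(mentions)
    print(f'There are {len(mentions)} words mentioning macOS, Mac OS, or OS X in the text.')
    return mentions


if __name__ == "__main__":
    # Stand-in text; for the real exercise, read the file instead (UTF-8 assumed):
    #     with open("macOS.txt", "r", encoding="utf-8") as f:
    #         report(f.read())
    report("I upgraded from OS X to macOS; older docs still say Mac OS.")
```

Because findall returns non-overlapping matches, a phrase like "Mac OS X" is counted once, as "Mac OS", and "MacOS" (capital M, no space) is not matched at all.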

EXPLANATION

--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  (?:                      group, but do not capture:
--------------------------------------------------------------------------------
    (?:                      group, but do not capture:
--------------------------------------------------------------------------------
      Mac                      'Mac '
--------------------------------------------------------------------------------
     |                        OR
--------------------------------------------------------------------------------
      mac                      'mac'
--------------------------------------------------------------------------------
    )                        end of grouping
--------------------------------------------------------------------------------
    OS                       'OS'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    OS X                     'OS X'
--------------------------------------------------------------------------------
  )                        end of grouping
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
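The explanation table maps directly onto Python's re.VERBOSE flag, which lets each piece of the pattern carry its own comment. The sketch below rewrites the answer's pattern that way (MAC_PATTERN_VERBOSE is a hypothetical name); since verbose mode ignores unescaped whitespace, the literal spaces are written as [ ].

```python
import re

# The answer's pattern with one comment per piece, mirroring the explanation table.
# Verbose mode ignores plain whitespace, so literal spaces are written as "[ ]".
MAC_PATTERN_VERBOSE = re.compile(r"""
    \b              # word boundary
    (?:             # group, but do not capture:
        (?:         #   group, but do not capture:
            Mac[ ]  #     'Mac '
          |         #   OR
            mac     #     'mac'
        )           #   end of grouping
        OS          #   'OS'
      |             # OR
        OS[ ]X      #   'OS X'
    )               # end of grouping
    \b              # word boundary
""", re.VERBOSE)

# The compact one-liner from the answer, for comparison.
MAC_PATTERN = re.compile(r'\b(?:(?:Mac |mac)OS|OS X)\b')

sample = "I upgraded from OS X to macOS; older docs still say Mac OS."
print(MAC_PATTERN_VERBOSE.findall(sample) == MAC_PATTERN.findall(sample))
```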

Upvotes: 1

kiara

Reputation: 11

You can use the fuzzywuzzy library: take a few letters before and after each occurrence of 'OS', then use fuzzywuzzy to compare that snippet with macOS, Mac OS, and OS X. https://www.geeksforgeeks.org/fuzzywuzzy-python-library/
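A rough sketch of the fuzzy-comparison idea follows. It swaps in the standard library's difflib.SequenceMatcher instead of fuzzywuzzy, so nothing needs installing. The helper name fuzzy_mac_mentions, the window sizes, and the 0.8 threshold are all illustrative assumptions, not tuned values.

```python
import re
from difflib import SequenceMatcher

# (target, chars to take before 'OS', chars to take from 'OS' onward)
# These window sizes are assumptions chosen to match each target's length.
TARGETS = [
    ("Mac OS", 4, 2),
    ("macOS", 3, 2),
    ("OS X", 0, 4),
]


def fuzzy_mac_mentions(text, threshold=0.8):
    """Hypothetical helper: label each 'OS' occurrence with its closest target, if close enough."""
    results = []
    for m in re.finditer("OS", text):
        i = m.start()
        scored = []
        for target, before, after in TARGETS:
            # max() keeps the slice start from going negative and wrapping around
            window = text[max(0, i - before):i + after]
            scored.append((SequenceMatcher(None, window, target).ratio(), target))
        # Keep only the best-scoring target so one occurrence is counted once
        best_score, best_target = max(scored, key=lambda pair: pair[0])
        if best_score >= threshold:
            results.append(best_target)
    return results


print(fuzzy_mac_mentions("I upgraded from OS X to macOS; older docs still say Mac OS."))
```

Unlike the exact regex, this also accepts near misses such as "Mac-OS", at the cost of a threshold that has to be tuned to the actual text.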

Alternatively, if you only need to look at one word before and after 'OS', you can do this:

  1. Check whether the word itself contains OS (e.g. macOS).
  2. Find the word before OS; if it's 'Mac', concatenate them.
  3. Find the word after OS; if it's 'X', concatenate them.
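The three steps above can be sketched with plain word tokens. Assumptions: word_window_mentions is a hypothetical name, words are taken to be runs of word characters, and step 1 is read as "the word is exactly macOS", since checking for any OS substring would also catch words like iOS.

```python
import re


def word_window_mentions(text):
    """Hypothetical helper following the steps above: look at each word and its neighbours."""
    words = re.findall(r"\w+", text)  # assumed tokenization: runs of word characters
    results = []
    for i, word in enumerate(words):
        if word == "macOS":  # step 1: the word itself already contains OS
            results.append(word)
        elif word == "OS":
            if i > 0 and words[i - 1] == "Mac":  # step 2: previous word is 'Mac'
                results.append("Mac OS")
            elif i + 1 < len(words) and words[i + 1] == "X":  # step 3: next word is 'X'
                results.append("OS X")
    return results


print(word_window_mentions("I upgraded from OS X to macOS; older docs still say Mac OS."))
```

Because tokenizing drops punctuation, this version treats "Mac, OS" as "Mac OS", whereas the regex answer requires a single space between the two words.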

Upvotes: 1
