Reputation: 37
I previously asked a question and got the answer I wanted, but I now have a follow-up question.
I have a list that goes like this:
name = ['road', 'roadwork', 'pill', 'pillbox', 'pillow', 'ball',
'football', 'basketball', 'work', 'box', 'foot', 'basket']
The code below prints only the base words, filtering out the compound nouns:
for candidate in name:
    for word in name:
        if word != candidate and word in candidate:
            break
    else:
        print(candidate)
However, I realise that the code is too restrictive: it also removes "pillow" from the list, even though "pillow" is not a compound.
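To show where the check goes wrong, here is a minimal sketch that lists which other words the substring test finds inside a candidate. The helper name `contained_words` is made up for illustration and is not part of the original code.

```python
name = ['road', 'roadwork', 'pill', 'pillbox', 'pillow', 'ball',
        'football', 'basketball', 'work', 'box', 'foot', 'basket']

def contained_words(candidate, words):
    # Hypothetical helper: every other word that occurs anywhere
    # inside the candidate, which is exactly what the loop tests for.
    return [w for w in words if w != candidate and w in candidate]

print(contained_words('pillow', name))   # ['pill']
print(contained_words('pillbox', name))  # ['pill', 'box']
```

For "pillbox" the matches account for the whole word, but for "pillow" the leftover "ow" is not in the list, so a substring match alone is not enough to call it a compound.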
Is there a way to produce the following result instead?
name = ['road', 'pill', 'pillow', 'ball', 'work', 'box', 'foot', 'basket']
Upvotes: 1
Views: 356
Reputation: 76234
For a typical word, the simplest way to determine whether it is a compound is to split it in two and check whether both halves are words. You have to test every possible split point, so the number of lookups is proportional to the length of the word. That should be reasonably fast for any English word, apart from 189,000-character chemical names.
words = ['road', 'roadwork', 'pill', 'pillbox', 'pillow', 'ball', 'football', 'basketball', 'work', 'box', 'foot', 'basket']
wordSet = set(words)

def isWord(w):
    return w in wordSet

def isCompoundWord(word):
    for idx in range(1, len(word)):
        left = word[:idx]
        right = word[idx:]
        if isWord(left) and isWord(right):
            return True
    return False

nonCompoundWords = [word for word in words if not isCompoundWord(word)]
print(nonCompoundWords)
output:
['road', 'pill', 'pillow', 'ball', 'work', 'box', 'foot', 'basket']
Upvotes: 1
Reputation: 714
You will need to check whether what remains of the word after removing the match is itself another word. I imagine there will be cases where the etymology doesn't match up: for example, words that consist of another word plus 'is', where 'is' is not used with its usual meaning.
Edit: for example:
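words = ['book','store','bookstore','booking']
li = []
for word in words:
    for test in words:
        # test must be a prefix, otherwise the slice below is not the remainder
        if word.startswith(test):
            temp = word[len(test):]
            if temp in words and word not in li:
                li.append(word)
# remove the compounds only after the loops, so the list is not mutated mid-iteration
for x in li:
    words.remove(x)
print(words)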
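The etymology caveat can be seen with a small sketch. The word list and the function name `find_compounds` are assumptions chosen for illustration: "island" splits cleanly into "is" + "land", so a purely list-based check flags it even though it is not a compound of those two words.

```python
def find_compounds(words):
    # Flag a word if some split point leaves two pieces
    # that are both in the word list.
    word_set = set(words)
    compounds = []
    for word in words:
        for idx in range(1, len(word)):
            if word[:idx] in word_set and word[idx:] in word_set:
                compounds.append(word)
                break
    return compounds

# Hypothetical sample list, not taken from the question
sample = ['is', 'land', 'island', 'book', 'store', 'bookstore']
print(find_compounds(sample))  # ['island', 'bookstore']
```

Whether such false positives matter depends on the word list; a check like this only knows about spelling, not meaning.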
Upvotes: 0