Reputation: 954
I have a problem: I am trying to find device names in a string. All the device names I am looking for are stored in a list. One thing is very important in what I want:
Now the problem I have is this:
I have two devices ("Fan" and "Fan Light"). When I give the command "Turn on Fan Light", both devices are found, but I want only "Fan Light" to be found. I tried checking all the devices that were found and keeping only the longest one as the found device, like this:
# Create 2 dummy devices
device1 = {
    "name": "fan"
}
device2 = {
    "name": "fan light"
}

# Add devices to list
devices = []
devices.append(device1)
devices.append(device2)

# Given command
command = "Turn on fan light"
foundDevices = []

# Search devices in sentence
for device in devices:
    # Splits a device name if it has multiple words
    deviceSplit = device["name"].split()
    numOfSubNames = len(deviceSplit)
    # Checks for every sub-name if it is found in the string
    i = 0
    for subName in deviceSplit:
        if subName in command:
            i += 1
    # Checks if all names were located in string
    if i == numOfSubNames:
        foundDevices.append(device["name"])

# Checks if multiple devices have been found
if len(foundDevices) >= 2:
    largestNameLength = 0
    # Checks which device has the largest name
    for device in foundDevices:
        if (len(device) > largestNameLength):
            largestName = device
            largestNameLength = len(device)
    # Clears list and only add longest one
    foundDevices.clear()
    foundDevices.append(largestName)

print(foundDevices)
But that gives a problem when I say for example: "Turn on Fan Light and the Fan", because that command does contain multiple devices. How can I scan for devices the way I want?
Upvotes: 1
Views: 78
Reputation: 5301
You can use the third-party Python regex module instead of the re module (to improve upon RichieV's nice answer) if you don't want to rely on sorting the list of devices to ensure the correct result.
The problem with re is that it is not POSIX compliant: the pipe operator | tries the alternatives from left to right and keeps the first one that matches, so it does not ensure that the longest leftmost match is returned (see also How to order regular expression alternatives to get longest match?).
However, in regex you can specify (?p) before a regex pattern to ensure POSIX matching.
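To make the ordering effect concrete, here is a minimal sketch that uses only the standard re module. The variable names are illustrative and not part of the original answer.

```python
import re

command = "Turn on fan light"

# re tries the alternatives left to right and keeps the first one that
# matches, so the shorter name wins when it is listed first.
short_first = re.findall("fan|fan light", command)

# Listing the longer name first lets it match before "fan" is tried.
long_first = re.findall("fan light|fan", command)

print(short_first)  # ['fan']
print(long_first)   # ['fan light']
```

The same command gives different results depending only on the order of the alternatives, which is why the re-based approach needs the device names sorted.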
Altogether
import regex

devices = [{'name': 'fan'}, {'name': 'fan light'}]

test_cases = [
    'Turn on fan light',
    'Turn on fan light and fan',
    'Turn on fan and fan light',
    'Turn on fan and fan',
]

transformed = {dev: name for x in devices for name, dev in x.items()}
pattern = '|'.join(transformed)

for command in test_cases:
    matches = regex.findall(r'(?p)'+pattern,command)
    print(matches)
will give you
['fan light']
['fan light', 'fan']
['fan', 'fan light']
['fan', 'fan']
regardless of the order of the dictionaries in devices.
Upvotes: 1
Reputation: 5183
A regular expression search is one way of quickly doing what you want, with a pattern made from the different device names.
import re

def find_with_regex(command, pattern):
    return list(set(re.findall(pattern, command, re.IGNORECASE)))
I would also suggest building a reversed dictionary of the shape device: name. It may help to quickly look up the code name of a given device.
devices = [{'name': 'fan light'}, {'name': 'fan'}]

# build a quick-reference dict with device>name structure
transformed = {dev: name for x in devices for name, dev in x.items()}
# this also weeds out duplicated devices: a repeated device name
# collapses into a single key (no error is raised, the last entry wins)

# print(transformed)
# {'fan light': 'name', 'fan': 'name'}
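Because a dict comprehension silently keeps the last entry for a repeated key, duplicates go unnoticed. If they should be reported instead, one option is to count the names explicitly. This is only a sketch: find_duplicate_names is a hypothetical helper, and comparing names case-insensitively is an assumption.

```python
from collections import Counter

def find_duplicate_names(devices):
    """Return the device names that occur more than once.

    Names are compared case-insensitively (an assumption for this sketch).
    """
    counts = Counter(device["name"].lower() for device in devices)
    return sorted(name for name, count in counts.items() if count > 1)

devices = [{"name": "fan"}, {"name": "fan light"}, {"name": "Fan"}]
duplicates = find_duplicate_names(devices)
print(duplicates)  # ['fan']
```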
Special thanks to buddemat for pointing out that the device names need to be in a particular order for this solution to work. I fixed it with reversed(sorted(...)) on the pattern-building line in the next code block, so that a longer name such as fan light is tried before its prefix fan.
Testing the function
test_cases = [
    'Turn on fan light',
    'Turn on fan light and fan',
    'Turn on fan and fan light',
    'Turn on fan and fan',
]

pattern = '|'.join(reversed(sorted(transformed)))

for command in test_cases:
    matches = find_with_regex(command, pattern)
    print(matches)
Output
['fan light']
['fan', 'fan light']
['fan', 'fan light']
['fan']
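The reversed(sorted(...)) trick relies on a name sorting directly before the longer names it is a prefix of. A slightly more defensive variant, sketched below under a few assumptions, sorts by length instead, escapes the names, and adds word boundaries. The helpers build_pattern and find_devices are hypothetical names, and unlike find_with_regex this version keeps repeated mentions and their order rather than collapsing them with set.

```python
import re

def build_pattern(names):
    """Build an alternation that tries longer names first."""
    # Longest first, so "fan light" is tried before "fan".
    ordered = sorted(names, key=len, reverse=True)
    # re.escape guards against names containing regex metacharacters;
    # \b keeps "fan" from matching inside a longer word such as "fanfare".
    return r"\b(?:" + "|".join(re.escape(n) for n in ordered) + r")\b"

def find_devices(command, pattern):
    """Return every device mention in order of appearance, lower-cased."""
    return [m.lower() for m in re.findall(pattern, command, re.IGNORECASE)]

pattern = build_pattern(["fan", "fan light"])
print(find_devices("Turn on Fan Light and the Fan", pattern))  # ['fan light', 'fan']
print(find_devices("Play the fanfare", pattern))               # []
```

Whether duplicates should be kept or collapsed depends on how the commands are processed afterwards; wrapping the result in set, as in the answer above, restores the original behavior.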
Upvotes: 1