Reputation: 31
I have a list of words.
trails = ("Fire trail", "Firetrail", "Fire Trail", "FT", "firetrail")
I need to split another string based on any of these words.
So, say, if the names to check are:
I want to modify them to look like this:
That is, split each name at the first occurrence of any word from the trails list and keep only the part before it.
Thanks!
I should add, my code starts with:
for f in arcpy.da.SearchCursor("firetrail_O_noD_Layer", "FireTrailName", None, None):
...     if any(var in str(f[0]) for var in trails):
...         new_field = *that part of string without any fire trails and anything after it*
str(f[0]) refers to the names from the first list; new_field refers to the names in the second list, which I need to create.
Upvotes: 2
Views: 11279
Reputation: 44092
Since the requirements and the solution will probably need to be clarified and tested iteratively, here is a proposed solution together with a test suite to be run with pytest.
First, create the file test_trails.py:
import pytest


def fix_trails(trails):
    """Clean up the list of trails so that the longest phrases are processed
    with the highest priority (come first in the list).
    This is needed if some trail phrases contain other ones.
    """
    trails.sort(key=len, reverse=True)
    return trails


@pytest.fixture
def trails():
    phrases = ["Fire trail", "Firetrail", "Fire Trail",
               "FT", "firetrail", "Trail", "Fire Trails"]
    return fix_trails(phrases)


def remove_trails(line, trails):
    for trail in trails:
        if trail in line:
            res = line.replace(trail, "").strip()
            # Collapse the double space left behind by the removed phrase.
            return res.replace("  ", " ")
    return line


scenarios = [
    ["Poverty Point FT", "Poverty Point"],
    ["Cedar Party Fire Trails", "Cedar Party Fire"],
    ["Mailbox Trail", "Mailbox"],
    ["Carpet Snake Creek Firetrail", "Carpet Snake Creek"],
    ["Pretty Gully firetrail - Roayl NP", "Pretty Gully - Roayl NP"],
]


@pytest.mark.parametrize("scenario", scenarios, ids=lambda itm: itm[0])
def test(scenario, trails):
    line, expected = scenario
    result = remove_trails(line, trails)
    assert result == expected
The file defines the function that removes the unwanted text from the processed lines, and it also contains the test case.
To run it, install pytest:
$ pip install pytest
Then run the test:
$ py.test -sv test_trails.py
========================================= test session starts =========================================
platform linux2 -- Python 2.7.9, pytest-2.8.7, py-1.4.31, pluggy-0.3.1 -- /home/javl/.virtualenvs/stack/bin/python2
cachedir: .cache
rootdir: /home/javl/sandbox/stack, inifile:
collected 5 items
test_trails.py::test[Poverty Point FT] PASSED
test_trails.py::test[Cedar Party Fire Trails] FAILED
test_trails.py::test[Mailbox Trail] PASSED
test_trails.py::test[Carpet Snake Creek Firetrail] PASSED
test_trails.py::test[Pretty Gully firetrail - Roayl NP] PASSED
================ FAILURES ==================
______ test[Cedar Party Fire Trails] _______
scenario = ['Cedar Party Fire Trails', 'Cedar Party Fire']
trails = ['Fire Trails', 'Fire trail', 'Fire Trail', 'Firetrail', 'firetrail', 'Trail', ...]
    @pytest.mark.parametrize("scenario", scenarios, ids=lambda itm: itm[0])
    def test(scenario, trails):
        line, expected = scenario
        result = remove_trails(line, trails)
>       assert result == expected
E       assert 'Cedar Party' == 'Cedar Party Fire'
E         - Cedar Party
E         + Cedar Party Fire
E         ?            +++++
test_trails.py:42: AssertionError
======== 1 failed, 4 passed in 0.01 seconds ============
The py.test command discovers the test case in the file, inspects its arguments, and injects the value of the trails fixture; the parametrization of the test case supplies the scenario parameter.
You can then fine-tune the remove_trails function and the list of trails until all tests pass.
When you are finished, move the remove_trails function (and probably the trails list) to wherever you need it.
You can use this approach to test any of the solutions proposed for your question.
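If the goal is the one stated in the question (keep only the text before the first trail phrase), remove_trails could be tuned along the following lines. This is only a sketch: the name cut_before_trail is hypothetical, and it uses plain case-sensitive substring search, so a short phrase such as "FT" would also match inside a longer word.

```python
def cut_before_trail(line, trails):
    """Return the part of `line` that precedes the earliest trail phrase."""
    # Start index of every phrase that actually occurs in the line.
    positions = [line.find(trail) for trail in trails if trail in line]
    if not positions:
        return line  # no trail phrase: leave the name unchanged
    return line[:min(positions)].strip()


trails = ["Fire trail", "Firetrail", "Fire Trail", "FT", "firetrail"]
print(cut_before_trail("Pretty Gully firetrail - Roayl NP", trails))  # Pretty Gully
print(cut_before_trail("Mailbox Trail", trails))  # Mailbox Trail
```

Because it takes the earliest position rather than the first phrase in the list, this variant does not depend on the order of trails.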
Upvotes: 1
Reputation: 26901
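I believe this is what you're looking for. If you want the match to be case-insensitive, pass the re.IGNORECASE flag by keyword, like so: res = re.split(regex, s, flags=re.IGNORECASE). It has to be passed as flags= because the third positional argument of re.split() is maxsplit. See the re.split() documentation for details.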
import re
trails = ("Fire trail", "Firetrail", "Fire Trail", "FT", "firetrail")
# \b means word boundaries.
regex = r"\b(?:{})\b".format("|".join(trails))
s = """Poverty Point FT
Cedar Party Fire Trails
Mailbox Trail
Carpet Snake Creek Firetrail
Pretty Gully firetrail - Roayl NP"""
res = re.split(regex, s)
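A minimal sketch of the case-insensitive variant on a single name. The sample name is made up for illustration, and re.escape is an extra precaution (not part of the snippet in this answer) in case a phrase ever contains regex metacharacters.

```python
import re

trails = ("Fire trail", "Firetrail", "Fire Trail", "FT", "firetrail")
# re.escape guards against phrases that contain regex metacharacters.
regex = r"\b(?:{})\b".format("|".join(re.escape(t) for t in trails))

name = "Carpet Snake Creek FIRETRAIL"  # hypothetical input in a different case
# Pass flags by keyword: the third positional argument of re.split is maxsplit.
parts = re.split(regex, name, flags=re.IGNORECASE)
print(parts)  # ['Carpet Snake Creek ', '']
```

Without the flag, the same call returns the name unsplit, because none of the phrases matches "FIRETRAIL" exactly.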
UPDATE:
If you go line by line and don't want the rest of the line after the match, you can do this:
import re
trails = ("Fire trail", "Firetrail", "Fire Trail", "FT", "firetrail", "Trail", "Trails")
# \b means word boundaries.
regex = r"\b(?:{}).*".format("|".join(trails))
s = """Poverty Point FT
Cedar Party Fire Trails
Mailbox Trail
Carpet Snake Creek Firetrail
Pretty Gully firetrail - Roayl NP"""
res = [r.strip() for r in re.split(regex, s)]
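When the names arrive one at a time, as in the cursor loop from the question, the same pattern can be applied per name. The sketch below assumes a hypothetical helper name_before_trail and keeps only the first piece of the split.

```python
import re

trails = ("Fire trail", "Firetrail", "Fire Trail", "FT", "firetrail", "Trail", "Trails")
regex = re.compile(r"\b(?:{})".format("|".join(trails)))


def name_before_trail(name):
    # maxsplit=1 stops at the first trail phrase; [0] is the text before it.
    return regex.split(name, maxsplit=1)[0].strip()


names = ["Poverty Point FT", "Cedar Party Fire Trails", "Pretty Gully firetrail - Roayl NP"]
print([name_before_trail(n) for n in names])
# ['Poverty Point', 'Cedar Party', 'Pretty Gully']
```

A name that contains none of the phrases comes back unchanged apart from the strip().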
Upvotes: 3
Reputation: 8978
Well, here is a more dynamic way to perform the task:
import re
courses = r"""
Poverty Point FT
Cedar Party Fire Trails
Mailbox Trail
Carpet Snake Creek Firetrail
Pretty Gully firetrail - Roayl NP
"""
trails = ("Fire trail", "Firetrail", "Fire Trail", "FT", "firetrail")
rx_str = '|'.join(trails)
rx_str = r"^.+?(?=(?:{0}|$))".format(rx_str)
rx = re.compile(rx_str, re.IGNORECASE | re.MULTILINE)
for course in rx.finditer(courses):
    print(course.group())
As you can see, I'm building the regex from the list dynamically, without hardcoding it. The script prints the following result:
Poverty Point
Cedar Party
Mailbox Trail
Carpet Snake Creek
Pretty Gully
Upvotes: 1
Reputation: 4837
You can use re.split here:
import re
# _string is the name to split, e.g. str(f[0]) from the question
_list = re.split(r'Fire trail|Firetrail|Fire Trail|FT|firetrail', _string)
Upvotes: 1
Reputation: 23176
You could do this using a regular expression, for example:
def make_matcher(trails):
    import re
    rgx = re.compile(r"{}".format("|".join(trails)))
    return lambda txt: rgx.split(txt)[0]
>>> m = make_matcher(["Fire trail", "Firetrail", "Fire Trail", "FT", "firetrail"])
>>> examples = ["Poverty Point FT", "Cedar Party Fire Trails", "Mailbox Trail", "Carpet Snake Creek Firetrail", "Pretty Gully firetrail - Roayl NP"]
>>> for x in examples:
...     print(m(x))
Poverty Point
Cedar Party
Mailbox Trail
Carpet Snake Creek
Pretty Gully
Note that in this example the trailing space before the occurrence of, e.g., Firetrail is retained. That might not be what you want.
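If the trailing space is unwanted, one possible adjustment (a sketch, not the only option) is to let the pattern consume any whitespace in front of the trail phrase. The use of re.escape here is an added precaution and is not part of the original function.

```python
import re


def make_matcher(trails):
    # \s* lets the match swallow whitespace in front of the trail phrase.
    alternatives = "|".join(re.escape(t) for t in trails)
    rgx = re.compile(r"\s*(?:{})".format(alternatives))
    return lambda txt: rgx.split(txt)[0]


m = make_matcher(["Fire trail", "Firetrail", "Fire Trail", "FT", "firetrail"])
print(repr(m("Carpet Snake Creek Firetrail")))  # 'Carpet Snake Creek'
print(repr(m("Mailbox Trail")))                 # 'Mailbox Trail'
```

Calling .rstrip() on the result of the original matcher would have the same effect for these inputs.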
Upvotes: 0