Reputation: 59
I have created a function to search for the contexts of a given word(w) in a text, with left and right as parameters for flexibility in the number of words to record.
import re
def get_context (text, w, left, right):
text.insert (0, "*START*")
text.append ("*END*")
all_contexts = []
for i in range(len(text)):
if re.match(w,text[i], 0):
if i < left:
context_left = text[:i]
else:
context_left = text[i-left:i]
if len(text) < (i+right):
context_right = text[i:]
else:
context_right = text[i:(i+right+1)]
context = context_left + context_right
all_contexts.append(context)
return all_contexts
So for example if a have a text in the form of a list like this:
text = ['Python', 'is', 'dynamically', 'typed', 'language', 'Python', 'functions', 'really', 'care', 'about', 'what', 'you', 'pass', 'to', 'them', 'but', 'you', 'got', 'it', 'the', 'wrong', 'way', 'if', 'you', 'want', 'to', 'pass', 'one', 'thousand', 'arguments', 'to', 'your', 'function', 'then', 'you', 'can', 'explicitly', 'define', 'every', 'parameter', 'in', 'your', 'function', 'definition', 'and', 'your', 'function', 'will', 'be', 'automagically', 'able', 'to', 'handle', 'all', 'the', 'arguments', 'you', 'pass', 'to', 'them', 'for', 'you']
The function works fine for example:
get_context(text, "function",2,2)
[['language', 'python', 'functions', 'really', 'care'], ['to', 'your', 'function', 'then', 'you'], ['in', 'your', 'function', 'definition', 'and'], ['and', 'your', 'function', 'will', 'be']]
Now I am trying to build a dictionary with the contexts of every word in the text doing the following:
d = {}
for w in set(text):
d[w] = get_context(text,w,2,2)
But I am getting this error.
Traceback (most recent call last):
File "<pyshell#32>", line 2, in <module>
d[w] = get_context(text,w,2,2)
File "<pyshell#20>", line 9, in get_context
if re.match(w,text[i], 0):
File "/usr/lib/python3.4/re.py", line 160, in match
return _compile(pattern, flags).match(string)
File "/usr/lib/python3.4/re.py", line 294, in _compile
p = sre_compile.compile(pattern, flags)
File "/usr/lib/python3.4/sre_compile.py", line 568, in compile
p = sre_parse.parse(p, flags)
File "/usr/lib/python3.4/sre_parse.py", line 760, in parse
p = _parse_sub(source, pattern, 0)
File "/usr/lib/python3.4/sre_parse.py", line 370, in _parse_sub
itemsappend(_parse(source, state))
File "/usr/lib/python3.4/sre_parse.py", line 579, in _parse
raise error("nothing to repeat")
sre_constants.error: nothing to repeat
I don't understand this error. Can anyone help me with this?
Upvotes: 0
Views: 1295
Reputation: 3405
The whole thing can be re-written very succinctly follows,
text = 'Python is dynamically typed language Python functions really care about what you pass to them but you got it the wrong way if you want to pass one thousand arguments to your function then you can explicitly define every parameter in your function definition and your function will be automagically able to handle all the arguments you pass to them for you'
Keeping it a str
, assuming context = 'function',
pat = re.compile(r'(\w+\s\w+\s)functions?(?=(\s\w+\s\w+))')
pat.findall(text)
[('language Python ', ' really care'),
('to your ', ' then you'),
('in your ', ' definition and'),
('and your ', ' will be')]
Now, minor customization will be needed in the regex to allow for, words like say, functional
or functioning
not only function
or functions
. But the important idea is to do away with indexing and go more functional.
Please comment if this doesn't work out for you, when you apply it in bulk.
Upvotes: 1
Reputation: 11177
The problem is that "*START*" and "*END*" are being interpreted as regex. Also, note that inserting "*START*" and "*END*" in text
in the begging of the function will cause problem. You should do it just once.
Here is a complete version of the working code:
import re
def get_context(text, w, left, right):
all_contexts = []
for i in range(len(text)):
if re.match(w,text[i], 0):
if i < left:
context_left = text[:i]
else:
context_left = text[i-left:i]
if len(text) < (i+right):
context_right = text[i:]
else:
context_right = text[i:(i+right+1)]
context = context_left + context_right
all_contexts.append(context)
return all_contexts
text = ['Python', 'is', 'dynamically', 'typed', 'language',
'Python', 'functions', 'really', 'care', 'about', 'what',
'you', 'pass', 'to', 'them', 'but', 'you', 'got', 'it', 'the',
'wrong', 'way', 'if', 'you', 'want', 'to', 'pass', 'one',
'thousand', 'arguments', 'to', 'your', 'function', 'then',
'you', 'can', 'explicitly', 'define', 'every', 'parameter',
'in', 'your', 'function', 'definition', 'and', 'your',
'function', 'will', 'be', 'automagically', 'able', 'to', 'handle',
'all', 'the', 'arguments', 'you', 'pass', 'to', 'them', 'for', 'you']
text.insert(0, "START")
text.append("END")
d = {}
for w in set(text):
d[w] = get_context(text,w,2,2)
Maybe you can replace re.match(w,text[i], 0)
with w == text[i]
.
Upvotes: 2
Reputation: 27283
At least one of the elements in text
contains characters that are special in a regular expression. If you're just trying to find whether the word is in the string, just use str.startswith
, i.e.
if text[i].startswith(w): # instead of re.match(w,text[i], 0):
But I don't understand why you are checking for that anyways, and not for equality.
Upvotes: 0