Reputation: 360
Given a single word (x), return all the n-grams that can be found in that word. You can change the n-gram length as you like; it is the number in the curly braces in the pat variable. The default n-gram length is 4.
For example, for the word (x):
x = 'abcdef'
The possible 4-grams are:
['abcd', 'bcde', 'cdef']
import re

def ngram_finder(x):
    # A capture group inside a lookahead lets findall return overlapping matches
    pat = r'(?=(\S{4}))'
    xx = re.findall(pat, x)
    return xx
The question is: how can I combine an f-string with an r-string in the regex, given that the regex itself uses curly braces?
Upvotes: 4
Views: 4646
Reputation: 147206
You can use this string to combine the n value into your regexp, using double curly brackets to create a single one in the output:
fr'(?=(\S{{{n}}}))'
The regex needs {} to make a quantifier (as you had in your original regex, {4}). However, f-strings use {} to indicate an expression replacement, so you need to "escape" the {} required by the regex in the f-string. That is done by using {{ and }}, which create { and } in the output. So {{{n}}} (where n=4) generates '{' + '4' + '}' = '{4}', as required.
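To see the escaping at work, here is a minimal sketch that prints the pattern the f-string actually produces and compares it with the hand-written one. The variable names follow the answer; the comparison itself is only an illustration.

```python
import re

n = 4  # n-gram length, as in the answer

# {{ and }} become literal { and }; the inner {n} is replaced by the value of n
pat = fr'(?=(\S{{{n}}}))'
print(pat)  # (?=(\S{4}))

# Same pattern written by hand, for comparison
assert pat == r'(?=(\S{4}))'

print(re.findall(pat, 'abcdef'))  # ['abcd', 'bcde', 'cdef']
```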
Complete code:
import re
def ngram_finder(x, n):
    pat = fr'(?=(\S{{{n}}}))'
    return re.findall(pat, x)
x = 'abcdef'
print(ngram_finder(x, 4))
print(ngram_finder(x, 5))
Output:
['abcd', 'bcde', 'cdef']
['abcde', 'bcdef']
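If the tripled braces are hard to read, one possible alternative is to build the pattern with %-formatting, which does not treat curly braces specially. This is only a sketch: the name ngram_finder_alt is hypothetical and not part of the original answer.

```python
import re

def ngram_finder_alt(x, n):
    # Hypothetical variant: %-formatting leaves { and } alone,
    # so the quantifier braces do not need to be doubled.
    pat = r'(?=(\S{%d}))' % n
    return re.findall(pat, x)

print(ngram_finder_alt('abcdef', 4))  # ['abcd', 'bcde', 'cdef']
print(ngram_finder_alt('abcdef', 5))  # ['abcde', 'bcdef']
```

The trade-off is that a literal % in the pattern would then need to be written as %%.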
Upvotes: 8