string1 = 'Department of the Federal Treasury "IFTS No. 43"'
string2 = 'Federal Treasury Company "Light-8"'
I need to get the first capital letters of the words longer than 3 characters that come before the opening quote, and also extract the quoted expression, using one common pattern for both strings.
The final strings should be:
string1: 'IFTS No. 43, DFT'
string2: 'Light-8, FTC'
I would like a single pattern that works for both strings, so that I can reuse the expression on a DataFrame later.
You can use a capturing group and alternation.
"([^"]+)"|\b[A-Z]
See this demo at regex101 (FYI read: The Trick)
It either matches a quoted part and captures the text between the double quotes (a run of non-quote characters, [^"]+) into the first capturing group, OR it matches each capital letter that sits at a \b word boundary (the start of a word).
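The pattern above relies on short words such as "of" and "the" being lowercase; it does not check word length. If the "longer than 3 characters" rule has to be enforced explicitly, a lookahead can require at least three more letters after the capital. This is a sketch, assuming words are made of ASCII letters only; the function name `split_parts` and the test string `'Bank Of Tax Office "X"'` are hypothetical:

```python
import re

# Alternative 1 captures the quoted text.
# Alternative 2 matches a word-initial capital only if 3+ letters follow,
# i.e. the word is longer than 3 characters (assumes ASCII letters only).
PATTERN = re.compile(r'"([^"]+)"|\b[A-Z](?=[A-Za-z]{3})')

def split_parts(s):
    """Return (quoted text, initials) for one string (hypothetical helper)."""
    quoted, initials = [], []
    for m in PATTERN.finditer(s):
        if m.group(1) is not None:
            quoted.append(m.group(1))   # text between the double quotes
        else:
            initials.append(m.group(0))  # capital starting a long-enough word
    return ", ".join(quoted), "".join(initials)

print(split_parts('Department of the Federal Treasury "IFTS No. 43"'))
print(split_parts('Bank Of Tax Office "X"'))  # "Of" and "Tax" are skipped
```

Because the quoted alternative is tried first at the position of the opening quote, capitals inside the quotes (such as "IFTS") are consumed as part of the quoted match and never counted as initials.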
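import re

# Group 1 captures the text inside double quotes;
# otherwise match a capital letter at the start of a word.
regex = r'"([^"]+)"|\b[A-Z]'
s = 'Department of the Federal Treasury "IFTS No. 43"\n'
res = ["", ""]  # [quoted text, initials]
for m in re.finditer(regex, s):
    if m.group(1) is not None:
        res[0] += m.group(1)  # quoted expression
    else:
        res[1] += m.group(0)  # word-initial capital letter
print(res)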
['IFTS No. 43', 'DFT']
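The loop above returns the two parts separately. To produce the exact format asked for in the question ('IFTS No. 43, DFT') for both strings, the same pattern can be wrapped in a function that joins the parts with ", ". A minimal sketch; the function name `quoted_then_initials` is hypothetical:

```python
import re

# Same pattern as above: group 1 captures quoted text,
# otherwise a capital letter at the start of a word.
PATTERN = re.compile(r'"([^"]+)"|\b[A-Z]')

def quoted_then_initials(s: str) -> str:
    """Return '<quoted text>, <initials>' (hypothetical helper)."""
    quoted = []
    initials = []
    for m in PATTERN.finditer(s):
        if m.group(1) is not None:
            quoted.append(m.group(1))    # text between the double quotes
        else:
            initials.append(m.group(0))  # word-initial capital letter
    return "".join(quoted) + ", " + "".join(initials)

string1 = 'Department of the Federal Treasury "IFTS No. 43"'
string2 = 'Federal Treasury Company "Light-8"'
print(quoted_then_initials(string1))  # IFTS No. 43, DFT
print(quoted_then_initials(string2))  # Light-8, FTC
```

For a DataFrame column, the function can be applied element-wise, e.g. `df["name"].map(quoted_then_initials)`, where `df` and the column name `"name"` are placeholders for your own data.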