broncaeux

Reputation: 9

Extract abbreviation from string of the words longer than 3 letters by regex

string1 = 'Department of the Federal Treasury "IFTS No. 43"'
string2 = 'Federal Treasury Company "Light-8"'

I need to extract the initial capital letter of each word longer than 3 characters that appears before the opening quote, and also extract the quoted expression, using one pattern that works for both strings.

Final string should be:

I would like a single pattern that handles both strings so that I can later apply it to a DataFrame.

Upvotes: -2

Views: 83

Answers (1)

bobble bubble

Reputation: 18545

You can use a capturing group and alternation.

"([^"]+)"|\b[A-Z]

See this demo at regex101 (FYI read: The Trick)

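It either matches a quoted part and captures the characters between the double quotes (anything that is not a double quote) into the first capturing group, OR it matches a capital letter at a \b word boundary (the start of a word). Note that the pattern itself does not check word length; short words such as "of" and "the" are skipped in the samples only because they are lowercase.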

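import re

# Group 1 captures the quoted text; the second alternative matches
# a capital letter at the start of a word.
regex = r'"([^"]+)"|\b[A-Z]'

s = 'Department of the Federal Treasury "IFTS No. 43"\n'

res = ["", ""]  # res[0]: quoted text, res[1]: initials

for m in re.finditer(regex, s):
    if m.group(1):
        res[0] += m.group(1)
    else:
        res[1] += m.group(0)

print(res)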

Python demo at tio.run, output:

['IFTS No. 43', 'DFT']
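
If the "longer than 3 letters" requirement from the question has to be enforced by the pattern itself, one possible variation is to add a lookahead after the capital letter. The sketch below is an assumption layered on the pattern above, not part of the original answer: the lookahead `(?=\w{3})` and the helper name `abbreviate` are illustrative, and `\w` also counts digits and underscores as word characters.

```python
import re

# Assumed variant of the pattern above: (?=\w{3}) keeps only capitals
# that start a word of at least 4 word characters.
PATTERN = re.compile(r'"([^"]+)"|\b[A-Z](?=\w{3})')


def abbreviate(text):
    """Return (initials, quoted_text) for one input string (hypothetical helper)."""
    initials, quoted = [], []
    for m in PATTERN.finditer(text):
        if m.group(1) is not None:
            # First alternative matched: text between double quotes.
            quoted.append(m.group(1))
        else:
            # Second alternative matched: initial capital of a long word.
            initials.append(m.group(0))
    return "".join(initials), " ".join(quoted)


string1 = 'Department of the Federal Treasury "IFTS No. 43"'
string2 = 'Federal Treasury Company "Light-8"'

print(abbreviate(string1))  # ('DFT', 'IFTS No. 43')
print(abbreviate(string2))  # ('FTC', 'Light-8')
```

Because the quoted part is consumed as a whole by the first alternative, capitals inside the quotes never reach the second alternative. Capitals that follow the closing quote would still be collected, so this assumes the quoted expression comes last, as in both sample strings. For a DataFrame, the helper could presumably be passed to `Series.apply` on the relevant column.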

Upvotes: -1
