Shwtm3

Reputation: 43

Python create acronym from first characters of each word and include the numbers

I have a string as follows:
theatre = 'Regal Crown Center Stadium 14'

I would like to turn this into an acronym made from the first letter of each word, but also include the whole number (both digits):
desired output = 'RCCS14'

My attempts so far:

import re

acronym = "".join(word[0] for word in theatre.lower().split())
acronym = "".join(word[0].lower() for word in re.findall("(\w+)", theatre))
acronym = "".join(word[0].lower() for word in re.findall("(\w+ | \d{1,2})", theatre))
acronym = re.search(r"\b(\w+ | \d{1,2})", theatre)

With these I end up with something like rccs1: only the first digit of 14 is kept, and I can't seem to capture the whole number. The number could also appear in the middle of the name, e.g. 'Regal Crown Center 14 Stadium'. TIA!
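The 4 goes missing because word[0] of the token '14' is just '1'. A minimal sketch of one fix, assuming numbers always appear as their own space-separated words (the question does not guarantee this): keep digit-only words whole and take the first character of every other word. The function name acronym is hypothetical, and .upper() replaces the .lower() calls so the result matches the desired 'RCCS14'.

```python
def acronym(name):
    # Hypothetical helper: digit-only tokens are kept whole,
    # every other token contributes only its first character.
    return "".join(
        word if word.isdigit() else word[0] for word in name.split()
    ).upper()


print(acronym('Regal Crown Center Stadium 14'))  # RCCS14
print(acronym('Regal Crown Center 14 Stadium'))  # RCC14S
```

A token that mixes digits and letters, such as '14A', would still contribute only '1' here, because str.isdigit() is False for it.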

Upvotes: 3

Views: 2579

Answers (4)

ctwheels

Reputation: 22837

(?:(?<=\s)|^)(?:[a-z]|\d+)
  • (?:(?<=\s)|^) Ensure what precedes is either a space or the start of the line
  • (?:[a-z]|\d+) Match either a single letter or one or more digits

The i flag (re.I in Python) makes [a-z] match uppercase letters as well.
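To illustrate why both alternatives in (?:(?<=\s)|^) and the re.I flag are needed, here is a small comparison (the variable names are only for illustration): dropping ^ loses the first word, and dropping re.I leaves only the digits.

```python
import re

s = 'Regal Crown Center Stadium 14'

# Full pattern from the answer: whitespace before OR start of string, case-insensitive.
full = re.compile(r"(?:(?<=\s)|^)(?:[a-z]|\d+)", re.I)
# Without ^: the first word has nothing before it, so its initial is skipped.
no_anchor = re.compile(r"(?<=\s)(?:[a-z]|\d+)", re.I)
# Without re.I: [a-z] no longer matches the uppercase initials, so only the digits survive.
no_flag = re.compile(r"(?:(?<=\s)|^)(?:[a-z]|\d+)")

print(''.join(full.findall(s)))       # RCCS14
print(''.join(no_anchor.findall(s)))  # CCS14
print(''.join(no_flag.findall(s)))    # 14
```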

import re

r = re.compile(r"(?:(?<=\s)|^)(?:[a-z]|\d+)", re.I)
s = 'Regal Crown Center Stadium 14'

print(''.join(r.findall(s)))

The code above finds all instances where the regex matches and joins the list items into a single string.

Result: RCCS14
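The same compiled pattern can be checked on the intermediate findall list and on the question's second example, where the number sits in the middle of the name. A quick sketch:

```python
import re

r = re.compile(r"(?:(?<=\s)|^)(?:[a-z]|\d+)", re.I)

# The pattern has no capturing groups, so findall returns each full match.
print(r.findall('Regal Crown Center Stadium 14'))  # ['R', 'C', 'C', 'S', '14']

# A number in the middle of the name is picked up in place.
print(''.join(r.findall('Regal Crown Center 14 Stadium')))  # RCC14S
```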

Upvotes: 2

Amaro Vita

Reputation: 436

import re
theatre = 'Regal Crown Center Stadium 14'
# Prepend a space so the first word is also preceded by whitespace.
r = re.findall(r"\s(\d+|\S)", ' ' + theatre)
print(''.join(r))

Gives me RCCS14
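Two details make this work: prepending ' ' gives the first word a preceding whitespace character, and because the pattern has one capturing group, re.findall returns only the group's text, not the space. A short comparison with a non-capturing group (padded is an illustrative name, and the input is the question's middle-number example):

```python
import re

padded = ' ' + 'Regal Crown Center 14 Stadium'

# One capturing group: findall returns just the group, so the leading space is dropped.
print(re.findall(r"\s(\d+|\S)", padded))    # ['R', 'C', 'C', '14', 'S']
# Non-capturing group: findall returns the whole match, space included.
print(re.findall(r"\s(?:\d+|\S)", padded))  # [' R', ' C', ' C', ' 14', ' S']
```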

Upvotes: 0

gcharbon

Reputation: 1701

I can't comment since I don't have enough reputation, but S. Jovan's answer isn't satisfying: it assumes that each word starts with a capital letter and that each word contains exactly one capital letter.

re.sub(r'[a-z ]+', '', "Regal Crown Center Stadium YB FIEUBFB DBUUFG FUEH  14")

returns 'RCCSYBFIEUBFBDBUUFGFUEH14'

However, ctwheels' answer (written here with \b instead of the lookbehind) works in this case:

import re

r = re.compile(r"\b(?:[a-z]|\d+)", re.I)
s = 'Regal Crown Center Stadium YB FIEUBFB DBUUFG FUEH  14'

print(''.join(r.findall(s)))

will print

RCCSYFDF14
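This version uses \b rather than the original lookbehind, and the two are not identical: \b also matches right after punctuation. A sketch on a hypothetical hyphenated name (not from the question) shows the difference:

```python
import re

s = 'Regal Crown-Center 14'  # hypothetical input with a hyphenated word

# \b fires between '-' and 'C', so "Center" contributes its initial.
word_boundary = re.compile(r"\b(?:[a-z]|\d+)", re.I)
# The lookbehind only accepts a character preceded by whitespace (or at the start).
after_space = re.compile(r"(?:(?<=\s)|^)(?:[a-z]|\d+)", re.I)

print(''.join(word_boundary.findall(s)))  # RCC14
print(''.join(after_space.findall(s)))    # RC14
```

Which behavior is preferable depends on whether hyphenated parts of a theatre name should count as separate words.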

Upvotes: 0

Srdjan M.

Reputation: 3405

You can use re.sub() to remove all lowercase letters and spaces.

Regex: [a-z ]+

Details:

  • [a-z ]+ Match one or more characters that are each either a lowercase letter (a to z) or a space

Python code:

import re
re.sub(r'[a-z ]+', '', theatre)

Output: RCCS14
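This approach relies on every word starting with an uppercase letter. A small sketch with a hypothetical all-lowercase word shows what happens otherwise, next to the word-start regex from ctwheels' answer with .upper() applied:

```python
import re

s = 'Regal crown Center 14'  # hypothetical input: "crown" is all lowercase

# re.sub keeps only characters outside [a-z ], so "crown" vanishes entirely.
print(re.sub(r'[a-z ]+', '', s))  # RC14

# Matching word starts keeps one character per word regardless of case.
r = re.compile(r"(?:(?<=\s)|^)(?:[a-z]|\d+)", re.I)
print(''.join(r.findall(s)).upper())  # RCC14
```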

Upvotes: 1