AnaG

Reputation: 21

How to change a character in a list of strings to lowercase if it matches a regular expression?

I have a list of strings in uppercase, and I want to change some of the characters to lowercase because they are units of measure or abbreviations.

I am new to regular expressions, but I managed to write two regexes that fit what I need: one for the 'X' placed between numbers, and another for the remaining cases.

The problem is that re.sub replaces the whole match (with "--" in the example below), whereas I want only the letters matched by the regex to become lowercase.

import re

t1 = 'EXTRUDED PROFILE 50 X 50 MM'
t2 = 'MATERIAL TYPE 3XP WITH A DENSITY OF D= 50KG/M3 AND THICKNESS OF E=8MM'
t3 = 'STEEL TUBE 50X50X3 MM'
list_Txt = [t1, t2, t3]

pattern_X = r'(\d\s?X\s?\d)'
pattern_M = r'(E=|D=)?\s?\d+\s?(X|MM|KG/M)|d+\.(MM)'

new_Txt = [re.sub(pattern_X, '--', item) for item in list_Txt]

Returns:

'EXTRUDED PROFILE 5--0 MM', 'MATERIAL TYPE 3XP WITH A DENSITY OF D= 50KG/M3 AND THICKNESS OF E=8MM', 'STEEL TUBE 5---- MM'

I want:

'EXTRUDED PROFILE 50 x 50 mm', 'MATERIAL TYPE 3XP WITH A DENSITY OF d= 50kg/m3 AND THICKNESS OF e=8mm', 'STEEL TUBE 50x50x3 mm'

Upvotes: 2

Views: 70

Answers (1)

anubhava

Reputation: 785551

You can pass a lambda as the replacement in Python's re.sub, so that each match is replaced by its own lowercased text:

import re

t1 = 'EXTRUDED PROFILE 50 X 50 MM'
t2 = 'MATERIAL TYPE 3XP WITH A DENSITY OF D= 50KG/M3 AND THICKNESS OF E=8MM'
t3 = 'STEEL TUBE 50X50X3 MM'
list_Txt = [t1, t2, t3]

pat = re.compile(r'(?:[ED]=\s*)?(?:\d+\s*X\s*)*\d+\s*(?:M[MG]|KG/M)')

new_Txt = [pat.sub(lambda m: m.group().lower(), item) for item in list_Txt]

print(new_Txt)

Output:

['EXTRUDED PROFILE 50 x 50 mm', 'MATERIAL TYPE 3XP WITH A DENSITY OF d= 50kg/m3 AND THICKNESS OF e=8mm', 'STEEL TUBE 50x50x3 mm']
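
If only the 'X' between numbers needs lowercasing (the first of the two patterns in the question), the same lambda idea can be combined with lookarounds so the digits stay outside the match. This is a minimal sketch, not part of the solution above; the names X_BETWEEN_DIGITS and lowercase_dimension_x are made up for illustration:

```python
import re

# Hypothetical helper pattern: an X with optional surrounding whitespace,
# preceded by a digit (lookbehind) and followed by a digit (lookahead).
# The digits are asserted but not consumed, so chains like 5X5X3 work.
X_BETWEEN_DIGITS = re.compile(r'(?<=\d)\s*X\s*(?=\d)')

def lowercase_dimension_x(text):
    """Lowercase only an X that sits between two numbers."""
    return X_BETWEEN_DIGITS.sub(lambda m: m.group().lower(), text)

print(lowercase_dimension_x('STEEL TUBE 50X50X3 MM'))
print(lowercase_dimension_x('MATERIAL TYPE 3XP'))
```

Because "3XP" has no digit after the X, it is left untouched, while units such as MM are not handled by this pattern at all.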


RegEx Details:

  • (?:[ED]=\s*)?: Optionally match E= or D= followed by 0 or more whitespace characters
  • (?:\d+\s*X\s*)*: Match 1+ digits, 0+ whitespace characters, an X, then 0+ whitespace characters. Repeat this group 0 or more times
  • \d+: Match 1+ digits
  • \s*: Match 0 or more whitespace characters
  • (?:M[MG]|KG/M): Match MM or MG or KG/M
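
The breakdown above can also be kept next to the pattern itself by compiling it with re.VERBOSE, which ignores whitespace in the pattern and allows # comments. This is only a sketch of the same regex wrapped in a helper; the names UNIT_PATTERN and lowercase_units are assumptions for illustration:

```python
import re

# Same pattern as above, written in verbose mode so each part is annotated.
UNIT_PATTERN = re.compile(r"""
    (?:[ED]=\s*)?      # optional E= or D= prefix, then optional whitespace
    (?:\d+\s*X\s*)*    # zero or more leading dimensions, each ending in X
    \d+                # the final number
    \s*                # optional whitespace before the unit
    (?:M[MG]|KG/M)     # unit: MM, MG or KG/M
""", re.VERBOSE)

def lowercase_units(text):
    """Lowercase every substring matched by UNIT_PATTERN."""
    return UNIT_PATTERN.sub(lambda m: m.group().lower(), text)

print(lowercase_units('STEEL TUBE 50X50X3 MM'))
```

To support more units, extend the last alternation (for example with CM); the rest of the pattern does not need to change.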

Upvotes: 3
