Reputation: 21
I have a list of strings in capital letters, and I want to convert some of the characters to lowercase because they are measurement units or abbreviations.
I am new to regular expressions, but I managed to write two regexes that match what I need: one for an 'X' placed between numbers, and another for the remaining cases.
The problem is that re.sub replaces the whole match (with "--" in the example below), whereas I want only the letters matched by the regex rules to become lowercase.
import re
t1 = 'EXTRUDED PROFILE 50 X 50 MM'
t2 = 'MATERIAL TYPE 3XP WITH A DENSITY OF D= 50KG/M3 AND THICKNESS OF E=8MM'
t3 = 'STEEL TUBE 50X50X3 MM'
list_Txt = [t1, t2, t3]
pattern_X = r'(\d\s?X\s?\d)'
pattern_M = r'(E=|D=)?\s?\d+\s?(X|MM|KG/M)|d+\.(MM)'
new_Txt = [re.sub(pattern_X, '--', item) for item in list_Txt]
Returns:
'EXTRUDED PROFILE 5--0 MM', 'MATERIAL TYPE 3XP WITH A DENSITY OF D= 50KG/M3 AND THICKNESS OF E=8MM', 'STEEL TUBE 5---- MM'
I want:
'EXTRUDED PROFILE 50 x 50 mm', 'MATERIAL TYPE 3XP WITH A DENSITY OF d= 50kg/m3 AND THICKNESS OF e=8mm', 'STEEL TUBE 50x50x3 mm'
Upvotes: 2
Views: 70
Reputation: 785551
You may use this Python solution, which passes a lambda to sub so that each match is replaced by its lowercase form:
import re
t1 = 'EXTRUDED PROFILE 50 X 50 MM'
t2 = 'MATERIAL TYPE 3XP WITH A DENSITY OF D= 50KG/M3 AND THICKNESS OF E=8MM'
t3 = 'STEEL TUBE 50X50X3 MM'
list_Txt = [t1, t2, t3]
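# Match an optional E=/D= prefix, any "number X" groups, a final number and its unit
pat = re.compile(r'(?:[ED]=\s*)?(?:\d+\s*X\s*)*\d+\s*(?:M[MG]|KG/M)')
# The lambda receives each match object and returns the matched text in lowercase
new_Txt = [pat.sub(lambda m: m.group().lower(), item) for item in list_Txt]
print(new_Txt)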
Output:
['EXTRUDED PROFILE 50 x 50 mm', 'MATERIAL TYPE 3XP WITH A DENSITY OF d= 50kg/m3 AND THICKNESS OF e=8mm', 'STEEL TUBE 50x50x3 mm']
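The same callable-replacement idea can also be applied to the 'X between numbers' case on its own, which is closer to the two-pattern approach in the question. The following is only a sketch: the names pattern_x and lower_match are made up for illustration, and the lookarounds are an assumption about what pattern_X was meant to do. Because the digits are checked but not consumed, a digit shared by two X's (as in 50X50X3) can serve both matches.

```python
import re

# Hypothetical variant of the question's pattern_X: the lookbehind and
# lookahead require a digit on each side of X without consuming it.
pattern_x = re.compile(r'(?<=\d)\s?X\s?(?=\d)')

def lower_match(m):
    # re.sub calls this once per match and inserts the returned string
    return m.group(0).lower()

samples = [
    'EXTRUDED PROFILE 50 X 50 MM',
    'STEEL TUBE 50X50X3 MM',
    'MATERIAL TYPE 3XP',
]
result = [pattern_x.sub(lower_match, s) for s in samples]
print(result)
```

Here only the X separators change; units such as MM are left untouched, so a second pattern (or the combined one above) is still needed for them.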
RegEx Details:
(?:[ED]=\s*)? : Optionally match E= or D= followed by 0 or more whitespace characters
(?:\d+\s*X\s*)* : Match 1+ digits followed by 0+ whitespace characters followed by X. Repeat this group 0 or more times
\d+ : Match 1+ digits
\s* : Match 0 or more whitespace characters
(?:M[MG]|KG/M) : Match MM or MG or KG/M
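For readability, the breakdown above can be written directly into the pattern with re.VERBOSE, which ignores whitespace in the pattern and allows inline comments. This is a sketch of the same regex, not a different one; the names unit_pat and lower_units are hypothetical.

```python
import re

# Same pattern as in the answer, split across lines with re.VERBOSE
unit_pat = re.compile(r'''
    (?:[ED]=\s*)?       # optional E= or D= prefix, then optional whitespace
    (?:\d+\s*X\s*)*     # zero or more "number X" groups, e.g. 50X50X
    \d+\s*              # the final number, then optional whitespace
    (?:M[MG]|KG/M)      # the unit: MM, MG or KG/M
''', re.VERBOSE)

def lower_units(text):
    # Replace each match with its lowercase form
    return unit_pat.sub(lambda m: m.group().lower(), text)

print(lower_units('STEEL TUBE 50X50X3 MM'))
```

To support more units, extending the final alternation (for example with CM) should be enough, assuming the new unit always follows a number.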
Upvotes: 3