Sam

Reputation: 641

Regex python2.7

I have the following list:

['L INE', 'LI NE', 'LIN E', 'L I NE', 'L I NE', 'L I N E']

I would like to use regex to replace every item in the above list with 'LINE'. I used the following expression, re.sub('^L\s+[A-Z]E$'|'^L\s+[A-Z]\s+E$', 'LINE'), but I'm getting incorrect results.

I'm hoping any good soul can give me a nice expression that can tackle all the cases above, and also point me to a good and simple regex source that I can follow to learn more about it as I'm very new to using it. Many thanks in advance.

Upvotes: 0

Views: 108

Answers (3)

perseverance

Reputation: 73

import re

lst=['L INE', 'LI NE', 'LIN E', 'L I NE', 'L I NE', 'L I N E']
#loop through each item in list
for i in range(len(lst)):
  #\s* means zero or more whitespace characters
  lst[i]=re.sub(r'^L\s*I\s*N\s*E$','LINE',lst[i])
print lst

Upvotes: 1

camposquinn

Reputation: 126

Your regex matches both too much and too little: ^L\s+[A-Z]E$ matches "L", then whitespace, then any single capital letter, then "E", so it would match "L XE", for example. And because \s+ requires one or more whitespace characters right after the "L", it skips "LXE" as well as items such as "LI NE" and "LIN E".
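To see this concretely, here is a minimal sketch that tests the first alternative from the question against the list items plus the two made-up strings "L XE" and "LXE" (those two are illustrative additions, not from the question). Note also that the call in the question cannot run as written: two string literals cannot be joined with | outside the pattern, and re.sub needs the input string as a third argument.

```python
import re

# First alternative from the question, tested on its own.
pattern = re.compile(r'^L\s+[A-Z]E$')

# List items from the question plus two illustrative strings.
items = ['L INE', 'LI NE', 'LIN E', 'L I NE', 'L I N E', 'L XE', 'LXE']

# Keep only the strings the pattern actually matches.
matched = [s for s in items if pattern.match(s)]
print(matched)  # ['L XE']
```

None of the intended items match; only the unintended "L XE" does.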

Since the whitespace could appear between any two characters, you could write a regex that allows zero or more whitespace characters (\s*) between every pair of characters you know you need to match. So:

^(l|L)\s*(i|I)\s*(n|N)\s*(e|E)\s*$

would match the items in your list.
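As a rough sketch of how that pattern could be applied with re.sub, assuming you want every matching item rewritten to 'LINE': the re.IGNORECASE flag stands in for the (l|L)-style groups, and normalize is a hypothetical helper name, not something from the question.

```python
import re

# Same idea as the pattern above; re.IGNORECASE replaces the (l|L)-style groups.
line_pattern = re.compile(r'^l\s*i\s*n\s*e\s*$', re.IGNORECASE)

def normalize(items):
    # Hypothetical helper: rewrite any spaced-out variant to 'LINE',
    # leaving strings that do not match the pattern unchanged.
    return [line_pattern.sub('LINE', item) for item in items]

result = normalize(['L INE', 'LI NE', 'LIN E', 'L I NE', 'L I N E', 'l i n e', 'OTHER TEXT'])
print(result)
```

Because the pattern is anchored with ^ and $, a string like 'OTHER TEXT' passes through untouched.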

A simpler and more understandable approach would be to use replace() on all the strings. This is usually faster than compiling and matching a regex, but keep in mind that it removes every space from every string, not only from variants of 'LINE'.

If you know they will all be upper case, for instance:

myList = ['L INE', 'LI NE', 'LIN E', 'L I NE', 'L I NE', 'L I N E']
# this iterates over your original list and makes a new list that 
# is composed of just the items with whitespace removed
cleanedList = [item.replace(" ","") for item in myList]
# print it and see!
print cleanedList

You can get more complex too if there are other patterns you need to handle, or if you need to set conditions on when you don't want to remove whitespace.
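One possible condition, sketched here under the assumption that you only want to collapse strings that spell 'LINE' once the spaces are gone (collapse_if_line and the 'NEW LINE' item are hypothetical, added for illustration):

```python
def collapse_if_line(item, target='LINE'):
    # Hypothetical helper: strip spaces only when the result is the target word;
    # otherwise return the original string untouched.
    stripped = item.replace(' ', '')
    return stripped if stripped == target else item

my_list = ['L INE', 'L I N E', 'NEW LINE', 'LI NE']
cleaned = [collapse_if_line(item) for item in my_list]
print(cleaned)  # ['LINE', 'LINE', 'NEW LINE', 'LINE']
```

This keeps the replace() approach but avoids mangling unrelated strings that happen to contain spaces.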

I really recommend diving into regexes since they are super helpful, but in Python there's often a simpler way to do it! Try searching for online regex testers for one of the many interactive regex tools. They are super helpful. Here's a good one: https://regex101.com/

Upvotes: 0

Yu Jiaao

Reputation: 4714

import re
a=['L INE', 'LI NE', 'LIN E', 'L I NE', 'L I NE', 'L I N E']
for b in a:
    print(re.sub(r'L\s*I\s*N\s*E', 'LINE', b))

Upvotes: 0
