Keerthana

Reputation: 157

How to extract only specific characters using regex in Python

I need to extract specific characters such as brackets (but not the text inside them), *, #, etc. and replace them with ' '. So I compiled my pattern as below:

import re

# The pattern is split into two adjacent raw strings, which Python joins into one
p = re.compile(r'\s([\[]).*|\s([\(]).*|\s([#]).*|\s([\{]).*|\s([\*]).*|\s([\<]).*|\s.*(\>)\s|\s.*'
               r'(\])\s|\s.*(\))\s|\s.*(#)\s|\s.*(\*)\s|\s.*(\})\s')
string = "hello (you) "
for match in re.finditer(p, string):
    print(match.group())

This gives the output:

(you)

But what I expect is for the match to give me a list of the captured groups, like this:

["(",")"]

so that I can replace them with ' ' and get the desired output:

hello you

Input: Abnormal heart rate (with fever) should be monitored. Insert your <Name> here.
Output: Abnormal heart rate with fever should be monitored. Insert your Name here.
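A minimal sketch of the behaviour described above, assuming the symbol set is exactly the characters listed in the question (square, round, curly and angle brackets, '#' and '*'). A single character class makes `re.findall` return each symbol on its own, which is the `["(",")"]` list asked for; `re.sub` then does the replacement. The name `SYMBOLS` is hypothetical.

```python
import re

# Hypothetical character class: [ ] ( ) { } < > # *
SYMBOLS = re.compile(r"[\[\](){}<>#*]")

string = "hello (you) "
# findall returns every individual symbol, not the text between them.
found = SYMBOLS.findall(string)
print(found)  # ['(', ')']

# Replacing each symbol with ' ' leaves doubled spaces, so collapse
# runs of whitespace and trim the ends afterwards.
cleaned = re.sub(r"\s+", " ", SYMBOLS.sub(" ", string)).strip()
print(cleaned)  # hello you

text = "Abnormal heart rate (with fever) should be monitored. Insert your <Name> here."
text_cleaned = re.sub(r"\s+", " ", SYMBOLS.sub(" ", text)).strip()
print(text_cleaned)
```

The whitespace-collapsing step is an assumption about what "replace with ' '" should produce; without it the result would contain double spaces where the brackets were.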

Upvotes: 0

Views: 110

Answers (3)

Shashidhar Reddy

Reputation: 195

I think you can use replaceAll to remove every character except letters (A-Z, a-z) and spaces. If you also want to keep digits, add 0-9 to the character class.

public class MyClass {
    public static void main(String[] args) {
        String string = "hello (you) hai";
        String result = string.replaceAll("[^A-Z a-z]", "");
        System.out.println(result);
    }
}

This works, but note that it uses Java's String.replaceAll(); the Python equivalent is re.sub().
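The same whitelist idea expressed in Python, as a sketch: `re.sub` with a negated character class deletes every character that is not an ASCII letter or a space, mirroring the Java `replaceAll` call. The second input string is a made-up example.

```python
import re

string = "hello (you) hai"
# [^A-Za-z ] matches any character that is NOT an ASCII letter or a space;
# re.sub deletes each such character.
result = re.sub(r"[^A-Za-z ]", "", string)
print(result)  # hello you hai

# To keep digits as well, add 0-9 to the negated class.
result_with_digits = re.sub(r"[^A-Za-z0-9 ]", "", "room (42) <A>")
print(result_with_digits)  # room 42 A
```

One trade-off of a whitelist: it also removes punctuation you may want to keep, so the periods in the question's second example ("monitored.", "here.") would be deleted too.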

Upvotes: 0

Tim Biegeleisen

Reputation: 520908

This answer assumes that you want to replace terms in parentheses or angle brackets with only the content inside them. That is:

(with fever) -> with fever
<Name> -> Name

We can try using re.sub here with a callback function:

import re

inp = "Abnormal heart rate (with fever) should be monitored. Insert your <Name> here."
print(re.sub(r'\(.*?\)|<.*?>', lambda x: re.sub(r'[()<>]', '', x.group(0)), inp))

This prints:

Abnormal heart rate with fever should be monitored. Insert your Name here.

The logic here is that we selectively target the (...) and <...> terms using an alternation. Then we pass each entire match to a lambda callback, which strips the surrounding symbols and leaves just the content.
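A variation on the same targeting idea, sketched here as an alternative rather than the answer's own code: put capture groups inside the alternation so the callback can return the inner text directly instead of running a second `re.sub`. The helper name `keep_content` is hypothetical.

```python
import re

inp = "Abnormal heart rate (with fever) should be monitored. Insert your <Name> here."

# Capture the text between the delimiters; in any given match only one of the
# two groups participates, and the other one is None.
pattern = re.compile(r"\((.*?)\)|<(.*?)>")

def keep_content(m):
    # Return whichever group matched, i.e. the text inside the brackets.
    return m.group(1) if m.group(1) is not None else m.group(2)

result = pattern.sub(keep_content, inp)
print(result)

# Template form: since Python 3.5, unmatched groups in a replacement
# template are substituted with an empty string, so r"\1\2" works too.
result_template = pattern.sub(r"\1\2", inp)
print(result_template)
```

Both forms keep the non-greedy `.*?`, so two bracketed terms on one line are handled separately rather than swallowed into a single match.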

Upvotes: 1

Barmar

Reputation: 780714

Just list all the characters you want to remove in a single character set, and use re.sub() to remove them.

import re
print(re.sub(r'[\[\](){}<>#*]', '', string))  # '[' escaped to avoid a "Possible nested set" FutureWarning
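Writing the character set by hand means remembering which characters need escaping. As a sketch, assuming the characters to strip are kept in a plain string (the names `CHARS_TO_REMOVE`, `strip_chars` and `strip_chars_translate` are hypothetical), `re.escape` can build the set safely. For comparison, `str.translate` removes the same characters without a regex; that is a different technique, not part of the answer above.

```python
import re

# Characters to strip, listed once as a plain string.
CHARS_TO_REMOVE = "[](){}<>#*"

# re.escape backslash-escapes the regex metacharacters, so the set is built safely.
pattern = re.compile("[" + re.escape(CHARS_TO_REMOVE) + "]")

def strip_chars(text):
    """Remove every character listed in CHARS_TO_REMOVE using a regex."""
    return pattern.sub("", text)

# Non-regex alternative: the third argument of str.maketrans lists
# characters that translate() should delete.
TABLE = str.maketrans("", "", CHARS_TO_REMOVE)

def strip_chars_translate(text):
    """Remove the same characters with str.translate."""
    return text.translate(TABLE)

s = "Abnormal heart rate (with fever) should be monitored. Insert your <Name> here."
print(strip_chars(s))
print(strip_chars_translate(s))
```

Both functions delete the symbols outright, so "(with fever)" becomes "with fever" with no extra spaces.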

Upvotes: 0
