Stephen

Reputation: 23

How to remove parentheses and all data within using Python3

I'm trying to remove parentheses and everything inside them using Python 3.

I've looked into several different threads, including here:

How to remove parentheses and all data within using Pandas/Python?

I eventually got

re.sub(r"\(.*\)|\s-\s.*", r"", str1)

to run without errors, but it didn't remove the content from the str1 string.

Then I tried this approach:

How to remove text within parentheses from Python string?

to remove the parentheses and their contents from the file before reading it in and storing it in str1, but I get this error:

Traceback (most recent call last):
  File "sum_all.py", line 27, in <module>
    data.append(line.replace(match.group(),'').strip())
AttributeError: 'NoneType' object has no attribute 'group'

Here is the code. I'm obviously new at this and appreciate any help!

# Python3 program to calculate sum of 
# all numbers present in a string 
# containing alphanumeric characters 

# Function to calculate sum of all 
# numbers present in a string 
# containing alphanumeric characters 
import re
import math
import pyperclip
import pandas
def find_sum(str1): 
    # Regular expression that matches every run of digits in the string
    return sum(map(int, re.findall(r'\d+', str1)))

def find_sum2(str2): 
    # Regular expression that matches digits followed by "hr" (short for hours)
    return sum(map(int, re.findall(r'(\d+)hr', str2)))

str2=0

# Regular Expression 
data=[]
pattern=r'\(.+\)|\s\-.+'
with open('project.txt','r') as f:
    for line in f:
        match=re.search(pattern,line)
        data.append(line.replace(match.group(),'').strip())

print(data)

# input alphanumeric string 
with open ("project.txt", "r") as myfile:
    str1=myfile.read().replace('\n', '')


# Regular Expression that removes (*) and Normalizes White Space - didn't work
#re.sub(r"\(.*\)|\s-\s.*", r"", str1)

# Regular Expression that removes (*) - didn't work
#re.sub(r"\(.*\)", r"", str1)

Upvotes: 2

Views: 2476

Answers (1)

Gabriel Avendaño

Reputation: 208

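You can try this pattern: r"\((.*?)\)"

The \( and \) match the literal parentheses.

The unescaped parentheses around .*? form a capturing group for whatever is inside. The group is not needed just to remove the text, so r"\(.*?\)" works as well.

Finally, . matches any character (except a newline), and *? repeats it zero or more times but as few times as possible (non-greedy), so the match stops at the first closing parenthesis instead of the last one.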
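To see why the ? matters, here is a minimal sketch comparing the greedy \(.*\) from the question with the non-greedy version. The sample string is made up for illustration and is not taken from project.txt.

```python
import re

# Made-up sample text with two parenthesized parts
s = "start (inside) this is in between (inside) end"

# Greedy: .* runs on to the LAST closing parenthesis in the string
greedy = re.sub(r"\(.*\)", "", s)

# Non-greedy: .*? stops at the FIRST closing parenthesis after each opening one
lazy = re.sub(r"\(.*?\)", "", s)

print(repr(greedy))
print(repr(lazy))
```

The greedy pattern also swallows the text between the two pairs of parentheses, which is usually not what you want.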

import re

s = "start (inside) this is in between (inside) end"
res = re.sub(r"\((.*?)\)", "", s)
print(res)

prints (note the doubled spaces left where the parentheses were)

start  this is in between  end
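For the code in the question, two things are probably going wrong: re.sub returns a new string rather than changing str1 in place, so the result has to be assigned, and re.search returns None for any line that has nothing to match, which is where the 'NoneType' error comes from. A rough sketch of one way around both, assuming each line may or may not contain parentheses (strip_parens and the sample lines are hypothetical, standing in for the contents of project.txt):

```python
import re

# Non-greedy pattern from this answer, without the capturing group
PAREN = re.compile(r"\(.*?\)")

def strip_parens(line):
    # Hypothetical helper: re.sub returns a NEW string, so keep the result
    cleaned = PAREN.sub("", line)
    # Collapse the leftover runs of whitespace and trim the ends
    return re.sub(r"\s+", " ", cleaned).strip()

# Made-up lines standing in for the lines read from project.txt
lines = ["Design (draft) 3hr\n", "Testing 2hr\n", "Review (with team) (remote) 1hr\n"]

# No re.search / match.group() needed: lines without parentheses pass through
data = [strip_parens(line) for line in lines]
print(data)
```

The same idea applies to the whole-file string: write str1 = re.sub(r"\(.*?\)", "", str1) so the cleaned text is actually stored.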

Hope that helps.

Upvotes: 2
