How to remove parentheses and all data within using Python3

Question

I'm trying to remove parentheses and all data within them using Python 3.

I've looked into several different threads, including here:

How to remove parentheses and all data within using Pandas/Python?

After finally getting:

re.sub(r"\(.*\)|\s-\s.*", r"", str1)

to run without errors, it didn't remove the content from the str1 string.
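A likely reason nothing changed: Python strings are immutable, so `re.sub` never modifies `str1` in place. It returns a new string, and a bare `re.sub(...)` call throws that result away. A minimal sketch, assuming `str1` holds the file text (the sample value below is made up for illustration):

```python
import re

# Hypothetical sample text standing in for the contents of project.txt
str1 = "Task A 3hr (review later) - notes"

# re.sub returns a new string; str1 itself is not modified
re.sub(r"\(.*\)|\s-\s.*", "", str1)
print(str1)  # unchanged

# Assign the result back (and trim leftover spaces) to keep the substitution
str1 = re.sub(r"\(.*\)|\s-\s.*", "", str1).strip()
print(str1)
```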

Then I tried this approach:

How to remove text within parentheses from Python string?

to remove the parentheses and their contents from the file before reading it into str1, but I get this error:

Traceback (most recent call last):

  File "sum_all.py", line 27, in <module>
    data.append(line.replace(match.group(),'').strip())
AttributeError: 'NoneType' object has no attribute 'group'
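The `AttributeError` comes from lines the pattern does not match: `re.search` returns `None` when there is no match, and `None` has no `.group()` method. A minimal sketch of guarding against that, using made-up sample lines in place of `project.txt`:

```python
import re

pattern = r"\(.+\)|\s\-.+"
# Hypothetical sample lines; the third one contains no match
lines = ["Task A 3hr (draft)\n", "Task B 2hr - notes\n", "Task C 1hr\n"]

data = []
for line in lines:
    match = re.search(pattern, line)
    if match:  # re.search returned a Match object
        data.append(line.replace(match.group(), "").strip())
    else:      # re.search returned None: keep the line as-is
        data.append(line.strip())

print(data)
```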

Here is the code. I'm obviously new at this and appreciate any help!

# Python3 program to calculate sum of
# all numbers present in a string
# containing alphanumeric characters

# Function to calculate sum of all
# numbers present in a string
# containing alphanumeric characters
import re
import math
import pyperclip
import pandas
def find_sum(str1):
    # Regular expression that matches every run of digits in the string
    return sum(map(int,re.findall(r'\d+',str1)))

def find_sum2(str2):
    # Regular expression that matches digits followed by "hr" (short for hours)
    return sum(map(int,re.findall(r'(\d+)hr',str2)))

str2=0

# Regular expression: text in parentheses, or " -" and the rest of the line
data=[]
pattern=r'\(.+\)|\s\-.+'
with open('project.txt','r') as f:
    for line in f:
        match=re.search(pattern,line)
        data.append(line.replace(match.group(),'').strip())

print(data)

# input alphanumeric string
with open ("project.txt", "r") as myfile:
    str1=myfile.read().replace('\n', '')


# Regular Expression that removes (*) and Normalizes White Space - didn't work
#re.sub(r"\(.*\)|\s-\s.*", r"", str1)

# Regular Expression that removes (*) - didn't work
#re.sub(r"\(.*\)", r"", str1)
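Because `re.sub` simply returns the text unchanged when there is no match, the per-line loop does not strictly need `re.search` or `.group()`. A minimal sketch, where `strip_parens` is a hypothetical helper name and the pattern is the one from the code above (note it also removes " -" and everything after it on a line):

```python
import re

# Parenthesized text, or " -" followed by the rest of the line
PATTERN = re.compile(r"\(.+\)|\s\-.+")

def strip_parens(lines):
    """Return each line with matches of PATTERN removed and whitespace stripped.

    Lines with no match pass through unchanged, so no None check is needed.
    """
    return [PATTERN.sub("", line).strip() for line in lines]

# Usage with the file from the question:
# with open("project.txt", "r") as f:
#     data = strip_parens(f)
```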

Gabriel Avendaño · Accepted Answer

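You can try this: r"\((.*?)\)"

The escaped \( and \) match the literal parenthesis characters; unescaped, they would start and end a group instead.

Then the unescaped parentheses around (.*?) form a capturing group for whatever is inside. re.sub does not need the group, but it is handy if you want to extract the text rather than delete it.

Finally, .*? matches any character (.) repeated zero or more times (*), and the trailing ? makes that repetition lazy, so each match ends at the first closing parenthesis rather than the last one.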
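The lazy `?` matters once a string contains more than one pair of parentheses: a greedy `.*` runs to the last `)` in the string and swallows the text between the pairs, while `.*?` stops at the first `)` after each `(`. A quick comparison on the sample string used in this answer:

```python
import re

s = "start (inside) this is in between (inside) end"

greedy = re.sub(r"\(.*\)", "", s)   # .* extends to the last ")" in the string
lazy = re.sub(r"\(.*?\)", "", s)    # .*? stops at the first ")" after each "("

print(greedy)  # start  end
print(lazy)    # start  this is in between  end
```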

import re

s = "start (inside) this is in between (inside) end"
res = re.sub(r"\((.*?)\)", "", s)
print(res)

prints

start  this is in between  end
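The lazy pattern does not handle nested parentheses: on `a (b (c) d) e` it removes `(b (c)` and leaves ` d)` behind. One common workaround, sketched here as an assumption rather than part of the answer, is to remove innermost pairs with `\([^()]*\)` repeatedly until nothing changes (`remove_parens` is a hypothetical name):

```python
import re

INNERMOST = re.compile(r"\([^()]*\)")  # a "(...)" pair with no parentheses inside

def remove_parens(text):
    """Remove parenthesized text, including nested pairs, innermost first."""
    while True:
        # subn returns (new_string, number_of_substitutions_made)
        text, count = INNERMOST.subn("", text)
        if count == 0:  # no complete pairs left to remove
            return text

print(remove_parens("a (b (c) d) e"))
```

Unbalanced parentheses are left in place, since the pattern only removes complete pairs.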

Hope that helps.
