I'm trying to remove parentheses and all data within them using Python 3.
I've looked into several different threads, including here:
How to remove parentheses and all data within using Pandas/Python?
After I finally got `re.sub(r"\(.*\)|\s-\s.*", r"", str1)` to run without errors, it still didn't remove the content from the `str1` string.
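One likely reason the call seemed to do nothing: Python strings are immutable, so `re.sub` never changes `str1` in place. It builds and returns a new string, and that return value has to be assigned. A minimal sketch using the pattern above (the sample line is invented for illustration):

```python
import re

pattern = r"\(.*\)|\s-\s.*"          # the pattern from the question
str1 = "Task A (draft) 3hr - notes"  # invented sample text

# re.sub returns a NEW string; discarding the result leaves str1 untouched.
re.sub(pattern, r"", str1)
print(str1)        # Task A (draft) 3hr - notes

# Assigning the return value keeps the substitution.
str1 = re.sub(pattern, r"", str1)
print(repr(str1))  # 'Task A  3hr'
```

The doubled space is where `(draft)` used to be; the ` - notes` tail is removed by the second alternative of the pattern.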
Then I tried this approach:
How to remove text within parentheses from Python string?
to remove the parentheses and their contents from each line of the file before reading it in and storing it in `str1`, but I get this error:
```
Traceback (most recent call last):
  File "sum_all.py", line 27, in <module>
    data.append(line.replace(match.group(),'').strip())
AttributeError: 'NoneType' object has no attribute 'group'
```
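The traceback means `re.search` returned `None` for at least one line: it only returns a match object when the pattern is found, so any line with no `(...)` and no ` -` leaves `match` as `None`, and `None.group()` fails. A sketch of guarding the call, assuming the same pattern as the code below (a list of invented lines stands in for `project.txt`):

```python
import re

pattern = r'\(.+\)|\s\-.+'
# Invented sample lines standing in for the contents of project.txt.
lines = ["Design (phase 1) 4hr\n", "Review 2hr\n", "Deploy 1hr - later\n"]

data = []
for line in lines:
    match = re.search(pattern, line)
    if match:  # re.search returns None when the pattern is not on this line
        line = line.replace(match.group(), '')
    data.append(line.strip())
print(data)  # ['Design  4hr', 'Review 2hr', 'Deploy 1hr']
```

Calling `re.sub(pattern, '', line)` instead would avoid the check entirely, since `re.sub` simply returns the line unchanged when nothing matches.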
Here is the code. I'm obviously new at this and appreciate any help!
```python
# Python 3 program to calculate the sum of
# all numbers present in a string
# containing alphanumeric characters

# Function to calculate the sum of all
# numbers present in a string
# containing alphanumeric characters
import re
import math
import pyperclip
import pandas

def find_sum(str1):
    # Regular expression that matches runs of digits anywhere in the string
    return sum(map(int, re.findall(r'\d+', str1)))

def find_sum2(str2):
    # Regular expression that matches digits followed by "hr" (short for hours)
    return sum(map(int, re.findall(r'(\d+)hr', str2)))

str2 = 0
# Regular expression
data = []
pattern = r'\(.+\)|\s\-.+'
with open('project.txt', 'r') as f:
    for line in f:
        match = re.search(pattern, line)
        data.append(line.replace(match.group(), '').strip())
print(data)

# Input alphanumeric string
with open("project.txt", "r") as myfile:
    str1 = myfile.read().replace('\n', '')

# Regular expression that removes (*) and normalizes white space - didn't work
#re.sub(r"\(.*\)|\s-\s.*", r"", str1)
# Regular expression that removes (*) - didn't work
#re.sub(r"\(.*\)", r"", str1)
```
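Putting the pieces together, a sketch that strips the parenthesized text first, keeps the result, and only then sums the numbers. Two assumptions: the helper name `remove_parens` is hypothetical, and a string of invented sample data stands in for `project.txt`. Because the file is joined into one line, a greedy `.*` could run from the first `(` to the last `)` in the whole file, so this sketch uses the non-greedy `.*?` instead.

```python
import re

def remove_parens(text):
    # Hypothetical helper: drop each "(...)" non-greedily, then collapse runs of whitespace.
    text = re.sub(r"\(.*?\)", "", text)
    return re.sub(r"\s+", " ", text).strip()

def find_sum(str1):
    # Sum every run of digits in the string.
    return sum(map(int, re.findall(r"\d+", str1)))

def find_sum2(str2):
    # Sum only numbers immediately followed by "hr" (hours), using the argument passed in.
    return sum(map(int, re.findall(r"(\d+)hr", str2)))

# Invented stand-in for the contents of project.txt.
raw = "Design (phase 1) 4hr\nReview (2 people) 2hr\nDeploy 1hr\n"

str1 = remove_parens(raw.replace("\n", " "))  # assign the result of re.sub
print(str1)                                   # Design 4hr Review 2hr Deploy 1hr
print(find_sum(raw), find_sum(str1))          # 10 7  (the 1 and 2 in parentheses are gone)
print(find_sum2(str1))                        # 7
```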
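You can try this pattern: `r"\((.*?)\)"`.
The escaped `\(` and `\)` match the literal parenthesis characters.
The unescaped parentheses around `.*?` form a capturing group for whatever is inside (`re.sub` doesn't need the group, but it does no harm).
Finally, `.*?` matches any character (`.`, which excludes newlines by default) repeated zero or more times (`*`); the trailing `?` makes that repetition non-greedy, so each match ends at the first `)` rather than the last one on the line.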
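To see what the non-greedy `?` changes, here is a quick comparison with the greedy `\(.*\)` pattern from the question, on a string containing two parenthesized groups:

```python
import re

s = "start (inside) this is in between (inside) end"

# Greedy: .* runs to the LAST ")" on the line, swallowing the text in between.
greedy = re.sub(r"\(.*\)", "", s)

# Non-greedy: .*? stops at the FIRST ")" after each "(".
lazy = re.sub(r"\(.*?\)", "", s)

print(repr(greedy))  # 'start  end'
print(repr(lazy))    # 'start  this is in between  end'
```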
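```python
import re

s = "start (inside) this is in between (inside) end"
res = re.sub(r"\((.*?)\)", "", s)
print(res)
```
prints
```
start  this is in between  end
```
(the doubled spaces are left where the parenthesized text was removed).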
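One limitation worth knowing: with nested parentheses, `\((.*?)\)` stops at the first `)` and leaves a stray `)` behind. A sketch of one workaround (the function name `strip_parens` is hypothetical): repeatedly remove innermost groups, i.e. `(` followed by no parentheses and then `)`, until nothing is left to remove.

```python
import re

def strip_parens(text):
    # Hypothetical helper: match only innermost "(...)" groups with no parens inside,
    # and keep substituting until a pass removes nothing.
    pattern = re.compile(r"\([^()]*\)")
    while True:
        text, count = pattern.subn("", text)
        if count == 0:
            return text

s = "a (b (c) d) e"
naive = re.sub(r"\((.*?)\)", "", s)
print(repr(naive))            # 'a  d) e'
print(repr(strip_parens(s)))  # 'a  e'
```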
Hope that helps.