Samrat Alam

Reputation: 567

Add punctuation at the end of sentence after regular expression processing: Python

I am trying to solve some regular-expression pattern-matching problems for my aspect-based sentiment analysis. I cannot keep the punctuation in the right position after the pattern match.

import re
import spacy

def extra_expression1(txt):
  txt= str(txt)
  nlp=spacy.load("en_core_web_sm") 
  txt=nlp(txt)
  punc=''
  a=len(txt)
  for token in txt:
    if (token.is_punct==False):
      txt=str(txt)
      txt=re.sub('goo+d+[^a-z]','good',txt) #"goooodddd" to "good"
      a=a-1
    else:
      punc=token.text
      if (a!=0):
        txt=str(txt) + str(punc)
        punc=''
      else:
        txt=str(txt) + str(punc)
      a=a-1
  return txt

and

txt1=["hotel is goood! breakfast was bad."]
df_22=pd.DataFrame(
    {
        'clean_review' : txt1
    }
)
display(df_22)

for i,txt in enumerate(df_22['clean_review']):
  txt1= extra_expression1(txt)
  df_22['clean_review'].iloc[i]=txt1
df_22

The output (the last one is after processing) was posted as an image, which is not included here.

How can I solve this?

Upvotes: 2

Views: 113

Answers (1)

Tranbi

Reputation: 12711

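I can't run your code, but the character following "good" shouldn't be part of the match: re.sub replaces the entire match, so the punctuation captured by [^a-z] is deleted along with the extra letters. Try using a lookahead, which checks the next character without consuming it: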

txt=re.sub('goo+d+(?=[^a-z])','good',txt) #"goooodddd" to "good"
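To illustrate the difference, here is a minimal sketch that runs both patterns on the sample sentence from the question, without spaCy or pandas. The helper name normalize_good is hypothetical and not part of the original code. It uses a negative lookahead, (?![a-z]), as a possible variant: the positive lookahead above requires some following character, so it would not match an elongated "goood" at the very end of a string.

```python
import re

sample = "hotel is goood! breakfast was bad."

# Original pattern: [^a-z] is part of the match, so the "!" is replaced too.
consuming = re.sub(r'goo+d+[^a-z]', 'good', sample)

# Lookahead pattern: the next character is checked but not consumed.
lookahead = re.sub(r'goo+d+(?=[^a-z])', 'good', sample)

def normalize_good(text):
    """Collapse elongated spellings such as "goooodddd" to "good".

    Hypothetical helper: (?![a-z]) only asserts that no lowercase letter
    follows, so it also matches at the end of the string.
    """
    return re.sub(r'goo+d+(?![a-z])', 'good', text)

print(consuming)                              # hotel is good breakfast was bad.
print(lookahead)                              # hotel is good! breakfast was bad.
print(normalize_good("breakfast was goood"))  # breakfast was good
```

With either lookahead the punctuation stays where it was, so the token loop that re-appends punctuation should no longer be needed for this case.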

Upvotes: 2
