Dirty SR

Reputation: 209

Issue converting text from first person to second person, while ignoring text within quotations " "

I am trying to convert stories, sentences, words, etc. from first person to second person grammar, without converting any text inside quotes (" " or ' ').

This runs in a Google Colab Python 3 notebook. The code opens a .txt file from my Google Drive and converts its words from first person to second person using the 'forms' dictionary. There is also an issue where spaces are inserted before and after quotes (both " and ' are affected) once the conversion has taken place.


import nltk
from google.colab import drive

drive.mount('/content/drive')

sent = open('/content/drive/My Drive/storyuno.txt', 'r') 


forms = {"am": "are", "are": "am", 'i': 'you', 'my': 'yours', 'me': 'you', 'mine': 'yours', 'you': 'I', 'your': 'my', 'yours': 'mine'}  # More?

def translate(word):
    if word.lower() in forms:
        return forms[word.lower()]
    return word

translated = []
quote_mode = False
for word in nltk.wordpunct_tokenize(sent.read()):
    if quote_mode:
        translated.append(word)
        if word == '"':
            quote_mode = False

    if not quote_mode:
        translated.append(translate(word))
    if word == '"':
        quote_mode = True

result = ' '.join(translated)

print(result) 
sent.close()

The story I input:

The bottom line is that if I was going to tell anyone about the frog, it would be Soy. I decided that our walk home would be the most opportune time. “Did you see anything outside today during math?” I asked Soy as we started walking. “What do you mean? Like in the sky?” he asked, jumping over cracks in the sidewalk. “I mean right outside the window. Like right up against it,” I answered. “Like a person?” he asked, still hopping. Soy sat in the row farthest from the window, so it was possible, but unlikely, for someone to walk by without him noticing.

It converts to:

The bottom line is that if you was going to tell anyone about the frog , it would be Soy . you decided that our walk home would be the most opportune time . “ Did I see anything outside today during math ?” you asked Soy as we started walking . “ What do I mean ? Like in the sky ?” he asked , jumping over cracks in the sidewalk . “ you mean right outside the window . Like right up against it ,” you answered . “ Like a person ?” he asked , still hopping . Soy sat in the row farthest from the window , so it was possible , but unlikely , for someone to walk by without him noticing .

The issue is that the text within quotes should NOT be converted. Ex: I tell her, "You are boring". ---> You tell her, "You are boring".

Ignore any grammar mistakes besides the quote issue; I will fix them later.

Upvotes: 1

Views: 274

Answers (1)

Amadan

Reputation: 198436

You have two problems with the quotes. The first is that “ is not equal to ". The second is that quotes can be bundled up with neighbouring punctuation, so you get tokens like ?”. The solution is to use a regular expression to check for the presence of any quotation mark in the token:

import re
quote_re = re.compile(r'["“”]')

then change

if word == '"':

into

if quote_re.search(word):
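
To see why the equality test never fires, here is a small sketch. It assumes the pattern \w+|[^\w\s]+, which is the one nltk.wordpunct_tokenize is built on, so that it runs without NLTK installed; the sample sentence is made up for illustration.

```python
import re

# Stand-in for nltk.wordpunct_tokenize (assumption: same underlying pattern,
# used here only so the sketch has no third-party dependency).
TOKEN_RE = re.compile(r"\w+|[^\w\s]+")
quote_re = re.compile(r'["“”]')

sample = '“Did you see anything?” I asked.'
tokens = TOKEN_RE.findall(sample)
print(tokens)
# ['“', 'Did', 'you', 'see', 'anything', '?”', 'I', 'asked', '.']

# Tokens the regular expression recognises as containing a quotation mark.
quote_tokens = [t for t in tokens if quote_re.search(t)]
print(quote_tokens)   # ['“', '?”']

# Tokens the original comparison would have caught: none at all.
exact_matches = [t for t in tokens if t == '"']
print(exact_matches)  # []
```

Note that the closing quote arrives glued to the question mark, so even a comparison against ” alone would miss it.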

The issue with spaces can be fixed by detokenisation:

from nltk.tokenize.treebank import TreebankWordDetokenizer
detokenizer = TreebankWordDetokenizer()
result = detokenizer.detokenize(translated)
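
If you would rather not detokenise at all, an alternative (a different technique from the one above, using only the standard library) is to split the text so that spaces and punctuation are kept as their own pieces, then rejoin everything with an empty string. The original spacing is never touched, so nothing has to be repaired afterwards. The sketch below also updates the quote state once per quotation character. The convert function and the shortened forms table are illustrative assumptions, not part of the question's code.

```python
import re

# Shortened, illustrative mapping; extend it as needed.
forms = {"i": "you", "my": "your", "me": "you", "mine": "yours", "am": "are"}

def convert(text):
    """Swap first-person words outside double quotes, keeping spacing intact."""
    out = []
    in_quote = False
    # The capturing group makes re.split keep the separators
    # (spaces and punctuation) as pieces of their own.
    for piece in re.split(r"(\W+)", text):
        if re.fullmatch(r"\w+", piece):
            # A word: translate it only when we are outside a quotation.
            out.append(piece if in_quote else forms.get(piece.lower(), piece))
        else:
            # A separator: update the quote state for each quote character.
            # Assumption: single quotes are ignored because they collide
            # with apostrophes.
            for ch in piece:
                if ch == "“":
                    in_quote = True
                elif ch == "”":
                    in_quote = False
                elif ch == '"':
                    in_quote = not in_quote
            out.append(piece)
    return "".join(out)

print(convert('I tell her, "You are boring".'))
# you tell her, "You are boring".
print(convert('“I mean it,” I answered.'))
# “I mean it,” you answered.
```

Capitalisation at the start of a sentence is not restored here; that would need a separate pass.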

Upvotes: 1
