Katerina

Reputation: 53

A way to remove all occurrences of words within brackets in a string?

I'm trying to find a way to delete all mentions of references in a text file.

I haven't tried much, as I am new to Python, but I thought this is something Python could do.

def remove_bracketed_words(text_from_file: string) -> string:
    """Remove all occurrences of words with brackets surrounding them, 
    including the brackets.

    >>> remove_bracketed_words("nonsense (nonsense, 2015)")
    "nonsense "
    >>> remove_bracketed_words("qwerty (qwerty) dkjah (Smith, 2018)")
    "qwerty  dkjah "
    """
    with open('random_text.txt') as file:
        wholefile = f.read()
        for '(' in 

I have no idea where to go from here or if what I've done is right. Any suggestions would be helpful!

Upvotes: 2

Views: 403

Answers (2)

Cyrbuzz

Reputation: 119

Try the re module:

>>> import re
>>> re.sub(r'\(.*?\)', '', 'nonsense (nonsense, 2015)')
'nonsense '
>>> re.sub(r'\(.*?\)', '', 'qwerty (qwerty) dkjah (Smith, 2018)')
'qwerty  dkjah '

import re

def remove_bracketed_words(text_from_file: str) -> str:
    """Remove all occurrences of words with brackets surrounding them,
    including the brackets.

    >>> remove_bracketed_words("nonsense (nonsense, 2015)")
    'nonsense '
    >>> remove_bracketed_words("qwerty (qwerty) dkjah (Smith, 2018)")
    'qwerty  dkjah '
    """
    # Non-greedy match: from each '(' to the nearest following ')'.
    return re.sub(r'\(.*?\)', '', text_from_file)

with open('random_text.txt', 'r') as file:
    wholefile = file.read()
# Be careful with 'w': it truncates the file, so the original contents are overwritten.
with open('random_text.txt', 'w') as file:
    file.write(remove_bracketed_words(wholefile))
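The ? after .* is what makes this work on lines with more than one bracketed group. As a small illustration (the variable names are only for this sketch), compare the greedy and non-greedy forms on the second example string:

```python
import re

text = "qwerty (qwerty) dkjah (Smith, 2018)"

# Greedy: .* runs on to the LAST ')' on the line, so everything from the
# first '(' to that final ')' is removed in one go.
greedy = re.sub(r'\(.*\)', '', text)

# Non-greedy: .*? stops at the FIRST ')' after each '(', so each
# bracketed group is removed separately.
lazy = re.sub(r'\(.*?\)', '', text)

print(repr(greedy))  # 'qwerty '
print(repr(lazy))    # 'qwerty  dkjah '
```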

Upvotes: 1

TigerhawkT3

Reputation: 49318

You'll have an easier time with a text editing program that handles regular expressions, like Notepad++, than learning Python for this one task (reading in a file, correcting fundamental errors like for '(' in..., etc.). You can even use tools available online for this, such as RegExr (a regular expression tester). In RegExr, write an appropriate expression into the "expression" field and paste your text into the "text" field. Then, in the "tools" area below the text, choose the "replace" option and remove the placeholder expression. Your cleaned-up text will appear there.

You're looking for a space, then a literal opening parenthesis, then some characters, then a comma, then a year (let's just call that 3 or 4 digits), then a literal closing parenthesis, so I'd suggest the following expression:

 \(.*?, \d{3,4}\)

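This removes the leading space before a citation and, in most cases, preserves non-citation parenthesized text. One caveat: . also matches parentheses, so a non-citation parenthetical that comes before a citation on the same line gets swallowed along with it. Replacing .*? with [^()]*? avoids that.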
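If you'd rather stay in Python after all, here is a minimal sketch of the same idea. It is a variant, not the exact expression above: it uses [^()]*? in place of .*? so that a single match cannot span more than one pair of parentheses. The names CITATION and remove_citations are made up for illustration.

```python
import re

# Variant of the expression above (an assumption of this sketch):
# a space, '(', any run of non-parenthesis characters, a comma and space,
# a 3- or 4-digit year, then ')'.
CITATION = re.compile(r' \([^()]*?, \d{3,4}\)')

def remove_citations(text: str) -> str:
    """Remove '(Author, year)' citations together with their leading space."""
    return CITATION.sub('', text)

print(repr(remove_citations("nonsense (nonsense, 2015)")))
# 'nonsense'
print(repr(remove_citations("qwerty (qwerty) dkjah (Smith, 2018)")))
# 'qwerty (qwerty) dkjah'
```

Unlike the catch-all \(.*?\) pattern, this leaves "(qwerty)" in place because it has no comma followed by a year.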

Upvotes: 1
