Cícero Vargas

Reputation: 13

How to delete multiple substrings?

I'm working on a script that gets some information from a PGN file, a format used to describe chess games. I'm trying to copy the moves of each game separately into another file.

But sometimes there are comments, delimited by '{' and '}' characters, and I would like to strip them from the string (I'm copying each line of the file into a string to make some adjustments before writing to the output file).

An example of a string in this format would be:

'1.e4 {some comment} c5 2.Nf3 d6 3.d4 {another comment} Nxd4 {you got it}'

My first solution was simply:

my_string = my_string.replace(my_string[my_string.find('{'):my_string.find('}')], '')

Unfortunately, this stripped just the first set of comments, like this:

'1.e4 } c5 2.Nf3 d6 3.d4 {another comment} Nxd4 {you got it}'

The '}' that remained is not a problem; it can be deleted with:

my_string = my_string.replace('}', '')

So I tried to loop over the string:

for char in my_string:
    if char == '{':
        my_string = my_string.replace(my_string[my_string.find('{'):my_string.find('}')], '')

The very same thing happened: only the first set of comments was deleted.

Then I tried a while loop:

while my_string.find('{') != -1:
    my_string = my_string.replace(my_string[my_string.find('{'):my_string.find('}')], '')

And now I am stuck in an infinite loop...

Does anyone know how to solve this? I would also accept a solution using lists, which I could embed like this:

temp_list = list(my_string)
# solution with list manipulation
my_string = ''.join(temp_list)

Upvotes: 1

Views: 138

Answers (3)

David C

Reputation: 7484

Note that your attempts leave the final } in place. This is because my_string.find('}') returns the index of the }, and a slice includes everything up to, but not including, its end index.

So, you need to increment the end index by 1:

my_string = my_string.replace(my_string[my_string.find('{'):my_string.find('}')+1], '')

As @Amadan's answer suggests, I'd probably just use regular expressions for this exercise.
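
If you would rather stay with find and slicing, here is a minimal sketch of a loop that terminates. The while loop in the question hangs because, after the first pass, the leftover '}' sits before the next '{', so the slice is empty and replace changes nothing. Searching for the '}' only after the current '{' avoids that. The function name strip_braced_comments is my own, and the sketch assumes comments do not nest.

```python
def strip_braced_comments(text):
    """Remove every {...} span using find and slicing (hypothetical helper)."""
    while True:
        start = text.find('{')
        if start == -1:
            break  # no more comments
        # Look for the closing brace only after the opening one.
        end = text.find('}', start)
        if end == -1:
            break  # unmatched '{': leave the rest untouched
        # end + 1 so that the '}' itself is dropped as well.
        text = text[:start] + text[end + 1:]
    return text


my_string = '1.e4 {some comment} c5 2.Nf3 d6 3.d4 {another comment} Nxd4 {you got it}'
cleaned = strip_braced_comments(my_string)
# Removing a comment leaves the spaces on both sides behind, so collapse them.
cleaned = ' '.join(cleaned.split())
print(cleaned)
```

The two break statements are what guarantee termination: the loop either shortens the string or exits.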

Upvotes: 0

Ming

Reputation: 1693

As an additional remark to the other answer, if you are parsing a complex format (as PGN is, among many others), you should look into using a general-purpose parsing library, rather than writing your own ad-hoc parser. That will allow you to re-use shared logic that the library authors have written and debugged for you. Parsing is an extreme example of a use-case which has undergone a tremendous amount of research over the years, and by utilizing the proper library, you can benefit from this research in your own projects. This list on the official Python wiki suggests many possible options. This blog post offers a review of some popular options.

Upvotes: 0

Amadan

Reputation: 198304

Regular expressions are perfect for this.

import re
re.sub(r'\s*{.*?}\s*', ' ', my_string)
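# '1.e4 c5 2.Nf3 d6 3.d4 Nxd4 '

In words: replace any amount of whitespace, an opening curly brace, the shortest possible run of any characters (except newlines), a closing curly brace, and any trailing whitespace with a single space. Note that re.sub returns a new string, so assign the result back to my_string if you want to keep it.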

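If the whole game text is held in one string rather than processed line by line, a comment may span several lines. A possible variation, offered as a sketch only: match with [^}]* instead of .*? so that newlines inside a comment are covered, then normalise the whitespace. The names COMMENT_RE and strip_comments are hypothetical, and the sketch assumes brace comments do not nest.

```python
import re

# One {...} comment: an opening brace, anything that is not '}', a closing brace.
# [^}] also matches newlines, so multi-line comments are handled.
COMMENT_RE = re.compile(r'\{[^}]*\}')


def strip_comments(text):
    """Return text without {...} comments and with whitespace collapsed."""
    without_comments = COMMENT_RE.sub(' ', text)
    # split() with no arguments splits on any whitespace run and drops
    # leading/trailing whitespace, so this also removes the trailing space.
    return ' '.join(without_comments.split())


my_string = '1.e4 {some comment} c5 2.Nf3 d6 3.d4 {another comment} Nxd4 {you got it}'
print(strip_comments(my_string))
```

Be aware that the whitespace collapsing also turns line breaks into single spaces, which is fine for move text but may not be what you want elsewhere in the file.
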
Upvotes: 3
