user5380448


How to fix "re.error: unterminated character set at position" in Python?

I am writing a script to get lyrics from the website "www.lyrics.com". I have this:

import os, string, re, requests

print("Enter lyrics.com site:")
url = input()

lyrics_raw_html = requests.get(url + '.html')
lyrics_raw = re.findall(r'<pre id=\"lyric-body-text\" class=\"lyric-body wselect-cnt\" dir=\"ltr\" data-lang=\"en\">([^]+)<\/pre>', lyrics_raw_html.text)
lyrics = re.sub(r'(<.+>)', '', lyrics_raw[0])

print(lyrics)

When I enter a page URL (this page, for example), I get this error:

File "C:\Users\MYNAMEHERE\AppData\Local\Programs\Python\Python37-32\lib\sre_parse.py", line 532, in _parse
source.tell() - here)
re.error: unterminated character set at position 91

It seems to be from my regex, but after some tinkering, I have no idea what the problem is. Any help would be nice!

Thanks in advance.

Upvotes: 1

Views: 9231

Answers (1)

Wiktor Stribiżew

Reputation: 626758

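In an ECMAScript-compliant regex flavor (the pattern was probably written for that engine), [^] is a valid character class: it matches "anything but nothing", and thus matches any character. Python's re module does not support it. There, a ] placed right after [^ is treated as a literal ], so the class is never closed, and you get the "unterminated character set" error.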

Use [\s\S]*? here instead of [^]+ to match any zero or more characters (including newlines), as few as possible.
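
A minimal sketch of the fix, runnable without network access. The sample_html string is a made-up stand-in for the page source (the real lyrics.com markup may differ), and the tag-stripping pattern <[^>]+> is my own swap for the (<.+>) in the question, since the greedy .+ would also remove any text sitting between two tags on the same line.

```python
import re

# Hypothetical sample standing in for the downloaded page HTML (assumption:
# the real page wraps the lyrics in a <pre> tag with exactly these attributes).
sample_html = (
    '<pre id="lyric-body-text" class="lyric-body wselect-cnt" dir="ltr" data-lang="en">'
    'First <a href="/x">line</a>\nSecond line'
    '</pre>'
)

prefix = r'<pre id="lyric-body-text" class="lyric-body wselect-cnt" dir="ltr" data-lang="en">'

# The ECMAScript-style class [^] does not compile in Python: the ] right after
# [^ is read as a literal, so the character set is never terminated.
try:
    re.compile(prefix + r'([^]+)</pre>')
    broken_compiles = True
except re.error:
    broken_compiles = False

# [\s\S]*? matches any characters, newlines included, as few as possible.
fixed = re.compile(prefix + r'([\s\S]*?)</pre>')
lyrics_raw = fixed.findall(sample_html)

# Remove one tag at a time so the text between tags is kept.
lyrics = re.sub(r'<[^>]+>', '', lyrics_raw[0]) if lyrics_raw else ''
print(lyrics)
```

An equivalent option is to keep the dot and pass the re.DOTALL flag, for example re.findall(prefix + r'(.*?)</pre>', text, flags=re.DOTALL), which also lets the match span line breaks.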

Upvotes: 1
