Fatbob

Reputation: 28

How to add a new line before a capital letter?

I am writing a piece of code to get lyrics from genius.com.

I have managed to extract the lyrics from the website, but they come out with all the text on one line.

I have used regex to add a space but cannot figure out how to add a new line. Here is my code so far:

text_container = re.sub(r"(\w)([A-Z])", r"\1 \2", text_container.text)

This adds a space before the capital letter, but I cannot figure out how to add a new line.

It is returning [Verse 1]Leaves are fallin' down on the beautiful ground I heard a story from the man in red He said, "The leaves are fallin' down

I would like the output in the command line to start a new line before "He".

Any help would be greatly appreciated. Thanks :)

Upvotes: 0

Views: 474

Answers (4)

TheAnister

Reputation: 1

I'm not sure about using regex. Try this method:

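text = lyrics  # lyrics: the scraped one-line string
new_text = ''

for i, letter in enumerate(text):
    # Break before every capital letter except the very first character.
    if i and letter.isupper():
        new_text += '\n'
    new_text += letter

print(new_text)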

However, as oscillate123 has explained, it will create a new line for every capital letter regardless of the context.
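For comparison, the regex from the question can produce the same kind of break by swapping the space in the replacement for "\n". This is only a sketch: the function name break_lines and the sample string are made up for illustration, and the sample assumes the raw scraped text has no separator at all where a line ends (as in "groundI").

```python
import re


def break_lines(raw):
    # Same pattern as in the question, but the replacement inserts a
    # newline instead of a space between the two captured characters.
    return re.sub(r"(\w)([A-Z])", r"\1\n\2", raw)


# Hypothetical raw input, reconstructed from the output shown in the question.
raw = (
    "[Verse 1]Leaves are fallin' down on the beautiful ground"
    "I heard a story from the man in red"
    "He said"
)
print(break_lines(raw))
```

Note that "]Leaves" is not split, because "]" is not a word character, and that a name like "McDonald" would be split after "Mc", so the same caveat about context applies.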

Upvotes: 0

Trevor Hurst

Reputation: 74

Looking around, I found a Python package for pulling lyrics from Genius.com. Here's the link to its documentation:

https://lyricsgenius.readthedocs.io/en/master/

Just follow the instructions and it should have what you need. With more info on the problem, I could provide a more detailed response.
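As a rough idea of what that looks like, here is a minimal sketch based on the basic usage in the lyricsgenius docs. The wrapper fetch_lyrics and the placeholder token, title and artist are hypothetical, and it assumes you have installed the package and created an access token on the Genius API site.

```python
def fetch_lyrics(access_token, title, artist):
    """Return the lyrics for one song as a string, or None if nothing matches."""
    # Imported lazily so this file still loads when the package is missing
    # (install with: pip install lyricsgenius).
    import lyricsgenius

    genius = lyricsgenius.Genius(access_token)
    song = genius.search_song(title, artist)
    return song.lyrics if song is not None else None


# Example call (needs network access and a real token, so it is commented out):
# print(fetch_lyrics("YOUR_ACCESS_TOKEN", "Song Title", "Artist Name"))
```

The returned lyrics should already contain line breaks, so no regex post-processing would be needed.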

Upvotes: 0

Shabble

Reputation: 592

A quick skim of the view-source for a genius lyrics page suggests that you're stripping all the HTML markup which would otherwise contain the info about linebreaks etc.

You're probably better off posting that code (likely as a separate question) and asking how to correctly extract not just the text nodes, but also enough of the <span> structure to format it as necessary.
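To illustrate the idea, here is a sketch that keeps line breaks while stripping markup, using only the standard library. It assumes the line breaks are encoded as <br> tags, which I have not verified against the real page; the class name, helper name and sample fragment are made up for the example.

```python
from html.parser import HTMLParser


class LineBreakTextExtractor(HTMLParser):
    """Collect text nodes, emitting a newline wherever a <br> tag appears."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing <br/> via the default handle_startendtag.
        if tag == "br":
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(fragment):
    parser = LineBreakTextExtractor()
    parser.feed(fragment)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)


# Hypothetical fragment; the real Genius markup may be structured differently.
sample_html = (
    "<div>[Verse 1]<br>Leaves are fallin' down on the beautiful ground<br/>"
    "I heard a story from the man in red<br>"
    "He said, \"The leaves are fallin' down</div>"
)
print(html_to_text(sample_html))
```

If you are already using an HTML library to get text_container, it likely has its own way to do the same thing, so treat this as a demonstration of the principle rather than a drop-in fix.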

Upvotes: 1

oscillate123

Reputation: 36

If genius.com doesn't somehow provide a separator, it will be very hard to find a way to know what to look for.

In your example, I made a regex searching for " [A-Z]", which will find " He...". But it will also find every " I...". Sometimes a new line really does start with "I...", but the regex will also insert line breaks where there shouldn't be any.
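Here is a small sketch of that behaviour. The function name break_before_capitals and the second test sentence are made up for illustration; the first sample is the text from the question.

```python
import re


def break_before_capitals(text):
    # Replace the space in front of any capital letter with a newline.
    return re.sub(r" (?=[A-Z])", "\n", text)


sample = (
    "[Verse 1]Leaves are fallin' down on the beautiful ground "
    "I heard a story from the man in red "
    "He said, \"The leaves are fallin' down"
)
# Breaks land before "I heard" and "He said", which happens to look right here.
print(break_before_capitals(sample))

# A mid-sentence "I" gets split as well, which is the false positive.
print(break_before_capitals("He said that I was late"))
```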

TL;DR - genius.com needs to provide some sort of separator so we know when there should be a new line.

Disclaimer: Unless I missed something in your description/example

Upvotes: 2
