BadGuyBen

Reputation: 21

How to split text depending on word count

I am building a lyrics project using discord.js, cheerio, and the website genius.com.
I have found a way to scrape the lyrics from the website. Now I need to split them, because Discord has a max word limit of 2000.
I can check how many characters are in the lyrics with lyrics.length; I just need a way to split the string and send each part. In the future I might use RichEmbeds to make it more stylish, but for now I'm focusing on the basics.

var request = require('request');
var cheerio = require('cheerio');

/*
This is a project for my Discord bot. Discord's message limit is currently
set to 2000 characters, which means that I will have to add a function
to split the lyrics and send each part.
*/

//Define the URL that we are going to be scraping the data from
var UR_L = "https://genius.com/Josh-a-and-jake-hill-not-afraid-of-dying-lyrics";

//send a request to the website and return the contents of the website
request(UR_L, function(err, resp, body) {
  //stop if the request failed
  if (err) {
    console.error(err);
    return;
  }

  //load the website using cheerio
  var $ = cheerio.load(body);

  //define lyrics as the selector to text form
  var lyrics = $('p').text();

  if (lyrics.length > 2000 && lyrics.length <= 4000) {

  } else if (lyrics.length > 4000 && lyrics.length <= 6000) {

  } else {
    //send the lyrics as one message
  }
})

You can find a live version running here on repl.it.

Upvotes: 0

Views: 4068

Answers (3)

Federico Grandi

Reputation: 6806

You don't need a custom function; this is already built into discord.js. You can attach options to a message, and MessageOptions.split is what you're looking for. When you want to send the text, do it like this:

channel.send(lyrics, { split: true });

If lyrics.length is greater than the limit, discord.js will split the text and send the messages one after the other, so that it reads like a single message.
channel is the TextChannel you want to send the messages to.

Upvotes: 2

Enslev

Reputation: 622

Discord has a 2000-character limit, not a 2000-word limit.

One solution to your problem could be this:

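// This will result in an array of strings, each at most 2000 characters long.
// [\s\S] is used instead of . because . does not match the newlines in the lyrics.
const lyricsArr = lyrics.match(/[\s\S]{1,2000}/g) || [];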

lyricsArr.forEach(chunk => sendMessage(chunk))

Given the async nature of sending messages, you might want to look into modules like p-iteration to ensure the chunks arrive in the correct order.
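
One way to keep the chunks in order without an extra module is to await each send before starting the next. The following is a minimal sketch, assuming the send function returns a promise (as channel.send does in discord.js). sendInOrder and its send parameter are hypothetical names used for illustration, not part of any library:

```javascript
// Hypothetical helper: sends chunks one at a time, in array order.
// `send` is any function that takes a string and returns a promise,
// for example chunk => channel.send(chunk).
async function sendInOrder(chunks, send) {
  for (const chunk of chunks) {
    // Wait until this chunk has been sent before starting the next one
    await send(chunk);
  }
}
```

With the array from above, this could be called as sendInOrder(lyricsArr, chunk => channel.send(chunk)). A plain forEach would start all sends at once, which is why the loop uses for...of with await.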

That being said, there are APIs for getting song lyrics, which I would recommend over scraping. See the apiseeds lyrics API as an example.

UPDATE

    const lyrics = 'These are my lyrics';

    const lyricsArr = lyrics.match(/.{1,8}/g); 

    console.log(lyricsArr); // [ 'These ar', 'e my lyr', 'ics' ]

    lyricsArr.forEach((chunk, i) => {
      // Skip the last chunk; there is no next chunk to move a partial word into.
      if (i === lyricsArr.length - 1) {
        return;
      }
      // If the last character is not a space (or punctuation), we split a word in two.
      // Add any other characters that mark the end of a word between the brackets (in the regex) if needed.
      if (!chunk[chunk.length - 1].match(/[ ,.!]/)) {
        // Move the partial word at the end of this chunk to the start of the next one.
        const lastWord = chunk.match(/\s([^ .]+)$/);
        lyricsArr[i + 1] = lastWord[1] + lyricsArr[i + 1];
        lyricsArr[i] = lyricsArr[i].split(/\s[^ .]*$/)[0];
      }
    });

    console.log(lyricsArr) // [ 'These', 'are my', 'lyrics' ]

Updated as per the comments. This is some crude code that I did not spend much time on, but it does the job.

Some info when using this approach:

  • You need to add any other characters that mark the end of a word (so that a chunk ending in them is left alone) to the regex in the second if.
  • This has not been tested thoroughly, so use it at your own risk.
  • It will definitely break if the lyrics contain a word longer than the chunk size. Since the chunk size is around 2000, I imagine that will not be a problem.
  • This no longer ensures that each chunk stays below the limit, so change the limit to around 1900 to be safe.
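
The last two caveats can be avoided by choosing the cut point before slicing, rather than moving words around afterwards. The following is a rough sketch of that idea, not a tested replacement: chunkByWords is a hypothetical helper, it only treats spaces and newlines as word boundaries, and it hard-splits any word that is longer than the limit.

```javascript
// Hypothetical helper: splits text into chunks of at most `limit` characters,
// cutting at the last space or newline that fits.
function chunkByWords(text, limit = 2000) {
  const chunks = [];
  let rest = text;
  while (rest.length > limit) {
    // Look one character past the limit so a separator right at the edge counts
    const head = rest.slice(0, limit + 1);
    let cut = Math.max(head.lastIndexOf(' '), head.lastIndexOf('\n'));
    // No usable separator: hard-split the overlong word at the limit
    if (cut <= 0) cut = limit;
    chunks.push(rest.slice(0, cut));
    // Drop the single separator we cut on, if there is one
    rest = rest.slice(cut).replace(/^[ \n]/, '');
  }
  if (rest.length > 0) chunks.push(rest);
  return chunks;
}

console.log(chunkByWords('These are my lyrics', 8)); // [ 'These', 'are my', 'lyrics' ]
```

Because every chunk is cut at or before the limit, the full 2000 can be used here instead of a safety margin.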

Upvotes: 1

F Blanchet

Reputation: 1510

You can use JavaScript's split() string method.

const word_list = lyrics.split(" ");

Then word_list.length gives the number of words in your message, and word_list[0] selects the first word, for instance.
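
Note that split(" ") only separates on single spaces, so words on different lines stay joined. A small sketch of a whitespace-aware count, where countWords and sample are hypothetical names used only for illustration:

```javascript
// Hypothetical helper: counts words separated by any run of whitespace
// (spaces, tabs, or newlines).
function countWords(text) {
  const trimmed = text.trim();
  return trimmed === '' ? 0 : trimmed.split(/\s+/).length;
}

const sample = 'These are\nmy lyrics';
console.log(sample.split(' ').length); // 3, because "are\nmy" stays joined
console.log(countWords(sample));       // 4
```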

Upvotes: 0
