Michel Rouzic
Michel Rouzic

Reputation: 1043

Adding N line breaks in a paragraph for the narrowest result

Let's say we have a paragraph such as this one:

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

Assuming a fixed width font, we want to add exactly N line breaks (by replacing space characters only) to produce a N+1 line block of text.

Here's an example of output for N=8, we get a max line width of 51:

Lorem ipsum dolor sit amet, consectetur adipiscing 
elit, sed do eiusmod tempor incididunt ut labore 
et dolore magna aliqua. Ut enim ad minim veniam, 
quis nostrud exercitation ullamco laboris nisi ut 
aliquip ex ea commodo consequat. Duis aute irure 
dolor in reprehenderit in voluptate velit esse 
cillum dolore eu fugiat nulla pariatur. Excepteur 
sint occaecat cupidatat non proident, sunt in culpa 
qui officia deserunt mollit anim id est laborum.

How do we find which space characters to replace with line breaks to obtain the narrowest (least number of characters on the longest line) with the fewest attempts?

Upvotes: 6

Views: 149

Answers (3)

גלעד ברקן
גלעד ברקן

Reputation: 23955

(Adapted from here, How to partition an array of integers in a way that minimizes the maximum of the sum of each partition?)

If we consider the word lengths as a list of numbers, we can binary search the partition.

Our max length ranges from 0 to sum (word-length list) + (num words - 1), meaning the spaces. mid = (range / 2). We check if mid can be achieved by partitioning into N sets in O(m) time: traverse the list, adding (word_length + 1) to the current part while the current sum is less than or equal to mid. When the sum passes mid, start a new part. If the result includes N or less parts, mid is achievable.

If mid can be achieved, try a lower range; otherwise, a higher range. The time complexity is O(m log num_chars). (You'll also have to consider how deleting a space per part, meaning where the line break would go, features into the calculation.)

Upvotes: 5

RomCoo
RomCoo

Reputation: 1893

My try and error attempt. I am not quite sure if you always get the shortest line width, but the algorithm is fast and easy to understand/implement. And I think in most cases this should fit the needs

  • Let's assume you have M number of characters and you want to insert N linebreaks. Then you get the shortest possible linewidth: L_min = RoundToInfinity((M-N)/N+1) (N-M because we clear N spaces)
  • Fill each line with words, so that the linewidth is smaller or equal to L_min. This way your last line will contain a lot more characters then the others.
  • Now always search the largest line (at start this will be the last line), take the first word of it and put it at the and of the previous line. Repeat till first line is the longest.
  • At all time you should store your actual maximal linewidth L, so you can restore the situaltion when L was smallest.

Upvotes: 3

j_random_hacker
j_random_hacker

Reputation: 51226

Suppose the text consists of m words, which we will number from 1 to m. Define f(i, j) to be the maximum width (in characters) of any line in an optimal (minimal-width) solution to the subproblem consisting of just the first i words, under the restriction that exactly j line breaks are used. Then the width of the best possible sequence of breaks to the entire problem will be given by f(m, n). This can be solved fairly efficiently using dynamic programming.

Let the total length in characters of the fragment between word i and word j >= i, inclusive, be len(i, j). (This is easy to calculate: Just compute an array of m+1 values len0[j], for 0 <= j <= m, each giving the total length in characters of the fragment consisting of the first j words; then len(i, j) is just len0[j] - len0[i-1] - 1, with the convention that len0[0] = -1.)

The base case is easy:

f(i, 0) = len(0, i)   (i.e., if there are no line breaks)

The recursive case is:

f(i, j) = the minimum over all 0 <= k < i of max(f(k, j-1), len(k+1, i))

That is, to find the best way to break the first i words into j+1 lines (i.e. using j line breaks), we can try the following for every shorter k-word prefix: determine the best way to break that k-word prefix into j lines (i.e. using j-1 line breaks), and compare the maximum width we get from that with the width that results from putting the rest of the i-k words on a single line at the end. Each prefix gives us a different candidate solution, so we can pick the best of all of them.

Constructing a solution

Now that we can calculate the optimal width f(m, n), how can we use this to actually construct a solution? Fortunately there is a standard technique in dynamic programming for this. The fastest way is to record, during calculation of f(i, j), the (actually a, since there may in general be multiple optimal solutions) value of k that produced the minimum in a predecessor table pred[i][j]. Having calculated f(m, n) and filled in the predecessor table, we can then construct an optimal solution by walking backwards through it: pred[i][j] tells us a value k such that we can produce an optimal solution by adding a line break after word k, so add a line break there, and then look at pred[k][j-1] to find the position of the previous line break, continuing on until j reaches 0.

Time and space complexity

If the recursion is memoised using dynamic programming, then there are at most O(mn) different parameter combinations that f() could be called with (i ranges between 0 and m, and j ranges between 0 and n), and the time spent outside of recursive calls is O(m) (k can range between 0 and m, and the computation for each value of k is O(1)), so the time complexity of this solution is O(nm^2). The space complexity is O(mn), but by switching to a bottom-up DP this can easily be reduced to O(m), since while computing f(i, j) we only ever need access to values of f() for which the second parameter is j-1, meaning it suffices to actually store just the size-(m+1) array of computed values f(q, j-1) for 0 <= q <= m.

Upvotes: 6

Related Questions