singularity
singularity

Reputation: 593

Sort Nested Markdown List?

I am looking for a script, method or tool to sort nested markdown lists. I use sublime text, which has a built sort lines function, but this function destroys the order of any nested list. For instance, if I want to sort:

* Zoo Animals
    * Herbivores
        * Zebra
        * Gazelle
    * Carnivores
        * Tiger
        * Lion
    * Omnivores
        * Gorilla
        * Baboon
        * Chimpanzee
* Domestic Animals
    * Canines
        * German Shepherd
        * Cocker Spaniel

Using sublime sort lines function, I get:

        * Baboon
        * Chimpanzee
        * Cocker Spaniel
        * Gazelle
        * German Shepherd
        * Gorilla
        * Lion
        * Tiger
        * Zebra
    * Canines
    * Carnivores
    * Herbivores
    * Omnivores
* Domestic Animals
* Zoo Animals

Clearly, this is not what I want. What I want is a "scoped sort" which sorts relative to each bullet level, without destroying the nested relationships, like so:

* Domestic Animals
    * Canines
        * Cocker Spaniel
        * German Shepherd
* Zoo Animals
    * Carnivores
        * Lion
        * Tiger
    * Herbivores
        * Gazelle
        * Zebra
    * Omnivores
        * Baboon
        * Chimpanzee
        * Gorilla

Here's some things I've looked into and my thoughts on each:

How would you go about sorting a large nested markdown list?

UPDATE #1:

@J4G created a great Atom package that solved the original sort problem, see his answer for the link.

The previous list is a simple list without code blocks and numbered lists. However, when sorting a real life markdown list, we have code blocks and numbered lists and lines starting with special characters - nested within the list like so:

* Commands
    * Migrations
        * `rake db:migrate` - push all migrations to the database
        * 'STEP=3' - revert the last 3 migrations
    * `Rails`
        * `c` - start rails console, run code from your app!
    * `Rake`
        * Rake Task
        ```ruby
        desc 'process csv'
        task process_csv: :environment do
            Node.process_csv
        end
        ```
* Package Upgrade Status:
    1. Install Package
    2. Attach Plugin
    3. Review Installation
    ~~~
    |Install|Status|
    |Yes|Pending|
    ~~~

After sorting, I would think the above markdown list should return unchanged since tick marks and quotes have no sort significance and code blocks / numbered lists are already created in their proper order.

Upvotes: 3

Views: 1279

Answers (3)

randomuser5767124
randomuser5767124

Reputation: 329

I created a vscode extension for this if anyone is still interested.

It does more than just a scoped sort, it can remove unique values, it can recursively sort the nested items, it can be case insensitive, etc.

It also satisfies OP's other request of having content under a list item.

The extension is part of a bigger project called scopedsort, which is implemented on the command line, npm, and a website. The source code can be found on github. Here's the file for the actual implementation.

Here it is in text form, very outdated, but does what the OP originally requested:

// @ts-check
const getValuesRegex = /^(?<indentation>\s*)(?<char>[-*+])/;

/**
 *  @typedef {object} Options
 *  @property {boolean} [recursive]
 *  @property {boolean} [reverse]
 *  @property {boolean} [unique]
 *  @property {boolean} [caseInsensitive]
 */

/**
 * @param {string} a
 * @param {string} b
 */
function stringSortCaseInsensitive(a, b) {
    const lowerA = a.toLowerCase();
    const lowerB = b.toLowerCase();

    if (lowerA < lowerB) {
        return -1;
    } else if (lowerA > lowerB) {
        return 1;
    }

    return 0;
}

/** @param {string} str **/
function calculateSpaceLength(str) {
    return str.replace('\t', '    ').length;
}

/**
 * @param {string[]} sections
 * @param {Options} options
 */
function getModifiedSections(sections, options) {
    if (options.caseInsensitive) {
        sections.sort(stringSortCaseInsensitive);
    } else {
        sections.sort();
    }

    if (options.reverse) {
        sections.reverse();
    }

    if (options.unique) {
        /** @type {Set<string>} */
        const haveSeen = new Set();
        const unique = [];

        for (const section of sections) {
            const adjustedSection = options.caseInsensitive
                ? section.toLowerCase()
                : section;

            if (!haveSeen.has(adjustedSection)) {
                unique.push(section);
                haveSeen.add(adjustedSection);
            }
        }

        return unique;
    }

    return sections;
}

/**
 * @param {string[]} lines
 * @param {number} index
 * @param {Options} options
 */
function sortInnerSection(lines, index, options) {
    /** @type {string[]} */
    const sections = [];
    let currentIndentation = '';
    let amountAdded = 0;

    for (let i = index; i < lines.length; i++) {
        const line = lines[i];
        const match = line.match(getValuesRegex);
        const indentation = match?.groups?.indentation || '';
        const listChar = match?.groups?.char;

        if (!currentIndentation && indentation) {
            currentIndentation = indentation;
        }

        const indentationLength = calculateSpaceLength(indentation);
        const currentIndentationLength =
            calculateSpaceLength(currentIndentation);

        if (!listChar) {
            amountAdded++;
            sections[sections.length - 1] += '\n' + line;
        } else if (indentationLength === currentIndentationLength) {
            amountAdded++;
            sections.push(line);
        } else if (indentationLength > currentIndentationLength) {
            const child = sortInnerSection(lines, i, options);
            sections[sections.length - 1] += '\n' + child.content;
            i += child.amountAdded - 1;
            amountAdded += child.amountAdded;
        } else {
            break;
        }
    }

    return {
        content: getModifiedSections(sections, options).join('\n'),
        amountAdded,
    };
}

/**
 *  @param {string} text
 *  @param {Options} options
 */
function sort(text, options) {
    const lines = text.trimEnd().split(/\r?\n/);
    let sections = [];
    let currentSection = [];
    let currentIndentation = '';

    for (let i = 0; i < lines.length; i++) {
        const line = lines[i];
        const match = line.match(getValuesRegex);
        const indentation = match?.groups?.indentation || '';
        const listChar = match?.groups?.char;

        if (currentSection.length && listChar) {
            if (indentation === currentIndentation) {
                sections.push(currentSection.join('\n'));
                currentSection = [line];
            } else if (options.recursive) {
                const child = sortInnerSection(lines, i, options);
                currentSection.push(child.content);
                i += child.amountAdded - 1;
            } else {
                currentSection.push(line);
            }
        } else {
            currentSection.push(line);
        }
    }

    if (currentSection) {
        sections.push(currentSection.join('\n'));
    }

    return getModifiedSections(sections, options).join('\n');
}

module.exports = sort;

Upvotes: 3

Cary Swoveland
Cary Swoveland

Reputation: 110725

This is one way you could do it with Ruby. Suppose the string is held by the variable str.

Code

def sort_indented(str)
  arr = str.lines.map { |s| [indentation(s), s.chomp] }
  indent_offset = arr.map(&:first).uniq.sort.each_with_index.
    with_object({}) { |(indent, i),h| h[indent] = i }
  dim_size = indent_offset.size 
  prev = []
  arr.map do |indent, s|
    a = ['']*dim_size
    offset = indent_offset[indent]
    a[offset] = s
    a[0,offset] = prev.first(offset)
    prev = a
    a
  end.sort.map { |a| a[a.rindex { |s| s != '' }] }.join("\n") 
end

def indentation(s)
  s[/^\s*/].size
end

Example

str =<<THE_END 
* Zoo Animals
    * Herbivores
        * Zebra
        * Gazelle
    * Carnivores
        * Tiger
        * Lion
    * Omnivores
        * Gorilla
        * Baboon
        * Chimpanzee
* Domestic Animals
    * Canines
        * German Shepherd
        * Cocker Spaniel
THE_END

In Ruby this construct for defining a string literal is called a "here document", or "here doc".

puts sort_indented(str)

* Domestic Animals
    * Canines
        * Cocker Spaniel
        * German Shepherd
* Zoo Animals
    * Carnivores
        * Lion
        * Tiger
    * Herbivores
        * Gazelle
        * Zebra
    * Omnivores
        * Baboon
        * Chimpanzee
        * Gorilla

General Approach

When Ruby sorts an array of arrays, such as:

a = [1,2,4]
b = [4,5,6]
c = [1,2,3,5]]
[a, b, c]

it will first sort by the first element of each array. Since a and c both have the same element 1 at offset zero, and b has a 4 at that offset, both a and c will come before b in the sorted array. Ruby looks at the second elements of a and c to break the tie. As they both equal 2, Ruby goes on to the third element, where the tie is broken: c precedes a since 3 < 4.

I will convert arr to the following array:

result =     
[["* Zoo Animals"     , ""                , ""],
 ["* Zoo Animals"     , "    * Herbivores", ""],
 ["* Zoo Animals"     , "    * Herbivores", "        * Zebra"],
 ["* Zoo Animals"     , "    * Herbivores", "        * Gazelle"],
 ["* Zoo Animals"     , "    * Carnivores", ""],
 ["* Zoo Animals"     , "    * Carnivores", "        * Tiger"],
 ["* Zoo Animals"     , "    * Carnivores", "        * Lion"], 
 ["* Zoo Animals"     , "    * Omnivores" , ""],
 ["* Zoo Animals"     , "    * Omnivores" , "        * Gorilla"],
 ["* Zoo Animals"     , "    * Omnivores" , "        * Baboon"],
 ["* Zoo Animals"     , "    * Omnivores" , "        * Chimpanzee"],
 ["* Domestic Animals", ""                , ""],
 ["* Domestic Animals", "    * Canines"   , ""],
 ["* Domestic Animals", "    * Canines"   , "        * German Shepherd"],
 ["* Domestic Animals", "    * Canines"   , "        * Cocker Spaniel"]]

Once in this form, we can sort:

result.sort
  #=> [["* Domestic Animals", "", ""],
  #    ["* Domestic Animals", "    * Canines", ""],
  #    ["* Domestic Animals", "    * Canines", "        * Cocker Spaniel"],
  #    ["* Domestic Animals", "    * Canines", "        * German Shepherd"],
  #    ["* Zoo Animals", "", ""], ["* Zoo Animals", "    * Carnivores", ""],
  #    ["* Zoo Animals", "    * Carnivores", "        * Lion"],
  #    ["* Zoo Animals", "    * Carnivores", "        * Tiger"],
  #    ["* Zoo Animals", "    * Herbivores", ""],
  #    ["* Zoo Animals", "    * Herbivores", "        * Gazelle"],
  #    ["* Zoo Animals", "    * Herbivores", "        * Zebra"],
  #    ["* Zoo Animals", "    * Omnivores", ""],
  #    ["* Zoo Animals", "    * Omnivores", "        * Baboon"],
  #    ["* Zoo Animals", "    * Omnivores", "        * Chimpanzee"],
  #    ["* Zoo Animals", "    * Omnivores", "        * Gorilla"]] 

The final step is to extract the last non-empty string from each element of the sorted array.

Detailed Explanation

First we define a helper method to compute the indentation of a string:

def indentation(s)
  s[/^\s*/].size
end

For example,

            #1234
indentation("    * Herbivores")
  #=> 4

Now lets convert the string to an array of lines:

a = str.lines
  #=> ["* Zoo Animals\n",
  #    "    * Herbivores\n",
  #    "        * Zebra\n",
  #    "        * Gazelle\n",
  #    "    * Carnivores\n",
  #    "        * Tiger\n",
  #    "        * Lion\n",
  #    "    * Omnivores\n",
  #    "        * Gorilla\n",
  #    "        * Baboon\n",
  #    "        * Chimpanzee\n",
  #    "* Domestic Animals\n",
  #    "    * Canines\n",
  #    "        * German Shepherd\n",
  #    "        * Cocker Spaniel\n"]

Next, we convert a to an array of pairs, the second element of the pair being the element of a (a string), with the newline chopped off the end, the first being its indentation:

arr = a.map { |s| [indentation(s), s.chomp] }
  # => [[0, "* Zoo Animals"],        [4, "    * Herbivores"],
  #     [8, "        * Zebra"],      [8, "        * Gazelle"],
  #     [4, "    * Carnivores"],     [8, "        * Tiger"],
  #     [8, "        * Lion"],       [4, "    * Omnivores"],
  #     [8, "        * Gorilla"],    [8, "        * Baboon"],
  #     [8, "        * Chimpanzee"], [0, "* Domestic Animals"],
  #     [4, "    * Canines"],        [8, "        * German Shepherd"],
  #     [8, "        * Cocker Spaniel"]] 

In fact, we would perform the first two operations in one step:

arr = str.lines.map { |s| [indentation(s), s.chomp] }

Next, we need to know the indents that are used:

indents = arr.map { |pair| pair.first }
  #=> [0, 4, 8, 8, 4, 8, 8, 4, 8, 8, 8, 0, 4, 8, 8] 

which we could write more economically like this:

indents = arr.map(&:first)

To find the unique indents we write:

unique = indents.uniq
  #=> [0, 4, 8] 

In case they are not in order, we should sort them:

sorted = unique.sort
  #=> [0, 4, 8] 

Each of the three indents will will correspond to offset in the array we will sort, so it is convenient to construct a hash:

indent_offset = sorted.each_with_index.with_object({}) do |(indent, i),h|
  h[indent] = i
end
  #=> {0=>0, 4=>1, 8=>2}

Again, we can perform this computation by combining several steps:

indent_offset = arr.map(&:first).uniq.sort.each_with_index.
  with_object({}) { |(indent, i),h| h[indent] = i }

Next we replace each element of arr with a three element array of strings:

dim_size = indent_offset.size 
  #=> 3
prev = []
result = arr.map do |indent, s|
  a = ['']*dim_size
  offset = indent_offset[indent]
  a[offset] = s
  a[0,offset] = prev.first(offset)
  prev = a
  a
end

The result of this calculation is the first array I gave under General Approach, above. We can now sort result to obtain the second array I gave under General Approach:

sorted = result.sort

The last two steps are to replace each element of sorted (a three-element array) with the last non-empty string:

sorted_strings = sorted.map { |a| a[a.rindex { |s| s != '' }] }

and then join those strings into a single string:

sorted_strings.join("\n")

Upvotes: 3

Jack Guy
Jack Guy

Reputation: 8523

If you're interested in using Atom (I highly recommend it as a free alternative to Sublime), I just made a package to do what you need.

https://atom.io/packages/markdown-sort-list

Upvotes: 4

Related Questions