Reputation: 593
I am looking for a script, method or tool to sort nested markdown lists. I use sublime text, which has a built sort lines function, but this function destroys the order of any nested list. For instance, if I want to sort:
* Zoo Animals
* Herbivores
* Zebra
* Gazelle
* Carnivores
* Tiger
* Lion
* Omnivores
* Gorilla
* Baboon
* Chimpanzee
* Domestic Animals
* Canines
* German Shepherd
* Cocker Spaniel
Using sublime sort lines function, I get:
* Baboon
* Chimpanzee
* Cocker Spaniel
* Gazelle
* German Shepherd
* Gorilla
* Lion
* Tiger
* Zebra
* Canines
* Carnivores
* Herbivores
* Omnivores
* Domestic Animals
* Zoo Animals
Clearly, this is not what I want. What I want is a "scoped sort" which sorts relative to each bullet level, without destroying the nested relationships, like so:
* Domestic Animals
* Canines
* Cocker Spaniel
* German Shepherd
* Zoo Animals
* Carnivores
* Lion
* Tiger
* Herbivores
* Gazelle
* Zebra
* Omnivores
* Baboon
* Chimpanzee
* Gorilla
Here's some things I've looked into and my thoughts on each:
How would you go about sorting a large nested markdown list?
UPDATE #1:
@J4G created a great Atom package that solved the original sort problem, see his answer for the link.
The previous list is a simple list without code blocks and numbered lists. However, when sorting a real life markdown list, we have code blocks and numbered lists and lines starting with special characters - nested within the list like so:
* Commands
* Migrations
* `rake db:migrate` - push all migrations to the database
* 'STEP=3' - revert the last 3 migrations
* `Rails`
* `c` - start rails console, run code from your app!
* `Rake`
* Rake Task
```ruby
desc 'process csv'
task process_csv: :environment do
Node.process_csv
end
```
* Package Upgrade Status:
1. Install Package
2. Attach Plugin
3. Review Installation
~~~
|Install|Status|
|Yes|Pending|
~~~
After sorting, I would think the above markdown list should return unchanged since tick marks and quotes have no sort significance and code blocks / numbered lists are already created in their proper order.
Upvotes: 3
Views: 1279
Reputation: 329
I created a vscode extension for this if anyone is still interested.
It does more than just a scoped sort, it can remove unique values, it can recursively sort the nested items, it can be case insensitive, etc.
It also satisfies OP's other request of having content under a list item.
The extension is part of a bigger project called scopedsort, which is implemented on the command line, npm, and a website. The source code can be found on github. Here's the file for the actual implementation.
Here it is in text form, very outdated, but does what the OP originally requested:
// @ts-check
const getValuesRegex = /^(?<indentation>\s*)(?<char>[-*+])/;
/**
* @typedef {object} Options
* @property {boolean} [recursive]
* @property {boolean} [reverse]
* @property {boolean} [unique]
* @property {boolean} [caseInsensitive]
*/
/**
* @param {string} a
* @param {string} b
*/
function stringSortCaseInsensitive(a, b) {
const lowerA = a.toLowerCase();
const lowerB = b.toLowerCase();
if (lowerA < lowerB) {
return -1;
} else if (lowerA > lowerB) {
return 1;
}
return 0;
}
/** @param {string} str **/
function calculateSpaceLength(str) {
return str.replace('\t', ' ').length;
}
/**
* @param {string[]} sections
* @param {Options} options
*/
function getModifiedSections(sections, options) {
if (options.caseInsensitive) {
sections.sort(stringSortCaseInsensitive);
} else {
sections.sort();
}
if (options.reverse) {
sections.reverse();
}
if (options.unique) {
/** @type {Set<string>} */
const haveSeen = new Set();
const unique = [];
for (const section of sections) {
const adjustedSection = options.caseInsensitive
? section.toLowerCase()
: section;
if (!haveSeen.has(adjustedSection)) {
unique.push(section);
haveSeen.add(adjustedSection);
}
}
return unique;
}
return sections;
}
/**
* @param {string[]} lines
* @param {number} index
* @param {Options} options
*/
function sortInnerSection(lines, index, options) {
/** @type {string[]} */
const sections = [];
let currentIndentation = '';
let amountAdded = 0;
for (let i = index; i < lines.length; i++) {
const line = lines[i];
const match = line.match(getValuesRegex);
const indentation = match?.groups?.indentation || '';
const listChar = match?.groups?.char;
if (!currentIndentation && indentation) {
currentIndentation = indentation;
}
const indentationLength = calculateSpaceLength(indentation);
const currentIndentationLength =
calculateSpaceLength(currentIndentation);
if (!listChar) {
amountAdded++;
sections[sections.length - 1] += '\n' + line;
} else if (indentationLength === currentIndentationLength) {
amountAdded++;
sections.push(line);
} else if (indentationLength > currentIndentationLength) {
const child = sortInnerSection(lines, i, options);
sections[sections.length - 1] += '\n' + child.content;
i += child.amountAdded - 1;
amountAdded += child.amountAdded;
} else {
break;
}
}
return {
content: getModifiedSections(sections, options).join('\n'),
amountAdded,
};
}
/**
* @param {string} text
* @param {Options} options
*/
function sort(text, options) {
const lines = text.trimEnd().split(/\r?\n/);
let sections = [];
let currentSection = [];
let currentIndentation = '';
for (let i = 0; i < lines.length; i++) {
const line = lines[i];
const match = line.match(getValuesRegex);
const indentation = match?.groups?.indentation || '';
const listChar = match?.groups?.char;
if (currentSection.length && listChar) {
if (indentation === currentIndentation) {
sections.push(currentSection.join('\n'));
currentSection = [line];
} else if (options.recursive) {
const child = sortInnerSection(lines, i, options);
currentSection.push(child.content);
i += child.amountAdded - 1;
} else {
currentSection.push(line);
}
} else {
currentSection.push(line);
}
}
if (currentSection) {
sections.push(currentSection.join('\n'));
}
return getModifiedSections(sections, options).join('\n');
}
module.exports = sort;
Upvotes: 3
Reputation: 110725
This is one way you could do it with Ruby. Suppose the string is held by the variable str
.
Code
def sort_indented(str)
arr = str.lines.map { |s| [indentation(s), s.chomp] }
indent_offset = arr.map(&:first).uniq.sort.each_with_index.
with_object({}) { |(indent, i),h| h[indent] = i }
dim_size = indent_offset.size
prev = []
arr.map do |indent, s|
a = ['']*dim_size
offset = indent_offset[indent]
a[offset] = s
a[0,offset] = prev.first(offset)
prev = a
a
end.sort.map { |a| a[a.rindex { |s| s != '' }] }.join("\n")
end
def indentation(s)
s[/^\s*/].size
end
Example
str =<<THE_END
* Zoo Animals
* Herbivores
* Zebra
* Gazelle
* Carnivores
* Tiger
* Lion
* Omnivores
* Gorilla
* Baboon
* Chimpanzee
* Domestic Animals
* Canines
* German Shepherd
* Cocker Spaniel
THE_END
In Ruby this construct for defining a string literal is called a "here document", or "here doc".
puts sort_indented(str)
* Domestic Animals
* Canines
* Cocker Spaniel
* German Shepherd
* Zoo Animals
* Carnivores
* Lion
* Tiger
* Herbivores
* Gazelle
* Zebra
* Omnivores
* Baboon
* Chimpanzee
* Gorilla
General Approach
When Ruby sorts an array of arrays, such as:
a = [1,2,4]
b = [4,5,6]
c = [1,2,3,5]]
[a, b, c]
it will first sort by the first element of each array. Since a
and c
both have the same element 1
at offset zero, and b
has a 4
at that offset, both a
and c
will come before b
in the sorted array. Ruby looks at the second elements of a
and c
to break the tie. As they both equal 2
, Ruby goes on to the third element, where the tie is broken: c
precedes a
since 3 < 4
.
I will convert arr
to the following array:
result =
[["* Zoo Animals" , "" , ""],
["* Zoo Animals" , " * Herbivores", ""],
["* Zoo Animals" , " * Herbivores", " * Zebra"],
["* Zoo Animals" , " * Herbivores", " * Gazelle"],
["* Zoo Animals" , " * Carnivores", ""],
["* Zoo Animals" , " * Carnivores", " * Tiger"],
["* Zoo Animals" , " * Carnivores", " * Lion"],
["* Zoo Animals" , " * Omnivores" , ""],
["* Zoo Animals" , " * Omnivores" , " * Gorilla"],
["* Zoo Animals" , " * Omnivores" , " * Baboon"],
["* Zoo Animals" , " * Omnivores" , " * Chimpanzee"],
["* Domestic Animals", "" , ""],
["* Domestic Animals", " * Canines" , ""],
["* Domestic Animals", " * Canines" , " * German Shepherd"],
["* Domestic Animals", " * Canines" , " * Cocker Spaniel"]]
Once in this form, we can sort:
result.sort
#=> [["* Domestic Animals", "", ""],
# ["* Domestic Animals", " * Canines", ""],
# ["* Domestic Animals", " * Canines", " * Cocker Spaniel"],
# ["* Domestic Animals", " * Canines", " * German Shepherd"],
# ["* Zoo Animals", "", ""], ["* Zoo Animals", " * Carnivores", ""],
# ["* Zoo Animals", " * Carnivores", " * Lion"],
# ["* Zoo Animals", " * Carnivores", " * Tiger"],
# ["* Zoo Animals", " * Herbivores", ""],
# ["* Zoo Animals", " * Herbivores", " * Gazelle"],
# ["* Zoo Animals", " * Herbivores", " * Zebra"],
# ["* Zoo Animals", " * Omnivores", ""],
# ["* Zoo Animals", " * Omnivores", " * Baboon"],
# ["* Zoo Animals", " * Omnivores", " * Chimpanzee"],
# ["* Zoo Animals", " * Omnivores", " * Gorilla"]]
The final step is to extract the last non-empty string from each element of the sorted array.
Detailed Explanation
First we define a helper method to compute the indentation of a string:
def indentation(s)
s[/^\s*/].size
end
For example,
#1234
indentation(" * Herbivores")
#=> 4
Now lets convert the string to an array of lines:
a = str.lines
#=> ["* Zoo Animals\n",
# " * Herbivores\n",
# " * Zebra\n",
# " * Gazelle\n",
# " * Carnivores\n",
# " * Tiger\n",
# " * Lion\n",
# " * Omnivores\n",
# " * Gorilla\n",
# " * Baboon\n",
# " * Chimpanzee\n",
# "* Domestic Animals\n",
# " * Canines\n",
# " * German Shepherd\n",
# " * Cocker Spaniel\n"]
Next, we convert a
to an array of pairs, the second element of the pair being the element of a
(a string), with the newline chopped off the end, the first being its indentation:
arr = a.map { |s| [indentation(s), s.chomp] }
# => [[0, "* Zoo Animals"], [4, " * Herbivores"],
# [8, " * Zebra"], [8, " * Gazelle"],
# [4, " * Carnivores"], [8, " * Tiger"],
# [8, " * Lion"], [4, " * Omnivores"],
# [8, " * Gorilla"], [8, " * Baboon"],
# [8, " * Chimpanzee"], [0, "* Domestic Animals"],
# [4, " * Canines"], [8, " * German Shepherd"],
# [8, " * Cocker Spaniel"]]
In fact, we would perform the first two operations in one step:
arr = str.lines.map { |s| [indentation(s), s.chomp] }
Next, we need to know the indents that are used:
indents = arr.map { |pair| pair.first }
#=> [0, 4, 8, 8, 4, 8, 8, 4, 8, 8, 8, 0, 4, 8, 8]
which we could write more economically like this:
indents = arr.map(&:first)
To find the unique indents we write:
unique = indents.uniq
#=> [0, 4, 8]
In case they are not in order, we should sort them:
sorted = unique.sort
#=> [0, 4, 8]
Each of the three indents will will correspond to offset in the array we will sort, so it is convenient to construct a hash:
indent_offset = sorted.each_with_index.with_object({}) do |(indent, i),h|
h[indent] = i
end
#=> {0=>0, 4=>1, 8=>2}
Again, we can perform this computation by combining several steps:
indent_offset = arr.map(&:first).uniq.sort.each_with_index.
with_object({}) { |(indent, i),h| h[indent] = i }
Next we replace each element of arr
with a three element array of strings:
dim_size = indent_offset.size
#=> 3
prev = []
result = arr.map do |indent, s|
a = ['']*dim_size
offset = indent_offset[indent]
a[offset] = s
a[0,offset] = prev.first(offset)
prev = a
a
end
The result of this calculation is the first array I gave under General Approach, above. We can now sort result
to obtain the second array I gave under General Approach:
sorted = result.sort
The last two steps are to replace each element of sorted
(a three-element array) with the last non-empty string:
sorted_strings = sorted.map { |a| a[a.rindex { |s| s != '' }] }
and then join those strings into a single string:
sorted_strings.join("\n")
Upvotes: 3
Reputation: 8523
If you're interested in using Atom (I highly recommend it as a free alternative to Sublime), I just made a package to do what you need.
https://atom.io/packages/markdown-sort-list
Upvotes: 4