jt196

Reputation: 17

Regex to replace whitespace in Markdown URLs

I have a bunch of Markdown links whose URLs contain whitespace, and I need to replace that whitespace with %20. So far I've hacked together a few solutions, but none of them work in VSCode or do exactly what I'm looking for.

This is the URL format conversion I need:

[My link](../../_resources/my resource.jpg)
[My link](../../_resources/my%20resource.jpg)

\s+(?=[^(\)]*\)) matches any whitespace inside brackets, but it gives false positives because it matches inside any pair of brackets, not just link URLs.

(?:\]\(|(?!^)\G)[^]\s]*\K\h+ does the job, but I'm getting some "Invalid Escape Character" messages in VSCode, so I assume its regex flavour isn't compatible.

I've been trying to identify the link by the characters ]( but, as I'm relatively new to regex, I'm struggling a bit.

I tried this regex: (?<=\]\()s\+ because (?<=\]\().+ correctly identifies the URL, but it doesn't work.

Where am I going wrong here? Thanks in advance!

EDIT: VSCode find in files doesn't support variable length lookbehind, even though find/replace in the open file does support this. Open to any other solutions before I dive into writing a script!

Upvotes: 0

Views: 731

Answers (3)

jt196

Reputation: 17

In the end, as I'm on a Mac and didn't want to fire up a virtual PC to run Notepad++ (Sublime uses the same engine, and Atom doesn't allow you to exclude files), I used a Python script, plus @Wiktor Stribiżew's answer for the individual files that the script's pattern didn't pick up for whatever reason.

import os
import re

# Group 1 is the "[link text]" part, group 3 is the target inside "(...)".
# [^\]]+ (rather than a greedy .+) keeps two links on one line separate.
md_url_pattern = r'(\[([^\]]+)\])\(([^)]+)\)'

def remove_spacing(match_obj):
    # Replace each run of whitespace in the link target with %20.
    fixed = match_obj.group(1) + "(" + re.sub(r"\s+", "%20", match_obj.group(3)) + ")"
    print("Match Object: " + fixed)
    return fixed

# THIS_FOLDER = os.path.dirname(os.path.abspath(__file__))
this_folder = '<my_document_folder>' # fixed folder path
note_path = '<note_folder>' # change this
full_path = os.path.join(this_folder, note_path)

for name in os.listdir(full_path):
    path = os.path.join(full_path, name)
    if not os.path.isfile(path):
        continue  # skip subfolders; this script is not recursive
    with open(path, 'r') as open_file:
        read_file = open_file.read()
    if not read_file:
        print("Empty file!")
        continue
    read_file = re.sub(md_url_pattern, remove_spacing, read_file)
    # The with block closes the file, so the write is flushed to disk.
    with open(path, 'w') as write_file:
        write_file.write(read_file)

This script could do with a bit of tidying up and debugging (the odd weird empty file and no subfolder compatibility) but it was the best I could do.
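For the missing subfolder support, one possible direction is to walk the tree with pathlib instead of os.listdir. This is only a sketch under a few assumptions: the notes are UTF-8 files ending in .md, link targets contain no nested brackets, and the names fix_links and fix_tree are made up for illustration. It only rewrites files whose content actually changes.

```python
import re
from pathlib import Path

# Group 1 is "[link text]", group 2 is the target inside "(...)".
MD_LINK = re.compile(r"(\[[^\]]+\])\(([^)]+)\)")

def fix_links(text):
    """Replace whitespace runs inside Markdown link targets with %20."""
    def repl(match):
        return match.group(1) + "(" + re.sub(r"\s+", "%20", match.group(2)) + ")"
    return MD_LINK.sub(repl, text)

def fix_tree(root):
    """Rewrite every .md file under root (recursively); return changed paths."""
    changed = []
    for path in Path(root).rglob("*.md"):
        original = path.read_text(encoding="utf-8")
        updated = fix_links(original)
        if updated != original:
            path.write_text(updated, encoding="utf-8")
            changed.append(path)
    return changed
```

Calling fix_tree on the notes folder would then cover nested folders too, and the returned list shows which files were touched.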

Upvotes: 0

Wiktor Stribiżew

Reputation: 627082

You can use

(?<=\]\([^\]]*)\s+(?=[^()]*\))

Replace with %20.

Details:

  • (?<=\]\([^\]]*) - a positive lookbehind that matches a location that is immediately preceded with ]( and then any zero or more chars other than ]
  • \s+ - any one or more whitespace chars (other than line break chars: in Visual Studio Code, if there is no \n or \r in the regex, \s does not match line break chars)
  • (?=[^()]*\)) - a positive lookahead that matches a location that is immediately followed with zero or more chars other than ( and ) and then a ) char.
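If you want to try the same logic in a script, note that Python's built-in re module rejects variable-width lookbehinds, so the pattern above cannot be pasted in as-is. A rough equivalent, assuming link targets contain no nested parentheses, is to match the whole ](...) part and fix the whitespace in a callback. The name encode_link_spaces is made up for this sketch.

```python
import re

# Matches "](" + target + ")" where the target contains no "(", ")" or "]".
LINK_TARGET = re.compile(r"\]\(([^()\]]*)\)")

def encode_link_spaces(text):
    """Replace whitespace runs inside Markdown link targets with %20."""
    def fix(match):
        # [^\S\r\n]+ is whitespace other than line breaks, as in VS Code.
        target = re.sub(r"[^\S\r\n]+", "%20", match.group(1))
        return "](" + target + ")"
    return LINK_TARGET.sub(fix, text)

line = "[My link](../../_resources/my resource.jpg) and (plain brackets here)"
print(encode_link_spaces(line))
# prints: [My link](../../_resources/my%20resource.jpg) and (plain brackets here)
```

Because the match must start with ](, ordinary bracketed text is left alone, which is the false positive the lookbehind is there to avoid.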

Since you are using Find/Replace in Files, which does not support variable-length lookbehinds, this solution won't work there.

You can use Notepad++ with

(\G(?!\A)|\[[^][]*]\()([^()\s]*)\s+(?=[^()]*\))

and $1$2%20 replacement pattern. In Notepad++, press CTRL+SHIFT+F and after filling out the necessary fields, hit Replace in Files.
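The \G anchor makes each match continue from where the previous one ended, so every whitespace run in a link is replaced in one go. Python's re module has no \G, but the idea can be approximated by replacing the first whitespace run in each link and repeating until nothing changes. This is a sketch rather than a drop-in equivalent: the pattern is adapted from the one above, and the function name is hypothetical.

```python
import re

# One whitespace run per link per pass: group 1 is "[text](" plus the target
# up to the first whitespace, and the lookahead requires a closing ")".
FIRST_GAP = re.compile(r"(\[[^\]\[]*\]\([^()\s]*)\s+(?=[^()]*\))")

def encode_link_spaces_iteratively(text):
    """Repeat the substitution until no link target contains whitespace."""
    while True:
        text, count = FIRST_GAP.subn(r"\g<1>%20", text)
        if count == 0:
            return text

sample = "[My link](../../_resources/my big resource.jpg)"
print(encode_link_spaces_iteratively(sample))
# prints: [My link](../../_resources/my%20big%20resource.jpg)
```

Each pass removes at least one whitespace run, so the loop always terminates.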


Upvotes: 1

41686d6564

Reputation: 19651

VSCode regex does not support \K, \G, or \h, but it does support lookbehinds with non-fixed width. So, you may use something like the following:

(?<=\]\([^\]\r\n]*)[^\S\r\n]+
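The class [^\S\r\n] is a double negative: "not a non-whitespace char, and not CR or LF", which leaves horizontal whitespace only and so stands in for the unsupported \h. A small Python check of just that class (the lookbehind part is left out because Python's re requires fixed-width lookbehinds; the sample string is made up):

```python
import re

HORIZONTAL_WS = re.compile(r"[^\S\r\n]+")

sample = "my resource\tname\nnext line"

# Spaces and tabs are replaced, the line break survives.
horizontal_only = HORIZONTAL_WS.sub("%20", sample)
print(repr(horizontal_only))  # 'my%20resource%20name\nnext%20line'

# Plain \s+ in Python also swallows the line break.
all_whitespace = re.sub(r"\s+", "%20", sample)
print(repr(all_whitespace))   # 'my%20resource%20name%20next%20line'
```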


Upvotes: 2
