Edan Maor

Reputation: 10052

Vim - use regex to lexicographically compare strings (to find earlier/later dates)

I want to write a simple regex, in vim, that will find all strings lexicographically smaller than another string.

Specifically, I want to use this to compare dates formatted as 2014-02-17. These dates are lexicographically sortable, which is why I use them.

My specific use case: I'm trying to run through a script and find all the dates that are earlier than today's date.

I'm also OK with comparing these as numbers, or any other solution.

Upvotes: 5

Views: 701

Answers (4)

Ruud Helderman

Reputation: 11018

Use nested subpatterns. It starts simple, with the century:

[01]\d\d\d-\d\d-\d\d|20

As for each digit to follow, use one of the following patterns; you may want to replace .* by an appropriate sequence of \d and -.

for 0:   (0
for 1:   (0.*|1
for 2:   ([01].*|2
for 3:   ([0-2].*|3
for 4:   ([0-3].*|4
for 5:   ([0-4].*|5
for 6:   ([0-5].*|6
for 7:   ([0-6].*|7
for 8:   ([0-7].*|8
for 9:   ([0-8].*|9

For the last digit, you only need the digit range, e.g.:

[0-6]

Finally, all parentheses should be closed:

)))))

In the example of 2014-02-17, this becomes:

[01]\d\d\d-\d\d-\d\d|20
(0\d-\d\d-\d\d|1
([0-3]-\d\d-\d\d|4
-
(0
([01]-\d\d|2
-
(0\d|1
[0-6]
)))))

Now in one line:

[01]\d\d\d-\d\d-\d\d|20(0\d-\d\d-\d\d|1([0-3]-\d\d-\d\d|4-(0([01]-\d\d|2-(0\d|1[0-6])))))

For Vim, don't forget to escape (, ) and |:

[01]\d\d\d-\d\d-\d\d\|20\(0\d-\d\d-\d\d\|1\([0-3]-\d\d-\d\d\|4-\(0\([01]-\d\d\|2-\(0\d\|1[0-6]\)\)\)\)\)
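If you want to convince yourself that the pattern is right, you can test the unescaped one-line form outside Vim. Here is a minimal sketch in Python (my choice of language, nothing Vim-specific); the names PATTERN, TARGET and mismatches are made up for this example. It compares the regex result with a plain string comparison for every day in a range of about 40 years.

```python
import re
from datetime import date, timedelta

# Unescaped one-line pattern from above (dates earlier than 2014-02-17).
PATTERN = re.compile(
    r"[01]\d\d\d-\d\d-\d\d|20(0\d-\d\d-\d\d|1([0-3]-\d\d-\d\d|4-(0([01]-\d\d|2-(0\d|1[0-6])))))"
)
TARGET = "2014-02-17"


def mismatches(start, days):
    """Return the dates whose regex result disagrees with a string comparison."""
    bad = []
    for offset in range(days):
        text = (start + timedelta(days=offset)).isoformat()
        if (PATTERN.fullmatch(text) is not None) != (text < TARGET):
            bad.append(text)
    return bad


# Check every day from 1990-01-01 onwards; an empty list means no disagreement.
print(mismatches(date(1990, 1, 1), 40 * 366))
```

Note that fullmatch is used because the top-level alternation is not anchored; in Vim you would get the same effect with \< and \> or ^ and $ around the pattern.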

It is best to generate this (much like in FDinoff's answer) rather than write it by hand.

Update: Here is a sample AWK script to generate the correct regex for any date yyyy-mm-dd. It uses switch, which is a gawk extension, so awk must be gawk here.

#!/usr/bin/awk -f

BEGIN {                 # possible overrides for non-VIM users
    switch (digit) {
        case "ascii"     : digit = "[0-9]";     break;
        case "posix"     : digit = "[[:digit:]]"; break;
        default          : digit = "\\d";
    }
    switch (metachar) {
        case "unescaped" : escape = "";         break;
        default          : escape = "\\";
    }
}

/^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$/ {
    print BuildRegex($0);
}

function BuildRegex(s,    regex) {    # regex is a local variable
    if (s ~ /^[1-9][^1-9]*$/) {
        regex = LessThanOnFirstDigit(s);
    }
    else {
        regex = substr(s, 1, 1) BuildRegex(substr(s, 2));    # recursive call
        if (s ~ /^[1-9]/) {
            regex = escape "(" LessThanOnFirstDigit(s) escape "|" regex escape ")";
        }
    }
    return regex;
}

function LessThanOnFirstDigit(s,    first, rest) {    # first and rest are local variables
    first = substr(s, 1, 1) - 1;
    rest = substr(s, 2);
    gsub(/[0-9]/, digit, rest);
    return (first ? "[0-" first "]" : "0") rest;
}

Call it like this:

echo 2014-02-17 | awk -f genregex.awk

Of course, you can write such a simple generator in any language you like. Would be nice to do it in Vimscript, but I have no experience with that, so I will leave that as a home assignment.
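As an illustration of "any language you like", here is a rough Python port of the AWK generator. It is only a sketch: the function names build_regex and less_than_on_first_digit are my own, the defaults mirror the Vim-style output of the AWK script, and the guard for input without any non-zero digit is an addition (the recursion would not terminate on such input).

```python
import re


def less_than_on_first_digit(s, digit=r"\d"):
    """Lower the first digit of s by one (as a range) and wildcard the other digits."""
    first = int(s[0]) - 1
    # A function replacement is used so that backslashes in `digit` stay literal.
    rest = re.sub(r"[0-9]", lambda _m: digit, s[1:])
    return (f"[0-{first}]" if first else "0") + rest


def build_regex(s, digit=r"\d", escape="\\"):
    """Build a pattern for same-shaped strings that sort strictly before s."""
    if not re.search(r"[1-9]", s):
        raise ValueError("input needs at least one non-zero digit")
    if re.fullmatch(r"[1-9][^1-9]*", s):
        # Last non-zero digit: only the "smaller digit here" branch is left.
        return less_than_on_first_digit(s, digit)
    regex = s[0] + build_regex(s[1:], digit, escape)  # recursive call
    if s[0] in "123456789":
        smaller = less_than_on_first_digit(s, digit)
        regex = f"{escape}({smaller}{escape}|{regex}{escape})"
    return regex


# Vim-style output (escaped metacharacters), as in the AWK default.
print(build_regex("2014-02-17"))
# Unescaped output, usable with Python's own re module.
print(build_regex("2014-02-17", escape=""))
```

Passing escape="" plays the same role as metachar=unescaped in the AWK script.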

Upvotes: 1

benjifisher

Reputation: 5122

You do not say how you want to use this; are you sure that you really want a regular expression? Perhaps you could get away with

if DateCmp(date, '2014-02-24') < 0
  " ...
endif

In that case, try this function.

" Compare formatted date strings:
" @param String date1, date2
"   dates in YYYY-MM-DD format, e.g. '2014-02-24'
" @return Integer
"   negative, zero, or positive according to date1 < date2, date1 == date2, or
"   date1 > date2
function! DateCmp(date1, date2)
  let [year1, month1, day1] = split(a:date1, '-')
  let [year2, month2, day2] = split(a:date2, '-')
  if year1 != year2
    return year1 - year2
  elseif month1 != month2
    return month1 - month2
  else
    return day1 - day2
  endif
endfun
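If the script can be processed outside Vim, the same comparison idea can be used to list the earlier dates directly. A minimal sketch in Python, assuming dates always appear as YYYY-MM-DD; DATE_RE, date_cmp, dates_before and the sample text are hypothetical names and data for this example only.

```python
import re

# Matches anything shaped like YYYY-MM-DD; no calendar validation is done.
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def date_cmp(date1, date2):
    """Negative, zero, or positive, comparing year, then month, then day."""
    parts1 = [int(p) for p in date1.split("-")]
    parts2 = [int(p) for p in date2.split("-")]
    for a, b in zip(parts1, parts2):
        if a != b:
            return a - b
    return 0


def dates_before(text, limit):
    """Return every YYYY-MM-DD string in text that is earlier than limit."""
    return [d for d in DATE_RE.findall(text) if date_cmp(d, limit) < 0]


sample = "released 2014-02-17, patched 2014-03-01, planned 2013-12-31"
print(dates_before(sample, "2014-02-24"))  # ['2014-02-17', '2013-12-31']
```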

If you really want a regular expression, then try this:

" Construct a pattern that matches a formatted date string if and only if the
" date is less than the input date.  Usage:
" :echo '2014-02-24' =~ DateLessRE('2014-03-12')
function! DateLessRE(date)
  let init = ''
  let branches = []
  for c in split(a:date, '\zs')
    if c =~ '[1-9]'
      call add(branches, init . '[0-' . (c-1) . ']')
    endif
    let init .= c
  endfor
  return '\d\d\d\d-\d\d-\d\d\&\%(' . join(branches, '\|') . '\)'
endfun

Does that count as a "simple" regex? One way to use it would be to type :g/ and then CTRL-R and = and then DateLessRE('2014-02-24') and Enter, followed by the rest of your command. In other words,

:g/<C-R>=DateLessRE('2014-02-24')<CR>/s/foo/bar

EDIT: I added a concat (:help /\&) that matches a complete "formatted date string". Now, there is no need to anchor the pattern.
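To make the branch construction easier to follow, here is the same idea as a small Python sketch. Python has no \& operator, so a lookahead (?=...) stands in for the concat; that substitution, and the name date_less_re, are my assumptions and not part of the Vim function.

```python
import re


def date_less_re(date):
    """Pattern matching a YYYY-MM-DD string that sorts strictly before `date`."""
    init = ""
    branches = []
    for c in date:
        if c in "123456789":
            # Same prefix as `date` so far, then any smaller digit at this position.
            branches.append(f"{init}[0-{int(c) - 1}]")
        init += c
    # The lookahead requires a complete formatted date at the match position.
    return r"(?=\d{4}-\d{2}-\d{2})(?:" + "|".join(branches) + ")"


print(date_less_re("2014-03-12"))
print(bool(re.match(date_less_re("2014-03-12"), "2014-02-24")))  # True
```

As in the Vim version, the match itself covers only the deciding prefix of the date, which is enough for :g-style line selection.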

Upvotes: 3

FDinoff

Reputation: 31439

I don't think there is any way to do this easily in regex. For matching any date earlier than the current date you can run the function below (some of the code was stolen from benjifisher).

function! Convert_to_char_class(cur) 
    if a:cur =~ '[2-9]'
        return '[0-' . (a:cur-1) . ']'
    endif
    return '0'
endfunction

function! Match_number_before(num)
    let branches = []
    let init = ''
    for i in range(len(a:num))
        if a:num[i] =~ '[1-9]'
            call add(branches, init . Convert_to_char_class(a:num[i]) . repeat('\d', len(a:num) - i - 1))
        endif 
        let init .= a:num[i]
    endfor
    return '\%(' . join(branches, '\|') .'\)'
endfunction

function! Match_date_before(date)
    if a:date !~ '\v^\d{4}-\d{2}-\d{2}$'
        echo "invalid date"
        return
    endif

    let branches =[]

    let parts = split(a:date, '-')
    call add(branches, Match_number_before(parts[0]) . '-\d\{2}-\d\{2}')
    call add(branches, parts[0] . '-' . Match_number_before(parts[1]) . '-\d\{2}')
    call add(branches, parts[0] . '-' . parts[1] . '-' .Match_number_before(parts[2]))

    return '\%(' . join(branches, '\|') .'\)'
endfunction

Use the following to search for all dates before 2014-02-24.

/<C-r>=Match_date_before('2014-02-24')

You might be able to wrap it in a function to set the search register if you wanted to.

The generated regex for dates before 2014-02-24 is the following.

\%(\%([0-1]\d\d\d\|200\d\|201[0-3]\)-\d\{2}-\d\{2}\|2014-\%(0[0-1]\)-\d\{2}\|2014-02-\%([0-1]\d\|2[0-3]\)\)

It does not validate dates. Anything in that format is assumed to be a date.


Here is the equivalent set of functions for matching dates after the one passed in.

function! Convert_to_char_class_after(cur) 
    if a:cur =~ '[0-7]'
        return '[' . (a:cur+1) . '-9]'
    endif
    return '9'
endfunction

function! Match_number_after(num)
    let branches = []
    let init = ''
    for i in range(len(a:num))
        if a:num[i] =~ '[0-8]'
            call add(branches, init . Convert_to_char_class_after(a:num[i]) . repeat('\d', len(a:num) - i - 1))
        endif 
        let init .= a:num[i]
    endfor
    return '\%(' . join(branches, '\|') .'\)'
endfunction

function! Match_date_after(date)
    if a:date !~ '\v^\d{4}-\d{2}-\d{2}$'
        echo "invalid date"
        return
    endif

    let branches =[]

    let parts = split(a:date, '-')
    call add(branches, Match_number_after(parts[0]) . '-\d\{2}-\d\{2}')
    call add(branches, parts[0] . '-' . Match_number_after(parts[1]) . '-\d\{2}')
    call add(branches, parts[0] . '-' . parts[1] . '-' .Match_number_after(parts[2]))

    return '\%(' . join(branches, '\|') .'\)'
endfunction

The regex generated for dates after 2014-02-24 is the following.

\%(\%([3-9]\d\d\d\|2[1-9]\d\d\|20[2-9]\d\|201[5-9]\)-\d\{2}-\d\{2}\|2014-\%([1-9]\d\|0[3-9]\)-\d\{2}\|2014-02-\%([3-9]\d\|2[5-9]\)\)
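Both generated patterns can be checked outside Vim. The sketch below is in Python and uses a tiny translator, vim_to_python, that I made up for this purpose; it only handles the four Vim constructs that occur in these two patterns and is not a general Vim-to-Python converter. It then compares each pattern with a plain string comparison against 2014-02-24.

```python
import re
from datetime import date, timedelta

# Generated Vim patterns copied from above (both built for 2014-02-24).
BEFORE_VIM = r"\%(\%([0-1]\d\d\d\|200\d\|201[0-3]\)-\d\{2}-\d\{2}\|2014-\%(0[0-1]\)-\d\{2}\|2014-02-\%([0-1]\d\|2[0-3]\)\)"
AFTER_VIM = r"\%(\%([3-9]\d\d\d\|2[1-9]\d\d\|20[2-9]\d\|201[5-9]\)-\d\{2}-\d\{2}\|2014-\%([1-9]\d\|0[3-9]\)-\d\{2}\|2014-02-\%([3-9]\d\|2[5-9]\)\)"
TARGET = "2014-02-24"


def vim_to_python(pattern):
    """Translate only the Vim constructs that appear in the two patterns above."""
    for vim, py in ((r"\%(", "(?:"), (r"\)", ")"), (r"\|", "|"), (r"\{", "{")):
        pattern = pattern.replace(vim, py)
    return pattern


def disagreements(start, days):
    """Return dates where either pattern disagrees with a string comparison."""
    before = re.compile(vim_to_python(BEFORE_VIM))
    after = re.compile(vim_to_python(AFTER_VIM))
    bad = []
    for offset in range(days):
        text = (start + timedelta(days=offset)).isoformat()
        if (before.fullmatch(text) is not None) != (text < TARGET):
            bad.append(text)
        if (after.fullmatch(text) is not None) != (text > TARGET):
            bad.append(text)
    return bad


# An empty list means both patterns agree with the comparison on every day checked.
print(disagreements(date(1990, 1, 1), 40 * 366))
```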

Upvotes: 3

Rick

Reputation: 561

If you wanted to search for all dates that were less than 2014-11-23, inclusive, you would use the following regex.

2014-(?:[1-9]|1[0-1])-(?:[1-9]|1[0-9]|2[0-3])

For a better explanation of the regex, visit regex101.com and paste the regex in. You can also test it on that site.

The basics of the regex are to search all dates that:

start with 2014-
either contain a single character from 1 - 9 
    or a 1 and a single character from 0 - 1, i.e. numbers from 1 - 11
finished by - and numbers from 1 - 23 done in the same style as the second term

Upvotes: 0
