alexyorke

Reputation: 4299

Regex to find unnecessary uppercase words

Here is my text:

TESTING TESTING test test test test test

I want the regex to return true (or a match) if more than 50% of the sentence is in capitals.

In this case, it would return false because only 14 of the 34 letters (about 41%) are capitals.

In AppleScript, I'd do:

set a to characters of "abcdefghijklmnopqrstuvwxyz"
set ac to characters of "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
set this_message to characters of "TEST TEST TEST TEST test test test test test test"
set x to 0 -- count of uppercase letters
set y to 0 -- count of spaces and punctuation
repeat with i from 1 to number of items in this_message
    set this_item to item i of this_message
    considering case
        if this_item is in ac then
            set x to x + 1
        end if
        if this_item is in {" ", ",", ".", "-"} then
            set y to y + 1
        end if
    end considering
end repeat
try
    -- percentage of the remaining characters that are uppercase
    if ((x / ((count this_message) - y)) * 100) > 50 then
        return true
    else
        return false
    end if
on error
    -- e.g. division by zero when the message has no letters
    return false
end try

Upvotes: 1

Views: 237

Answers (2)

ridgerunner

Reputation: 34385

Here is a PHP function that returns TRUE if more than half of a string's characters are capital letters:

// Test if more than half of string consists of CAPs.
function isMostlyCaps($text) {
    $len = strlen($text);
    if ($len) {  // Skip empty strings (avoids division by zero).
        $capscnt = preg_match_all('/[A-Z]/', $text, $matches);
        if ($capscnt/$len > 0.5) return TRUE;
    }
    return FALSE;
}

The above function compares the count of caps to the total length of the string (including whitespace and non-letters). If you want to compare to the number of non-whitespace chars, then the function is easily modified:

// Test if more than half of non-whitespace chars in string are CAPs.
function isMostlyCaps($text) {
    $len = preg_match_all('/\S/', $text, $matches);
    if ($len) {  // Skip strings with no non-whitespace chars (avoids division by zero).
        $capscnt = preg_match_all('/[A-Z]/', $text, $matches);
        if ($capscnt/$len > 0.5) return TRUE;
    }
    return FALSE;
}
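
To see how the choice of denominator changes the result, here is a minimal Python sketch of the same two ratios. The function name `caps_ratio` and its `denominator` parameter are made up for this illustration and are not part of the PHP above; like the PHP, it only counts ASCII capitals.

```python
import re

def caps_ratio(text: str, denominator: str = "all") -> float:
    """Return the count of ASCII capitals divided by the chosen denominator.

    denominator="all"            -> every character, whitespace included
    denominator="non_whitespace" -> only non-whitespace characters
    (Both names are assumptions made for this sketch.)
    """
    caps = len(re.findall(r"[A-Z]", text))
    if denominator == "all":
        total = len(text)
    elif denominator == "non_whitespace":
        total = len(re.findall(r"\S", text))
    else:
        raise ValueError("denominator must be 'all' or 'non_whitespace'")
    # An empty denominator means there is nothing to measure; report 0.
    return caps / total if total else 0.0

sample = "A B C d e"
print(caps_ratio(sample, "all") > 0.5)             # 3 capitals out of 9 characters
print(caps_ratio(sample, "non_whitespace") > 0.5)  # 3 capitals out of 5 non-whitespace characters
```

For a short string with lots of spaces, the two versions can disagree, so it is worth deciding up front which one you mean by "50% of the sentence".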

Here is a version that considers counts of whole words:

// Test if more than half of "words" in string are all CAPs.
function isMostlyCapWords($text) {
    // For our purpose a "word" is a sequence of non-whitespace chars.
    $wordcnt = preg_match_all('/\S+/', $text, $matches);
    if ($wordcnt) {  // Skip strings with no words (avoids division by zero).
        $capscnt = preg_match_all('/\b[A-Z]+\b/', $text, $matches);
        if ($capscnt/$wordcnt > 0.5) return TRUE;
    }
    return FALSE;
}
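
One thing to watch for with the word-based version: `\S+` and `\b[A-Z]+\b` do not split text the same way once punctuation is involved, so the two counts can drift apart. The Python sketch below shows the mismatch and one possible way around it, counting both numbers from the same whitespace-separated tokens. The name `is_mostly_cap_words` is hypothetical, and note that `str.isupper()` is Unicode-aware and ignores digits, which is a slightly different rule than `[A-Z]+`.

```python
import re
import string

text = "U.S.A. is"
# The two patterns tokenize differently: 2 "words" but 3 all-caps matches.
print(len(re.findall(r"\S+", text)))           # whitespace-separated tokens
print(len(re.findall(r"\b[A-Z]+\b", text)))    # runs of capitals between word boundaries

def is_mostly_cap_words(text: str) -> bool:
    """True if more than half of the whitespace-separated words are all caps."""
    words = text.split()
    if not words:
        return False
    cap_words = 0
    for word in words:
        # Drop leading/trailing punctuation so "TEST," still counts as a word.
        core = word.strip(string.punctuation)
        # isupper() needs at least one cased character, all of them uppercase.
        if core and core.isupper():
            cap_words += 1
    return cap_words / len(words) > 0.5
```

Counting both the numerator and the denominator from the same token list keeps the ratio between 0 and 1.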

Upvotes: 2

Jonathan Hall

Reputation: 79556

In Perl:

sub mostly_caps {
    my $string = shift;
    my $upper = $string =~ tr/A-Z//;
    my $lower = $string =~ tr/a-z//;
    return $upper > $lower;    # strictly more capitals than lowercase, i.e. more than 50%
}

And for bonus points, a version that takes an arbitrary percentage as an argument:

sub caps_pct {
    my ( $string, $pct ) = @_;
    my $upper = $string =~ tr/A-Z//;
    my $lower = $string =~ tr/a-z//;
    my $letters = $upper + $lower;
    return 0 unless $letters;    # no letters at all: avoid dividing by zero
    return $upper / $letters >= $pct / 100;
}

It should be easy to adapt this to PHP or any other language.
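
As a rough illustration of such an adaptation, here is what the percentage version might look like in Python. This is only a sketch: the default of 50 and the choice to return `False` when the string has no letters are assumptions, and it uses a strict comparison so that exactly 50% does not count, in line with the question's "more than 50%".

```python
def caps_pct(text: str, pct: float = 50.0) -> bool:
    """True if more than `pct` percent of the ASCII letters in `text` are capitals."""
    upper = sum(1 for ch in text if "A" <= ch <= "Z")
    lower = sum(1 for ch in text if "a" <= ch <= "z")
    letters = upper + lower
    if letters == 0:
        # Assumption for this sketch: a string with no letters is not "mostly caps".
        return False
    return upper / letters > pct / 100

# The string from the question: 14 capitals out of 34 letters.
print(caps_pct("TESTING TESTING test test test test test"))
```

Only letters are counted here, so spaces and punctuation do not affect the ratio.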

Upvotes: 1
