Reputation: 158
This is a recreational pursuit, and is not homework. If you value academic challenges, please read on.
A radio quiz show had a segment requesting listeners to call in with words that have their characters in alphabetical order, e.g. "aim", "abbot", "celt", "deft", etc. I got these few examples by a quick Notepad++ (NPP) inspection of a Scrabble dictionary word list.
I'm looking for an elegant way in T-SQL to determine if a word qulifies for the list, i.e. all its letters are in alpha order, case insensitive.
It seemed to me that there should be some kind of T-SQL algorithm possible that will do a SELECT on a table of English words and return the complete list of all words in the Srcabble dictionary that meets the spec. I've spent considerable time looking at regex strings, but haven't hit on anything that comes even remotely close. I've thought about the obvious looping scenario, but abandoned it for now as "inelegant". I'm looking for your ideas that will obtain the qualifying word list,
preferably using
- a REGEX expression
- a tally-table-based approach
- a scalar UDF that returns 1 if the input word meets the requirement, else 0.
- Other, only limited by your creativity.
But preferably NOT using
- a looping structure
- a recursive solution
- a CLR solution
Assumptions/observations:
1. A "word" is defined here as two or more characters. My dictionary shows 55 2-character words, of which only 28 qualify.
2. No word will have more than two concecutive characters that are identical. (If you find one, please point it out.)
3. At 21 characters, "electroencephalograms" is the longest word in my Scrabble dictionary
(though why that word is in the Scrabble dictionary escapes me--the board is only a 15-by-15 grid.)
Consider 21 as the upper limit on word length.
4. All words LIKE 'Z%' can be dismissed because all you can create is {'Z','ZZ', ... , 'ZZZ...Z'}.
5. As the dictionary's words' initial character proceedes through the alphabet, fewer words will qualify.
6. As the word lengths get longer, fewer words will qualify.
7. I suspect that there will be less than 0.2% of my dictionary's 60,387 words that will qualify.
For example, I've tried NPP regex searches like "^a[a-z][b-z][b-z][c-z][c-z][d-z][d-z][e-z]" for 9-letter words starting with "a", but the character-by-character alphabetic enforcement is not handled properly. This search will return "abilities" which fails the test with the "i" that follows the "l".
There's several free Scrabble word lists available on the web, but Phil Factor gives a really interesting treatment of T-SQL/Scrabble considerations at https://www.simple-talk.com/sql/t-sql-programming/the-sql-of-scrabble-and-rapping/ which is where I got my word list.
Care to give it a shot?
Upvotes: 0
Views: 242
Reputation: 7392
Interesting idea...
Here's my take on it. This returns a list of words that are in order, but you could easily return 1 instead.
DECLARE @WORDS TABLE (VAL VARCHAR(MAX))
INSERT INTO @WORDS (VAL)
VALUES ('AIM'), ('ABBOT'), ('CELT'), ('DAVID')
;WITH CHARS
AS
(
SELECT VAL AS SOURCEWORD, UPPER(VAL) AS EVALWORD, ASCII(LEFT(UPPER(VAL),1)) AS ASCIICODE, RIGHT(VAL,LEN(UPPER(VAL))-1) AS REMAINS, 1 AS ROWID, 1 AS INORDER, LEN(VAL) AS WORDLENGTH
FROM @WORDS
UNION ALL
SELECT SOURCEWORD, REMAINS, ASCII(LEFT(REMAINS,1)), RIGHT(REMAINS,LEN(REMAINS)-1), ROWID+1, INORDER+CASE WHEN ASCII(LEFT(REMAINS,1)) >= ASCIICODE THEN 1 ELSE 0 END AS INORDER, WORDLENGTH
FROM CHARS
WHERE LEN(REMAINS)>=1
),
ONLYINORDER
AS
(
SELECT *
FROM CHARS
WHERE ROWID=WORDLENGTH AND INORDER=WORDLENGTH
)
SELECT SOURCEWORD
FROM ONLYINORDER
Here it is as a UDF:
CREATE FUNCTION dbo.AlphabetSoup (@Word VARCHAR(MAX))
RETURNS BIT
AS
BEGIN
SET @WORD = UPPER(@WORD)
DECLARE @RESULT INT
;WITH CHARS
AS
(
SELECT @WORD AS SOURCEWORD,
@WORD AS EVALWORD,
ASCII(LEFT(@WORD,1)) AS ASCIICODE,
RIGHT(@WORD,LEN(@WORD)-1) AS REMAINS,
1 AS ROWID,
1 AS INORDER,
LEN(@WORD) AS WORDLENGTH
UNION ALL
SELECT SOURCEWORD,
REMAINS,
ASCII(LEFT(REMAINS,1)),
RIGHT(REMAINS,LEN(REMAINS)-1),
ROWID+1,
INORDER+CASE WHEN ASCII(LEFT(REMAINS,1)) >= ASCIICODE THEN 1 ELSE 0 END AS INORDER,
WORDLENGTH
FROM CHARS
WHERE LEN(REMAINS)>=1
),
ONLYINORDER
AS
(
SELECT 1 AS RESULT
FROM CHARS
WHERE ROWID=WORDLENGTH AND INORDER=WORDLENGTH
UNION
SELECT 0
FROM CHARS
WHERE NOT (ROWID=WORDLENGTH AND INORDER=WORDLENGTH)
)
SELECT @RESULT = RESULT FROM ONLYINORDER
RETURN @RESULT
END
Upvotes: 0
Reputation: 7847
I did something similar to Andriy. I created a numbers table with value 1-21. I use it to create one set of data with the individual letters order by the index and the a second set ordered alphabetically. Joined the sets to each other on the letter and numbers. I then count nulls. Anything over 0 means it is not in order.
DECLARE @word VARCHAR(21)
SET @word = 'abbot'
SELECT Count(1)
FROM (SELECT Substring(@word, number, 1) AS Letter,
Row_number() OVER ( ORDER BY number) AS letterNum
FROM numbers
WHERE number <= CONVERT(INT, Len(@word))) a
LEFT OUTER JOIN (SELECT Substring(@word, number, 1) AS letter,
Row_number() OVER ( ORDER BY Substring(@word, number, 1)) AS letterNum
FROM numbers
WHERE number <= CONVERT(INT, Len(@word))) b
ON a.letternum = b.letternum
AND a.letter = b.letter
WHERE b.letter IS NULL
Upvotes: 0
Reputation: 77667
Split the word into individual characters using a numbers table. Use the numbers as one set of indices. Use ROW_NUMBER to create another set. Compare the two sets of indices to see if they match for every character to see if they match. If they do, the letters in the word are in the alphabetical order.
DECLARE @Word varchar(100) = 'abbot';
WITH indexed AS (
SELECT
Index1 = n.Number,
Index2 = ROW_NUMBER() OVER (ORDER BY x.Letter, n.Number),
x.Letter
FROM
dbo.Numbers AS n
CROSS APPLY
(SELECT SUBSTRING(@Word, n.Number, 1)) AS x (Letter)
WHERE
n.Number BETWEEN 1 AND LEN(@Word)
)
SELECT
Conclusion = CASE COUNT(NULLIF(Index1, Index2))
WHEN 0 THEN 'Alphabetical'
ELSE 'Not alphabetical'
END
FROM
indexed
;
The NULLIF(Index, Index2)
expression does the comparison: it returns a NULL if the the arguments are equal, otherwise it returns the value of Index1
. If all indices match, all the results will be NULL and COUNT will return 0, which means the order of letters in the word was alphabetical.
Upvotes: 1