Reputation: 414655
What it means to be proper Roman numerals may vary. For simplicity (no Unicode, no multiplicative principle, no double subtractives, no overbars, no large numbers, etc) for the sake of this question, valid Roman numerals are defined by the regex:
^(M{0,3})(D?C{0,3}|CM|CD)(L?X{0,3}|XC|XL)(V?I{0,3}|IX|IV)$
Code example with POSIX regexec()
. The regex matches Roman numerals in 1..3999 range represented using "strict" rules.
There are many solutions that can convert Roman numerals if we don't need to reject invalid numerals, for example:
int roman_numeral_value(unsigned char c)
{
switch(toupper(c)) {
case 'I': return 1;
case 'V': return 5;
case 'X': return 10;
case 'L': return 50;
case 'C': return 100;
case 'D': return 500;
case 'M': return 1000;
default: return 0; // error
}
}
int roman_numeral_to_int(const char *s, int size)
{
int total = 0, prev = 0;
for (int i = size-1; i >= 0; --i) { // in reverse order
int value = roman_numeral_value(s[i]);
total += value < prev ? -value : value; // subtract if necessary
prev = value;
}
return total;
}
It works for valid Roman numerals. But roman_numeral_to_int()
accepts numerals such as IIIII
that are rejected by the regex. Is there a similarly simple cross-platform solution that doesn't require pcre_exec()
or other external dependencies that works for valid Roman numerals and only for them?
Upvotes: 3
Views: 2373
Reputation: 154075
To create some level of rule flexibility, the following Roman_string_to_unsigned0()
employs a table.
It follows the strtol()
style of functionality in that an end-pointer is returned indicating where parsing stopped. De-ref and test against '\0'
for success.
The function has a bool subtractive
parameter to steer the two major types of Roman Numeral parsing: basic, subtractive.
static const struct Roman_digit {
char ch[3];
bool subtractive;
unsigned char limit;
unsigned char nextdown; // with parse success, offset to next element to try
unsigned value;
} Roman_table[] = {
{ "I", false, 4, 1, 1 }, //
{ "IV", true, 1, 2, 4 }, //
{ "V", false, 1, 2, 5 }, //
{ "IX", true, 1, 4, 9 }, //
{ "X", false, 4, 1, 10 }, //
{ "XL", true, 1, 2, 40 }, //
{ "L", false, 1, 2, 50 }, //
{ "XC", true, 1, 4, 90 }, //
{ "C", false, 4, 1, 100 }, //
{ "CD", true, 1, 2, 400 }, //
{ "D", false, 1, 2, 500 }, //
{ "CM", true, 1, 4, 900 }, //
{ "M", false, 4, 1, 1000 }, //
};
#define Roman_table_N (sizeof Roman_table / sizeof Roman_table[0])
const char *Roman_string_to_unsigned0(unsigned *dest, const char *src, bool subtractive){
*dest = 0;
for (unsigned i = Roman_table_N; i > 0;) {
const struct Roman_digit *digit = &Roman_table[i - 1];
if (!subtractive && digit->subtractive) {
i--;
continue;
}
unsigned limit = digit->limit; // repeat count
if (limit > 1 && subtractive) limit--;
size_t ch_length = strlen(digit->ch);
size_t next_i = i-1;
for (unsigned j=0; j<limit; j++) {
if (strncmp(src, digit->ch, ch_length) == 0) {
*dest += digit->value;
if (*dest < digit->value) { // Overflow detection
return (char*) src;
}
src += ch_length;
next_i = i - digit->nextdown; // With success, maybe skip down the list
} else {
break;
}
}
i = next_i;
}
return (char*) src;
}
Notes: Case insensitivity not yet encoded. An empty string returns 0. By this code working most-to-least significant, "XXXMMM"
does not pass.
Upvotes: 2
Reputation: 414655
By generating C code from a higher-level specification, we can get a readable solution that uses only standard C. For example, the regex:
^(?P<thousands> M{,3})
(?P<hundreds>CM|CD|D?C{,3})
(?P<tens> XC|XL|L?X{,3})
(?P<units> IX|IV|V?I{,3})$
can be represented as a FSM using Ragel finite-state machine compiler:
thousands = ('M' %{ n += 1000; }){,3};
hundreds = "CM" %{ n += 900; }
| "CD" %{ n += 400; }
| ('D' %{ n += 500; } )? ('C' %{ n += 100; }){,3};
tens = "XC" %{ n += 90; }
| "XL" %{ n += 40; }
| ('L' %{ n += 50; } )? ('X' %{ n += 10; }){,3};
units = "IX" %{ n += 9; }
| "IV" %{ n += 4; }
| ('V' %{ n += 5; } )? ('I' %{ n += 1; }){,3};
numeral = thousands hundreds tens units;
main := numeral > { n = 0; } ;
n
with zero and add thousands, hundreds, tens, and units. 100s, 10s, 1s follow a simple pattern: nine | four | (five? ten{0,3})
e.g., tens part is either 90
or 40
or optional 50
plus upto three 10
s.IIII
in addition to subtractive IV
, it is enough to replace {,3}
with {,4}
. To support Unicode/lower/upper case letters, the corresponding literals such as 'M'
could be replaced with M
where M = 'M' | 'm' | "Ⅿ" | "ⅿ";
Complete code example (with Unicode and IIII
extensions mentioned above). The generated roman_numerals.c
has no third-party dependencies.
Upvotes: 2
Reputation: 154075
Use strcmp()
or in other words, round-trip the string.
First consider the reverse problem, number --> string.
There are many ways to efficiently convert an integer to a string of Roman numerals. Let us call it:
// return false on error due to `value` range error or scant `size`
bool roman_int_to_string(char *dest, size_t size, int value);
Aside from letter case concerns, there is a one-to-one relationship between a canonical Roman number string and an int
. Simply convert the source string to an int
and then the int
into another test string. If these strings match, we have a winner.
#define ROMAN_STRING_N 20
int roman_numeral_to_int_with_validate(const char *s, int size, bool *is_canonical) {
int value = roman_numeral_to_int(s, size);
char test[ROMAN_STRING_N];
*is_canonical = roman_int_to_string(test, sizeof test, value);
if (*is_canonical) {
if (strcmp(s, test)) { // Or use a case insensitive compare as desired
*is_canonical = false;
}
}
return value;
}
Lesson: I did code up a direct validation function. To test it I needed the inverse roman_int_to_string()
. A random string generator rapidly showed many surprising errors like "CMC"
and "CMCD"
as well as OP's "IIII"
. In the end, using a simplistic string-to-int and int-to-string function pair and then doing a string compare was the most resilient.
Upvotes: 1
Reputation: 29126
The Roman digits come in two classes, the "ones" (I, X, C, M) and the "fives" (V,L, D). The "fives" can't be repeated and cannot be substracted. The "ones" can be repeated up to three times when they don't come after a smaller number and the can be subtracted from a number that isn't greater than the next "one".
When parsing, a digit can be in three different modes: It is either added normally or it is a number to be subtracted or it is a number from which the preceding number was substracted.
You can enforce these rules as you build your number. In addition to the value of a digit, you need a function that classifies the digit. In the code below, the function repeat
does this. It returns the maximum number of repetitions per number, but it also serves as classification: 3 means a "one" and 1 means a "five."
The code below seems to produce the same results as your code with regex validation. It returns a positive number for valid Roman numbers and −1 otherwise. (And it has fewer than 28 conditionals.)
int digit(int c)
{
if (c == 'I') return 1;
if (c == 'V') return 5;
if (c == 'X') return 10;
if (c == 'L') return 50;
if (c == 'C') return 100;
if (c == 'D') return 500;
if (c == 'M') return 1000;
return 0;
}
int repeat(int c)
{
if (c == 'I') return 3;
if (c == 'V') return 1;
if (c == 'X') return 3;
if (c == 'L') return 1;
if (c == 'C') return 3;
if (c == 'D') return 1;
if (c == 'M') return 3;
return 0;
}
int from_roman(const char *s)
{
int res = 0; // running result
int prev = 10000; // value of previous digit
if (s == NULL || *s == '\0') return -1;
while (*s) {
int c = *s++; // Roman digit
int count = 1; // count of consecutive numbers
int value = digit(c); // digit value
int max = repeat(c); // allowed repetitions
if (value == 0) return -1; // illegal Roman digit
while (*s == c) {
s++;
count++;
}
if (*s && digit(*s) > value) {
int next = digit(*s++);
if (max != 3) return -1; // can only subtract I, X, C
if (count > 1) return -1; // can only subtract once
if (next > 10 * value) return -1; // IM,ID, IC, IL etc. invalid
if (value * 10 > prev) return -1; // VIV, VIX etc. invalid
res += next - value;
} else {
if (count > max) return -1; // too many repetitions
if (value >= prev) return -1; // must decrease
res += count * value;
}
prev = value;
}
return res;
}
Edit: The first two drafts of my code had errors, which are now fixed.
Since the validation of correctness is done via a regular expression, another approach is to implement the regex directly, while at the same time calculating the value of the Roman number. Also, given how complicated it is to get the logic to the Roman numbers right, this might be the better approach.
An implementation of this approach could be:
/*
* Returns the length of the digit stretch and advances the pointer
*/
static int stretch(const char **s, int m, int max)
{
int n = 0;
while (n < max && **s == m) {
(*s)++;
n++;
}
return n;
}
/*
* Parses (I II III IV V VI VII VIII IX) for ones,
* tens and hundreds and advances the pointer.
*/
static int parse(const char **s, int x, int v, int i)
{
int res = 0;
if (**s == i && *(*s + 1) == x) {
res += 9;
*s += 2;
} else if (**s == i && *(*s + 1) == v) {
res += 4;
*s += 2;
} else {
res += stretch(s, v, 1) * 5;
res += stretch(s, i, 3);
}
return res;
}
/*
* Parse a Roman numeral according the the regex; -1 means failure
*/
int from_roman_regex(const char *s)
{
int res = 0;
if (s == NULL || *s == '\0') return -1;
res += stretch(&s, 'M', 3) * 1000;
res += parse(&s, 'M', 'D', 'C') * 100;
res += parse(&s, 'C', 'L', 'X') * 10;
res += parse(&s, 'X', 'V', 'I') * 1;
if (*s) return -1;
return res;
}
The stretch
function emulates regexes such as X{0,3}
; the parse
function emulates regexes such as (V?I{0,3}|IX|IV)
, but in addition to singalling matching success or failure, it evaluates it as Roman number.
The first approach tries to implement the rules of Roman numbers. This is a bit complicated, but has the advantage that one cound extend it easily to provide exact error messages if one wanted to. The second approach has the advantage that it matches the question's spec exactly: It does what the regex does.
I've tested all Roman numbers up to 3,999 and all combinations of up to 7 Roman digits. The two approaches above and the OP's approach – simple aritgmetic plus regex validation – yielded the same results for all cases.
Upvotes: 2
Reputation: 134
Simple and Sweet logic using subtract the value
ng sorry code is in a python but you can follow the logic i have used while i have done this programme**
def checkio(data):
roman=""
while(data!=0):
if data>=1000:
data-=1000
roman+='M'
elif data>=900:
data-=900
roman+='CM'
elif data>=500:
data-=500
roman+='D'
elif data>=400:
data-=400
roman+='CD'
elif data>=100:
data-=100
roman+='C'
elif data>=90:
data-=90
roman+='XC'
elif data>=50:
data-=50
roman+='L'
elif data>=40:
data-=40
roman+='XL'
elif data>=10:
data-=10
roman+='X'
elif data>=9:
data-=9
roman+='IX'
elif data>=5:
data-=5
roman+='V'
elif data>=4:
data-=4
roman+='IV'
elif data>=1:
data-=1
roman+='I'
return roman
Upvotes: 0