dEmigOd

Reputation: 630

Handling char integers

I have the following question. Is multiplying some character (as integer) by -1 twice guaranteed to preserve the initial value?

So, I run some code that marks a read character by multiplying it by -1 (I'm living in an ASCII world, but any other character encoding would also be interesting to assess).

For example, suppose I've read the English letter 'a' into variable c. To prevent further code from detecting that this is a lowercase letter, I want to multiply it by -1. After all, it is an integer.

The code is basically a back-tracking solution to some problem, so after a branch of the decision tree has been checked, I want to restore the letter's initial value.

char c;
// some code gets the value
c *= -1;
// handle something
c *= -1;

The problem I face is more of a "What should I expect?" than a "Something does not work as expected."

The characters I want to transform are always English letters, either uppercase or lowercase.

From what I see in the ASCII table, both classes are in the range 0..127.

So if:

1) char is signed on my machine, I should expect negative values not to be regular letters. Which is good.

2) char is unsigned, I have no clue. Should the multiplication by -1 be performed in int and then truncated to char?

Since the standard does not require the character set to be ASCII, would the result be different in other encodings?

Upvotes: 0

Views: 383

Answers (5)

J CHEN

Reputation: 494

Look at the bits before the transform, after the transform, and after transforming back, like this; maybe then you will see what you really want.

#include <stdio.h>

int main(void){

    int i;

    char c = 'a';
    int d = c;                /* keep a copy of the original value in an int */
    printf("%d\n", d);

    printf("c's binary is:\n");
    for(i = 8 - 1; i > -1; i--){              /* print the 8 bits of c, MSB first */
        printf("%d", (c & (1 << i)) >> i);
    }
    printf("\n");

    printf("d's binary is:\n");
    for(i = 8 * (int)sizeof(int) - 1; i > -1; i--){   /* all bits of the int copy */
        printf("%d", (d & (1u << i)) ? 1 : 0);        /* 1u avoids shifting into the sign bit */
    }
    printf("\n");

    c *= -1;

    printf("c's binary (after first *= -1) is:\n");
    for(i = 8 - 1; i > -1; i--){
        printf("%d", (c & (1 << i)) >> i);
    }
    printf("\n");

    c *= -1;

    printf("c's binary (after second *= -1) is:\n");
    for(i = 8 - 1; i > -1; i--){
        printf("%d", (c & (1 << i)) >> i);
    }
    printf("\n");

    c = d;                    /* restore c from the saved int copy */

    printf("c's binary (d back to c) is:\n");
    for(i = 8 - 1; i > -1; i--){
        printf("%d", (c & (1 << i)) >> i);
    }
    printf("\n");

    return 0;
}

Upvotes: 0

4386427

Reputation: 44274

Is multiplying some character (as integer) by -1 twice guaranteed to preserve the initial value?

For input in the range 0 .. 127, the answer is yes.

Two things happen:

1) Integer promotion, i.e. the char is promoted to an integer and then multiplied by -1

2) Conversion from one integer type to another, i.e. int to char

If you are on a system with signed chars, there is nothing special going on, as the standard requires the range to be at least -127 .. 127.

If you are on a system with unsigned chars the conversion is done by adding UCHAR_MAX+1 to the result of the multiplication to get a number that can be stored in your unsigned char.

It looks like this when we consider the computation done with infinite precision:

// After first multiplication by -1
-c + UCHAR_MAX + 1

// After second multiplication by -1
-(-c + UCHAR_MAX + 1) + UCHAR_MAX + 1 --> c - UCHAR_MAX - 1 + UCHAR_MAX + 1 --> c

In other words, after multiplying by -1 twice, we again have the original value.

BTW: Notice that zero (0) is a special case where your algorithm will not work, as 0 * -1 is 0, i.e. the marking will not work.
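
The wrap-around described above can be observed directly. A minimal sketch (my own illustration, assuming an 8-bit unsigned char, i.e. UCHAR_MAX == 255):

#include <stdio.h>
#include <limits.h>

int main(void)
{
    unsigned char c = 'a';          /* 97 in ASCII */

    c *= -1;                        /* -97 converted by adding UCHAR_MAX + 1 -> 159 */
    printf("after first  *= -1: %d\n", c);

    c *= -1;                        /* -159 converted the same way -> 97 again */
    printf("after second *= -1: %d\n", c);

    printf("UCHAR_MAX = %d\n", UCHAR_MAX);
    return 0;
}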

Upvotes: 1

chqrlie

Reputation: 144695

You are touching a very sensitive area in the C Standard: char default signedness.

As you are aware, the char type may be signed or unsigned by default on the various target platforms. This is a sad historical fact, and you should try to ensure that your program has the same behavior regardless.

The C Standard guarantees that letters and digits are positive in the target character set, so both lowercase and uppercase letters such as i and I are positive. Note however that some other characters, such as é, encoded as 0xE9 in ISO-Latin-1 and Windows code page 1252, will be negative (-23) if the char type is signed. Relying on negating the char values to prevent some processing is problematic since such negative char values would become positive and hence potentially undergo the transformation.

Multiplying a char value by -1 is performed using type int (or type unsigned int if char is unsigned and has the same size as int, which happens only on some rare embedded processors). If type char is smaller than int, this multiplication cannot overflow, but the result should be stored into an int to avoid the implementation-defined conversion in case the value exceeds the range of char, such as would be the case if char is unsigned. Indeed, in most cases negating a char value twice should yield the original value, but the C Standard does not guarantee this behavior if the intermediate value is stored into a char.

Note also that getc() returns an int with either the negative value EOF or the positive value of the byte read from the stream converted to an unsigned char.

For your approach, you should store characters as unsigned char values, either in unsigned char variables or in int variables when you want to use the negation as a trick to prevent some special handling. Adding 256 might be a safer choice as it changes '\0' as well:

// Assuming 8-bit bytes
int c = (unsigned char)some_char;
// some code gets the value
if (some_condition)
   c += 256;
// handle something
c &= 255;
// back to previous value.
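
A self-contained sketch of how that marking could be used (my own illustration with hypothetical names, assuming 8-bit bytes): the character is kept in an int, marked by adding 256, and restored with & 255:

#include <stdio.h>

/* Hypothetical helper: returns nonzero if the stored value is currently marked. */
static int is_marked(int c)
{
    return c >= 256;
}

int main(void)
{
    int c = (unsigned char)'a';     /* store the character as an unsigned char value in an int */

    c += 256;                       /* mark: value leaves the 0..255 range */
    printf("marked: %d (is_marked = %d)\n", c, is_marked(c));

    c &= 255;                       /* unmark: back to the original byte value */
    printf("restored: '%c' (is_marked = %d)\n", c, is_marked(c));

    return 0;
}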

Upvotes: 0

Aconcagua

Reputation: 25516

Is multiplying some character (as integer) by -1 twice guaranteed to preserve the initial value?

Well, considering only this question: yes, it does; in both cases, signed or unsigned, implicit integer promotion occurs (this is mandated by the standard) and the calculation is done in int anyway (unless unsigned char and signed int happen to have the same size, in which case unsigned int is used instead).

Signed char: x * -1 results in -x; the resulting value fits into signed char, so we can do the assignment (exception: -128; here the result 128 does not fit, and converting it back to signed char is implementation-defined, so it cannot be relied on! But as we can – considering ASCII – exclude this as input, we are fine...).

Unsigned char appears a bit more difficult: again, we get -x as the result, but need to place it into an unsigned char. According to the C standard, we add UCHAR_MAX + 1 as many times as needed until the value fits into the variable; in the given case we thus get 256 - x. The second multiplication then results in x - 256 as an int value; again adding 256 until the value fits into the variable (remember, x itself did so already) eliminates the negative offset...

Side note: Adding [TYPE]_MAX + 1 as many times as needed until the value fits into the variable is just cutting off the surplus most significant bits on a two's complement machine...

Only the numerical value 0 would be problematic, as it wouldn't change its value. But since, again, this doesn't appear as valid input, we are fine.

Since the standard does not require the character set to be ASCII, would the result be different in other encodings?

No, no difference at all so far – whether char is signed or unsigned.

However: How do you want to detect the values marked as invalid? With ASCII (and compatible encodings), it is simple: all values in question (English letters only!) are in the range [0; 127], and you identify the invalid ones by checking for < 0 in the case of signed char, or > 128 in the case of unsigned char. The same applies to any other encoding that uses only the lower or only the upper half for the letters in question (this holds even for the infamous EBCDIC encoding, except that there the characters in question reside in the upper half of the [0; 255] range and you need to invert the checks). This simple check, though, no longer works if you encounter an encoding using both halves of the byte value range (I'm not aware of any). With EBCDIC, you might get into exactly this trouble with the word separators, though: e.g. the simple space character ' ', and most punctuation marks, too, already lie in a different half than the letters (if you use such characters at all – you didn't mention it...)!
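
As a sketch of such a check (my own illustration, not from the answer itself, assuming ASCII and 8-bit char): promote the value to int once and test whether it is still inside the 0..127 range; negated letters end up either negative (signed char) or above 127 (unsigned char), so a single test covers both cases:

#include <stdio.h>

/* Hypothetical helper: with input restricted to ASCII letters (0..127), a value
   that has been multiplied by -1 is either negative (signed char) or above 127
   (unsigned char), so one test works for either signedness of char. */
static int is_negated(char c)
{
    int v = c;                      /* promote once, then test the int value */
    return v < 0 || v > 127;
}

int main(void)
{
    char c = 'a';
    printf("before: %d\n", is_negated(c));   /* 0 */
    c *= -1;
    printf("after:  %d\n", is_negated(c));   /* 1 with both signed and unsigned char */
    c *= -1;
    printf("back:   %d\n", is_negated(c));   /* 0 again */
    return 0;
}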

Upvotes: 2

Vishaal Shankar

Reputation: 1658

Maybe you can try using std::bitset here. When you encounter a character, you set the corresponding bit using std::bitset::set. To check if the bit is already set, you'll have to use std::bitset::test.

Limitations of this answer:

1. You are looking to encode your character. This answer does not do that, but instead adds a std::bitset variable that helps you keep track of the characters you encounter.

2. If you do not reset the bit after each character, then encountering the same character twice could lead to erroneous behavior.

Please find the sample code below:

#include <iostream>       // std::cout
#include <bitset>         // std::bitset
#include <cstddef>        // std::size_t

const int gAsciiLimit = 128;

int main ()
{
  std::bitset<gAsciiLimit> foo;
  char letter = 'a';
  std::size_t temp = (std::size_t)(letter);

  foo.set(temp); // will set the 97th bit to true.
  std::cout << foo.test(temp) << std::endl;
  /* Other operations handled */
  foo.set(temp,false);
  std::cout << foo.test(temp) << std::endl;
  return 0;
}

Upvotes: 0
