Reputation: 373

On Linux, \n being treated as two characters in gdb using gcc compiler

My code

char[] fileContents = "hi\n whats up\n";

    char *output=malloc(sizeof(char)*1024) ;

    int i = 0; int j = 0;
    char *bPtr = fileContents;


    for(i=j=0; bPtr[i]!='\0'; i++)
    {
      if('\n'==bPtr[i])
            outputPtr[j++]='\r';
            outputPtr[j++]=bPtr[i];
    }

On NetBeans this code works, but with gcc on Linux the \ and the n are being treated as separate characters, whereas in NetBeans \n is a single character. Please help.

When I debug on Linux with GDB, it skips the if statement completely, while in NetBeans it enters the if and gets the job done.

Upvotes: 0

Answers (2)

Sean Conner

Reputation: 416

First off, your C code isn't C code. It's close, but as is, it won't compile at all. Second, after cleaning up the code to get it to a compilable state:

#include <stdio.h>
#include <stdlib.h>

char fileContents[] = "hi\n whats up\n";

int main(void)
{
  char *output;
  int   i;
  int   j;
  char *bPtr;

  output = malloc(1024);
  bPtr   = fileContents;

  for (i = j = 0 ; bPtr[i] != '\0' ; i++)
  {
    if ('\n' == bPtr[i])
      output[j++] = '\r';
    output[j++] = bPtr[i];
  }

  output[j] = '\0';
  fputs(output,stdout);
  return EXIT_SUCCESS;
}

And compiling with "gcc -g a.c" and using gdb:

GNU gdb Red Hat Linux (6.3.0.0-1.132.EL4rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db
library "/lib/tls/libthread_db.so.1".

(gdb) break 17
Breakpoint 1 at 0x80483fa: file a.c, line 17.
(gdb) run
Starting program: /tmp/a.out 

Breakpoint 1, main () at a.c:17
17        for (i = j = 0 ; bPtr[i] != '\0' ; i++)
(gdb) n
19          if ('\n' == bPtr[i])
(gdb) n
21          output[j++] = bPtr[i];
(gdb) n
17        for (i = j = 0 ; bPtr[i] != '\0' ; i++)
(gdb) n
19          if ('\n' == bPtr[i])
(gdb) n
21          output[j++] = bPtr[i];
(gdb) n
17        for (i = j = 0 ; bPtr[i] != '\0' ; i++)
(gdb) n
19          if ('\n' == bPtr[i])
(gdb) n
20            output[j++] = '\r';
(gdb) n
21          output[j++] = bPtr[i];

The first two times through the loop, the body of the if is skipped because the condition is false. The third time through, the condition is met, and the "\r" is added to the output.
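For what it's worth, the same loop can be wrapped in a function that also checks the output capacity, so the buffer cannot overflow however long the input is. This is only a minimal sketch, not the asker's code; the name `lf_to_crlf` and its signature are made up for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: copy `in` to `out`, inserting '\r' before every '\n'.
 * Returns the number of characters written (not counting the final '\0'),
 * or (size_t)-1 if `out` (capacity `cap` bytes) is too small. */
size_t lf_to_crlf(const char *in, char *out, size_t cap)
{
    size_t j = 0;

    if (cap == 0)
        return (size_t)-1;          /* no room even for the terminator */

    for (size_t i = 0; in[i] != '\0'; i++) {
        /* A newline needs two bytes; everything else needs one. */
        size_t need = (in[i] == '\n') ? 2 : 1;

        if (j + need >= cap)        /* keep one byte for the terminator */
            return (size_t)-1;

        if (in[i] == '\n')
            out[j++] = '\r';
        out[j++] = in[i];
    }

    out[j] = '\0';
    return j;
}
```

Note that this sketch does not look for a CR that is already present, so input that already uses CR LF would end up with CR CR LF.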

But from reading some of your other comments, it seems you are confused by line endings. On Unix (and because Linux is a type of Unix, this is true for Linux as well), lines end with one character, LF (ASCII code 10). Windows (and MS-DOS, the precursor to Windows, and CP/M, the precursor to MS-DOS) uses the character sequence CR LF (ASCII code 13, ASCII code 10) to mark the end of line.

Why the two differing standards? It comes down to the wording of the ASCII standard and the era in which it was written. Back then, output was mostly on teletypes (think typewriter). CR was defined as moving the print carriage (or print head) back to the beginning of the line, and LF was defined as advancing to the next line. The combined action of bringing the print carriage to the beginning of the next line was unspecified. CP/M (and its descendants) standardized on using both to mark the end of a line, a rather literal reading of the standards document. The creators of Unix chose a more liberal interpretation in which LF, a Line Feed, meant advancing to the next line and bringing the print carriage back to the start (whereas the first computer I used used CR for the same thing: bring the carriage back to the start and advance to the next line).

Now, if a teletype device is hooked up to a Unix system and requires both CR and LF, then it's up to the Unix device driver, when it sees an LF, to add the required CR. In other words, the system handles the details on behalf of your program, and you only need the LF to end a line.
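On Linux and other POSIX systems, this translation is controlled by the terminal's output flags. The following is a minimal sketch, assuming a POSIX system that provides `<termios.h>`; the helper name `onlcr_enabled` is hypothetical. It reports whether the terminal driver is currently set to turn an LF into CR LF on output.

```c
#include <termios.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the terminal behind `fd` maps NL to
 * CR-NL on output, 0 if it does not, and -1 if `fd` is not a terminal
 * (for example, when output is redirected to a file or a pipe). */
int onlcr_enabled(int fd)
{
    struct termios t;

    if (!isatty(fd) || tcgetattr(fd, &t) != 0)
        return -1;

    /* ONLCR only takes effect when output processing (OPOST) is on. */
    return ((t.c_oflag & OPOST) && (t.c_oflag & ONLCR)) ? 1 : 0;
}
```

Calling `onlcr_enabled(STDOUT_FILENO)` from a program run in an interactive terminal would typically return 1, which corresponds to the `onlcr` setting that `stty -a` reports.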

To further confound the mess, the C standard weighs in. When you open a file,

FILE *fp = fopen("sometextfile.txt","r");

you open it in "text" mode. Under Unix, this does nothing, but under Windows, the C library will discard "\r" on input so the program only needs to concern itself with looking for "\n" (and for files opened for writing, it will add the CR when a LF is seen). But this is under Windows (there may be other systems out there that do this, but I am unfamiliar with any).

If you really want to see the file, as is, you need to open it in binary mode:

FILE *fp = fopen("sometextfile.txt","rb");

Now, if there are any CRs in the file, your program will see them. Normally you don't need to concern yourself with line endings; it only becomes an issue when you move a text file from one system to another that uses a different line-ending convention, and even then the transport mechanism might take care of it for you (such as FTP). But it doesn't hurt to check.
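One way to do that check is to read the file byte by byte in binary mode and count the CRs. A minimal sketch; the function name `count_cr` is hypothetical, and a result above zero only suggests (it does not prove) that the file uses CR LF line endings.

```c
#include <stdio.h>

/* Hypothetical helper: count the CR (ASCII 13) bytes in a file.
 * Returns -1 if the file cannot be opened. */
long count_cr(const char *path)
{
    FILE *fp = fopen(path, "rb");   /* binary mode: no translation */
    long  count = 0;
    int   c;

    if (fp == NULL)
        return -1;

    while ((c = fgetc(fp)) != EOF)
        if (c == '\r')
            count++;

    fclose(fp);
    return count;
}
```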

Remember when I said that Unix does not make a distinction between "text" and "binary" modes? It doesn't. So if a text file from the Windows world is processed with a Unix program, that program will see the CRs. What happens next is really up to the program in question. Programs like grep don't seem to care, but the editor I use will show any CR that exists.
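If a Unix program does have to cope with such files, one common approach is to strip an optional CR along with the LF after reading each line (for example, after `fgets`). A minimal sketch; the name `chomp` is hypothetical and not part of the C library.

```c
#include <string.h>

/* Hypothetical helper: remove a trailing '\n' and then a trailing '\r',
 * if present, from `line` in place. Returns the new length. */
size_t chomp(char *line)
{
    size_t n = strlen(line);

    if (n > 0 && line[n - 1] == '\n')
        line[--n] = '\0';
    if (n > 0 && line[n - 1] == '\r')
        line[--n] = '\0';

    return n;
}
```

With this, a line read from a Windows file ("abc\r\n") and one read from a Unix file ("abc\n") both end up as "abc".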

So I guess my question now is: what are you trying to do?

Upvotes: 1

asheeshr

Reputation: 4114

Your code runs perfectly on my system with gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3.

The code I ran :

#include <stdio.h>

int main()
{
    char Contents[] = "hi\n whats up\n";

    int i = 0; int j = 0;
    char outputPtr[20];
    for(i=j=0; Contents[i]!='\0'; i++)
    {
      if('\n'==Contents[i]) outputPtr[j++]='\r';
      outputPtr[j++]=Contents[i];
    }
    outputPtr[j]='\0';
    printf("%s %d %d \n", outputPtr,j,i);
    i = 0;
    while(outputPtr[i]!='\0')   printf(" %d ", outputPtr[i++]);
    return 0;
}

Output :

hi
 whats up
 15 13 //Length of the edited string and the original string
 104  105  13  10  32  119  104  97  116  115  32  117  112  13  10 //Ascii values of the characters of the string

13 is the carriage return character.

Upvotes: 0
