Reputation: 7
I am trying to write a program which calculates and prints the GC content of a string of DNA(which is input through a txt file). That is, the percentage of G's and C's in a string of DNA. Here is my function for the GC percentage:
void updateGCCount(char s[], int *gc, int *at) {
char c[MAXLENGTH];
int i,GCcount,ATcount;
float len,GCpercentage;
GCcount=0;
ATcount=0;
for(i=0;c[i]!='\0';++i)
{
if(c[i]=='G' || c[i]=='C')
{
++GCcount;
*gc=GCcount;
}
if(c[i]=='A' || c[i]=='T')
{
++ATcount;
*at=ATcount;
}
}
strcpy(c,s);
len=strlen(c);
GCpercentage=*gc/len;
printf("GC-content: %.2f\n",GCpercentage);
}
This is my function definition, and the part which is supposed to correctly print the GC percentage is what I am not sure about. Below is my main program which utilizes the input text file.
#include "genomics.h"
int main(){
char s[MAXLENGTH];
int gc, at;
scanf("%s",s);
printf("Sequence : %s\n",s);
updateGCCount(s, &gc, &at);
return 0;
}
Any help or advice on why I am not getting a correct value for the GCpercentage would be great. Thank you in advance
Upvotes: 0
Views: 99
Reputation: 12514
That's a strongly un-idiomatic program. Consider the following.
#include <stdio.h>
#include <stdlib.h> /* for exit(3) */
float count_gc(const char* s)
{
You have no need to pass information back via variables passed in by reference. Functions return values -- typically 'the answer'.
You're simply scanning the content of the argument string s
, so there's no need to copy it anywhere.
As others have pointed out, you were scanning the contents of the array c[]
before you copied anything in to it -- you were counting 'G' and 'C' in a (probably large) random block of memory. Keeping things simple avoids mistakes like that.
int nvalid = 0;
int gccount = 0;
float result;
for (; *s != '\0'; s++) {
Although the for
loop you wrote isn't wrong, it's somewhat un-idiomatic. Here, we examine the character pointed to by the pointer s
, and then increment the pointer, until we find ourselves pointing at the \0
that terminates the string. Yes, this means we 'lose' the initial value of the argument, but we don't need it after the loop, so that doesn't matter.
switch (*s) {
A switch is a more natural construction here. You're looking for a small set of possible values that *s
(that is, the character the pointer is currently pointing at) may have.
case 'G':
case 'C':
nvalid++;
gccount++;
break;
case 'A':
case 'T':
nvalid++;
break;
default:
/* unexpected character -- ignore it */
break;
Every switch
statement should have a default
clause -- one should always think of what's supposed to happen if none of the case
clauses match. In this case, we just ignore this character.
}
}
if (nvalid == 0) {
fprintf(stderr, "No valid letters found!\n");
result = 0.0;
} else {
/* Multiply by 1.0 to convert integer gccount to a float */
result = 1.0*gccount / nvalid;
}
return result;
We return the result to the caller rather than printing it out inside the function. Functions shouldn't 'chatter', but leave all of the I/O in one place, typically leaving the main function (or something higher up) to look after that.
}
int main(int argc, char** argv)
{
if (argc != 2) {
/* Give the user a hint on how to call the program */
fprintf(stderr, "Usage: gcat <string>\n");
exit(1);
}
printf("Sequence GC-content = %g\n", count_gc(argv[1]));
}
I run that with:
% cc -o gcat gcat.c
% ./gcat "GCAT ATx foo"
Sequence GC-content = 0.333333
%
With C, it's very easy to tie yourself in knots, very quickly. Aim for simplicity always.
Upvotes: 0
Reputation: 2779
c
is not initialize, so *gc
and *at
are not updated at all and they contain garbage..
here you should use s
instead of c
for(i=0;c[i]!='\0';++i)
{
if(c[i]=='G' || c[i]=='C')
{
++GCcount;
*gc=GCcount;
}
if(c[i]=='A' || c[i]=='T')
{
++ATcount;
*at=ATcount;
}
}
Upvotes: 1
Reputation: 2133
You're doing your tests on char array "c":
char c[MAXLENGTH];
...
for(i=0;c[i]!='\0';++i)
{
if(c[i]=='G' || c[i]=='C')
{
++GCcount;
*gc=GCcount;
}
if(c[i]=='A' || c[i]=='T')
{
++ATcount;
*at=ATcount;
}
}
If should be on s, the array that you passed in. The c array is probably superflous, you should be able to get the length from s as well
Upvotes: 1