trungnt

Reputation: 1111

Using multidimensional arrays to analyze sequences of RNA

I'm currently learning about multidimensional arrays and was given the task of analyzing strands of RNA sequences (given from a .txt file). Here is an example of a strand:

AUGCUUAUUAACUGAAAACAUAUGGGUAGUCGAUGA

Given this string, I am to figure out what protein this RNA strand would create. To do so, I have to break each strand down into codons (groups of 3 bases). So for this example, I need to look at AUG CUU AUU AAC UGA, etc. Each of these codons represents an amino acid: AUG is methionine (represented by 'M'), CUU is leucine (represented by 'L'), and so on. My output should therefore be a new string of amino acids (M-L-I...).

What would be the best way to approach this problem? From my understanding, I'm to create a 3-D array, let's say

int aminoAcid[4][4][4]

since there are 4 possible choices for each base (A, U, G, C). I'm not entirely sure where to go from here, though, since certain combinations will give the same amino acid.

EDIT: Am I going in the right direction if I first convert the string into number representations (A=0, U=1, G=2, C=3)? From there I could work more easily with a 3D array, right?

Upvotes: 0

Views: 250

Answers (1)

sunny

Reputation: 3891

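You can use the 3D array to map each codon to its amino acid. Codons that produce the same amino acid are not a problem: just store the same value in each of those cells. Learn about enum and use its constants as your array indices, so that you can write something like

aminoAcid[A][U][G] = 24

where A, U and G are enum constants with values from 0 to 3 (the character literals 'A', 'U' and 'G' would index far outside a [4][4][4] array) and 24 stands for methionine, which means you can use a second enum for the amino acids as well. Use enums whenever you have a limited, known group of items you want to represent with numbers.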
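As a rough illustration of the enum-indexed lookup, here is a minimal C++ sketch. It rests on several assumptions that are not in the question: the names Base, CodonTable, baseIndex, buildCodonTable and translate are made up for the example, the table stores one-letter amino acid codes as char rather than int, '*' marks a stop codon, and '?' marks a codon containing an unrecognised letter. The bases are ordered U, C, A, G (not the A, U, G, C order from the question) only because that lets the standard genetic code be written as a single 64-character string.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// Bases in U, C, A, G order (an assumed ordering, chosen so the standard
// genetic code fits in one 64-character string below).
enum Base { U = 0, C = 1, A = 2, G = 3, BaseCount = 4 };

// 4 x 4 x 4 table of one-letter amino acid codes; '*' marks a stop codon.
using CodonTable =
    std::array<std::array<std::array<char, BaseCount>, BaseCount>, BaseCount>;

// Map a base letter to its enum index, or -1 if the letter is not a base.
int baseIndex(char c) {
    switch (c) {
        case 'U': return U;
        case 'C': return C;
        case 'A': return A;
        case 'G': return G;
        default:  return -1;
    }
}

// Fill the table from the standard genetic code. Each row of 16 letters
// covers one first base; within a row the second and third bases each
// cycle through U, C, A, G. Synonymous codons simply hold the same letter.
CodonTable buildCodonTable() {
    const std::string acids =
        "FFLLSSSSYY**CC*W"   // first base U
        "LLLLPPPPHHQQRRRR"   // first base C
        "IIIMTTTTNNKKSSRR"   // first base A
        "VVVVAAAADDEEGGGG";  // first base G
    assert(acids.size() == 64);
    CodonTable table{};
    for (int i = 0; i < BaseCount; ++i)
        for (int j = 0; j < BaseCount; ++j)
            for (int k = 0; k < BaseCount; ++k)
                table[i][j][k] = acids[16 * i + 4 * j + k];
    return table;
}

// Translate an RNA strand codon by codon. Trailing bases that do not
// form a full codon are ignored; invalid codons become '?'.
std::string translate(const std::string& rna) {
    static const CodonTable table = buildCodonTable();
    std::string protein;
    for (std::size_t pos = 0; pos + 3 <= rna.size(); pos += 3) {
        const int i = baseIndex(rna[pos]);
        const int j = baseIndex(rna[pos + 1]);
        const int k = baseIndex(rna[pos + 2]);
        if (i < 0 || j < 0 || k < 0) {
            protein += '?';
            continue;
        }
        protein += table[i][j][k];
    }
    return protein;
}
```

With this sketch, translate("AUGCUUAUU") gives "MLI", matching the M-L-I from the question. Whether translation should stop at the first stop codon is left open here, since the question does not say.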

It sounds like this is just the beginning of a larger project, so you should follow good practices from the start, thinking about how you can build components that represent your problem.

Upvotes: 1
