Ikefactor

Reputation: 27

Converting a string into 3 character substrings

I have an assignment where we read from a text file of Covid-19 sequences. I have read in the first line as a string and now have to use a substring method to break this line into groups of 3 characters, each of which forms a codon. I am having trouble visualizing how to break this down. This is the first line of the file, and every 3 letters make a codon. What I have now is testLine = scan.nextLine();

AGATCTGTTCTCTAAACGAACTTTAAAATCTGTGTGGCTGTCACTCGGCTGCATGCTTAG

for (int i = 0; i < testLine.length(); i += 3)
{
    String codon = testLine.substring(0,3);
    codonList.add(codon);
}
System.out.println(codonList);

I know I am close. The output from my code above prints the first codon, AGA, 20 times. Here is the output:

[AGA, AGA, AGA, AGA, AGA, AGA, AGA, AGA, AGA, AGA, AGA, AGA, AGA, AGA, AGA, AGA, AGA, AGA, AGA, AGA]

Edit: I was able to get it working with everyone's help. The issue I am having now is replicating this for the whole file. I added a hasNext check and it doesn't seem to work the same way.

    while(scan.hasNext())
    testLine = scan.nextLine();
    for (int i = 0; i < testLine.length(); i += 3)
    {   
        String codon = testLine.substring(i, i + 3);
        codonList.add(codon);
    }
    System.out.println(codonList);  
}
Here is my output with hasNext added:
[ATT, AAT, TTT, AGT, AGT, GCT, ATC]

Upvotes: 0

Views: 785

Answers (4)

Unmitigated

Reputation: 89414

Use the loop index as the start of the substring, and cap the end index at the length of the line:

String codon = testLine.substring(i, Math.min(i + 3, testLine.length()));
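To show that line in context, here is a minimal sketch of the full loop. The class name CodonSplitter and the helper splitCodons are made up for illustration and are not from the question; the sample string is deliberately not a multiple of 3 so the effect of Math.min is visible.

```java
import java.util.ArrayList;
import java.util.List;

public class CodonSplitter {
    // Hypothetical helper: splits a line into consecutive 3-character chunks.
    static List<String> splitCodons(String line) {
        List<String> codons = new ArrayList<>();
        for (int i = 0; i < line.length(); i += 3) {
            // Math.min keeps the end index in range when fewer than 3 characters remain.
            codons.add(line.substring(i, Math.min(i + 3, line.length())));
        }
        return codons;
    }

    public static void main(String[] args) {
        System.out.println(splitCodons("AGATCTGTTC"));
        // prints [AGA, TCT, GTT, C]
    }
}
```

Without Math.min, the last call would ask for characters past the end of the string and throw a StringIndexOutOfBoundsException whenever the length is not a multiple of 3.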


String#split can also be used.

System.out.println(Arrays.toString(testLine.split("(?<=\\G.{3})")));

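The lookbehind (?<=\G.{3}) matches every position that lies exactly 3 characters after the end of the previous match (\G starts at the beginning of the input), so the string is cut into 3-character pieces.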

Upvotes: 2

Bohemian

Reputation: 425238

Here’s a one-liner:

String[] parts = testLine.split("(?<=\\G...)");

This works by splitting at points in the input that are 3 characters after the end of the last match (denoted by \G, which is initialized to the start of the input).

If you really need a List:

List<String> parts = Arrays.asList(testLine.split("(?<=\\G...)"));
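As a quick check of the split approach, here is a small runnable sketch. The class name RegexCodonSplit and the helper splitCodons are invented for this example; the input is shortened so the leftover piece at the end is easy to see.

```java
import java.util.Arrays;

public class RegexCodonSplit {
    // Hypothetical helper wrapping the one-liner above.
    static String[] splitCodons(String line) {
        // Split at each zero-width position that is 3 characters past the previous match.
        return line.split("(?<=\\G...)");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitCodons("AGATCTGT")));
        // prints [AGA, TCT, GT]
    }
}
```

Note that Arrays.asList returns a fixed-size list backed by the array, so if you need to add to it later, copy it with new ArrayList<>(Arrays.asList(...)).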

Upvotes: 0

ssinfod

Reputation: 1081

It seems you were very close. You need to use i instead of 0 in the loop.

Here is my solution in C#. I know you asked about Java, but I had a C# IDE open...

List<string> codonList = new List<string>();
string testLine = "AGATCTGTTCTCTAAACGAACTTTAAAATCTGTGTGGCTGTCACTCGGCTGCATGCTTAG";

for (int i = 0; i < testLine.Length; i += 3)    
{
    String codon = testLine.Substring(i, 3);
    codonList.Add(codon);
}

int cnt = 0;
foreach (string s in codonList)
{
    cnt++;
    if (cnt != codonList.Count)
    {
        Console.Write(s + ", ");
    }
    else
    {
        Console.WriteLine(s);
    }                
}
Console.ReadLine();

Upvotes: 0

Windson Mateus

Reputation: 71

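This will work as long as the line length is a multiple of 3 (otherwise substring(0, 3) throws on the last, shorter piece):

    import java.util.ArrayList;
    import java.util.List;

    String testLine = "AGATCTGTTCTCTAAACGAACTTTAAAATCTGTGTGGCTGTCACTCGGCTGCATGCTTAG";
    List<String> codonList = new ArrayList<String>();
    String newTestLine = testLine;

    for (int i = 0; i < testLine.length(); i += 3) {
        // Drop everything before index i, then take the first 3 characters of the rest.
        newTestLine = testLine.substring(i);
        String codon = newTestLine.substring(0, 3);
        codonList.add(codon);
    }
    System.out.println(codonList);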
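For the follow-up in the question's edit (processing every line of the file), it looks like the while loop has no opening brace, so only testLine = scan.nextLine() repeats and the for loop runs once on the last line. A rough sketch of one way to fix that is below. The class name FileCodons and the helper readCodons are hypothetical, and the Scanner reads from an in-memory string here as a stand-in for the real file, which you would open with new Scanner(new File(...)) instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class FileCodons {
    // Hypothetical helper: collects the codons from every line the scanner provides.
    static List<String> readCodons(Scanner scan) {
        List<String> codonList = new ArrayList<>();
        // hasNextLine pairs with nextLine; the braces keep the for loop inside the while loop.
        while (scan.hasNextLine()) {
            String testLine = scan.nextLine();
            for (int i = 0; i < testLine.length(); i += 3) {
                // Math.min guards against a line whose length is not a multiple of 3.
                codonList.add(testLine.substring(i, Math.min(i + 3, testLine.length())));
            }
        }
        return codonList;
    }

    public static void main(String[] args) {
        // Assumption: two short lines standing in for the sequence file.
        Scanner scan = new Scanner("AGATCT\nGTTCT\n");
        System.out.println(readCodons(scan));
        // prints [AGA, TCT, GTT, CT]
    }
}
```

This treats each line separately, so a codon never spans a line break. If the sequence in your file wraps across lines, you would need to join the lines before splitting.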

Upvotes: 0
