Reputation: 69
I have a string CCAATA CCGT
from which I'm trying to get all contiguous subsequences of a fixed length, n. I also want the index range of each subsequence within that string (0-3, 1-4, 2-5, etc.), something like this:
0 thru 3 : CCAA
1 thru 4 : CAAT
2 thru 5 : AATA
3 thru 6 : ATAC
4 thru 7 : TACC
5 thru 8 : ACCG
6 thru 9 : CCGT
The list size is 7. Here, I am looping through the list and calling indexOf and lastIndexOf. After 3 thru 6 : ATAC
, I get
Exception in thread "main" java.lang.IndexOutOfBoundsException: Index: 7, Size: 7
for (int i = 0; i < list.size(); i++) {
    System.out.println(ss.indexOf(list.get(i))
            + " thru " + ss.lastIndexOf(list.get(i + n - 1)) + " : "
            + list.get(i));
}
Demo:
import java.util.ArrayList;
public class Subsequences {
    public static void main(String[] args) {
        String s = "CCAATA CCGT";
        ArrayList<String> list = new ArrayList<String>(); // list of subsequence
        int n = 4; // subsequences of length
        String ss = s.replaceAll("\\s+", "");
        String substr = null;
        for (int i = 0; i <= ss.length() - n; i++) {
            substr = ss.substring(i, i + n);
            list.add(substr);
        }
        for (int i = 0; i < list.size(); i++) {
            System.out.println(ss.indexOf(list.get(i))
                    + " thru " + ss.lastIndexOf(list.get(i + n - 1)) + " : "
                    + list.get(i));
        }
    }
}
Any hints?
Upvotes: 3
Views: 322
Reputation: 1368
You can also do this with a simple regex. Remove the whitespace and run this regex:
(?=(.{4}))
Sample:
package com.see;
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexTest {
    private static final String TEST_STR = "CCAATA CCGT";
    public ArrayList<String> getMatchedStrings(String input) {
        ArrayList<String> matches = new ArrayList<String>();
        input = input.replaceAll("\\s", "");
        Pattern pattern = Pattern.compile("(?=(.{4}))");
        Matcher matcher = pattern.matcher(input);
        while (matcher.find())
            matches.add(matcher.group(1));
        return matches;
    }
    public static void main(String[] args) {
        RegexTest rt = new RegexTest();
        for (String string : rt.getMatchedStrings(TEST_STR)) {
            System.out.println(string);
        }
    }
}
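If the index ranges from the question are also needed, the same lookahead can supply them, since Matcher.start(int) and Matcher.end(int) report where a capturing group matched. A minimal sketch, assuming a hypothetical class RegexRanges and helper ranges that are not part of the answer above:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexRanges {
    // Hypothetical helper: builds lines such as "0 thru 3 : CCAA".
    static List<String> ranges(String input, int n) {
        List<String> lines = new ArrayList<>();
        String cleaned = input.replaceAll("\\s", "");
        // Capturing group inside a lookahead, so the windows may overlap.
        Pattern pattern = Pattern.compile("(?=(.{" + n + "}))");
        Matcher matcher = pattern.matcher(cleaned);
        while (matcher.find()) {
            int start = matcher.start(1);   // index of the window's first character
            int end = matcher.end(1) - 1;   // end(1) is exclusive, so subtract 1
            lines.add(start + " thru " + end + " : " + matcher.group(1));
        }
        return lines;
    }

    public static void main(String[] args) {
        // First line printed: 0 thru 3 : CCAA
        for (String line : ranges("CCAATA CCGT", 4)) {
            System.out.println(line);
        }
    }
}
```

Because the positions come from the matcher itself, no second search through the string is needed.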
Upvotes: 0
Reputation: 1890
In your loop
for (int i = 0; i < list.size(); i++) {
    System.out.println(ss.indexOf(list.get(i))
            + " thru " + ss.lastIndexOf(list.get(i + n - 1))
            + " : " + list.get(i));
}
when you call list.get(i + n - 1) and your i is 4, the result of the addition is 4 + 4 - 1 = 7. You can't get an element of a list at an index equal to or greater than list.size(), which is 7 here, so the system throws the exception.
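The arithmetic above can be checked directly. A small sketch, assuming hypothetical helpers windows and firstFailingIndex (neither is from the question), that reports the first i for which list.get(i + n - 1) throws:

```java
import java.util.ArrayList;
import java.util.List;

class OutOfBoundsDemo {
    // Hypothetical helper: all length-n windows of ss, as in the question's first loop.
    static List<String> windows(String ss, int n) {
        List<String> list = new ArrayList<>();
        for (int i = 0; i <= ss.length() - n; i++) {
            list.add(ss.substring(i, i + n));
        }
        return list;
    }

    // Hypothetical helper: first i where list.get(i + n - 1) throws, or -1 if it never does.
    static int firstFailingIndex(List<String> list, int n) {
        for (int i = 0; i < list.size(); i++) {
            try {
                list.get(i + n - 1);
            } catch (IndexOutOfBoundsException e) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> list = windows("CCAATACCGT", 4); // 7 entries
        System.out.println(firstFailingIndex(list, 4)); // prints 4
    }
}
```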
To have the result you expect, you can do something like this:
import java.util.ArrayList;
public class Subsequences {
    public static void main(String[] args) {
        String s = "CCAATA CCGT";
        ArrayList<String> list = new ArrayList<String>(); // list of subsequences
        int n = 4; // length of each subsequence
        String ss = s.replaceAll("\\s+", "");
        String substr = null;
        for (int i = 0; i <= ss.length() - n; i++) {
            substr = ss.substring(i, i + n);
            list.add(substr);
        }
        // --------Here the edits-------
        // Entry i starts at index i of ss, so the range is i thru i + n - 1.
        for (int i = 0; i < list.size(); i++)
            System.out.println(i + " thru " + (i + n - 1) + " : " + list.get(i));
        // -----------------------------
    }
}
Upvotes: 0
Reputation: 37604
You don't need to add n to the index passed to list.get(), since you already split the string into substrings of length 4: each entry in the List consists of 4 chars. Add n - 1 to the result of lastIndexOf instead. Change your index expression to this
(ss.lastIndexOf(list.get(i)) + n - 1)
and finally it looks like this
for (int i = 0; i < list.size(); i++) {
    System.out.println(ss.indexOf(list.get(i))
            + " thru " + (ss.lastIndexOf(list.get(i)) + n - 1) + " : "
            + list.get(i));
}
output:
0 thru 3 : CCAA
1 thru 4 : CAAT
2 thru 5 : AATA
3 thru 6 : ATAC
4 thru 7 : TACC
5 thru 8 : ACCG
6 thru 9 : CCGT
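One caveat worth noting: indexOf and lastIndexOf search the whole string, so they report the right position only when every window is unique, which happens to be true for CCAATACCGT. The sketch below uses a made-up input, ACACAC, and hypothetical helper names to compare the lookup with simply using the loop counter:

```java
import java.util.ArrayList;
import java.util.List;

class RepeatedWindows {
    // Hypothetical helper: start of each window as reported by indexOf.
    static List<Integer> startsViaIndexOf(String ss, int n) {
        List<Integer> starts = new ArrayList<>();
        for (int i = 0; i <= ss.length() - n; i++) {
            // indexOf returns the FIRST occurrence, not necessarily position i.
            starts.add(ss.indexOf(ss.substring(i, i + n)));
        }
        return starts;
    }

    // Hypothetical helper: start of each window taken from the loop counter.
    static List<Integer> startsViaCounter(String ss, int n) {
        List<Integer> starts = new ArrayList<>();
        for (int i = 0; i <= ss.length() - n; i++) {
            starts.add(i);
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(startsViaIndexOf("ACACAC", 2)); // prints [0, 1, 0, 1, 0]
        System.out.println(startsViaCounter("ACACAC", 2)); // prints [0, 1, 2, 3, 4]
    }
}
```

When the input may contain repeated windows, deriving the range from the loop counter (i thru i + n - 1) avoids the mismatch.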
Upvotes: 1
Reputation:
Remove all whitespace, loop:
String data = "CCAATA CCGT";
String replaced = data.replaceAll("\\s", "");
for (int i = 0; i < replaced.length() - 4 + 1; i++) {
    System.out.println(replaced.subSequence(i, i + 4));
}
Output:
CCAA
CAAT
AATA
ATAC
TACC
ACCG
CCGT
Upvotes: 1
Reputation: 5763
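I believe your issue is in list.get(i + n - 1). You're currently iterating such that i ranges from 0 to list.size() - 1, so i + n - 1 runs past the last valid index, list.size() - 1, as soon as i exceeds list.size() - n. Bounding the loop keeps that index in range. Note that this stops the exception but also ends the loop early: only the first list.size() - n entries are printed (3 of the 7 here).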
for (int i = 0; i < list.size() - n; i++) {
    System.out.println(ss.indexOf(list.get(i))
            + " thru " + ss.lastIndexOf(list.get(i + n - 1)) + " : "
            + list.get(i));
}
Upvotes: 1