Reputation: 35
I have a method that takes an ArrayList of strings, where each element in the list is a variation of:
>AX018718 Equine influenza virus H3N8 // 4 (HA)
CAAAAGCAGGGTGACAAAAACATGATGGATTCCAACACTGTGTCAAGCTTTCAGGTAGACTGTTTTCTTT
GGCATGTCCGCAAACGATTTGCAGACCAAGAACTGGGTGATGCCCCATTCCTTGACCGGCTTCGCCGAGA
Each element is broken down into the Acc, which is AX018718 in this case, and the seq, which is the two lines following the Acc.
The seq is then checked against another ArrayList of strings called pal to see whether it contains any of those substrings, e.g. [AAAATTTT, AAACGTTT, AAATATATTT].
I am able to output all of the matches for the different elements of the first list as:
AATATATT in organism: AX225014 Was found in position: 15 and at 15
AATATT in organism: AX225014 Was found in position: 1432 and at 1432
AATATT in organism: AX225016 Was found in position: 1404 and at 1404
AATT in organism: AX225016 Was found in position: 169 and at 2205
Is it possible to check, across all of the outputted information, whether a single pal was found in every Acc?
In the case above, the wanted output would be:
AATATT was found in all of the Acc.
My working code:
public static ArrayList<String> PB2Scan(ArrayList<String> Pal) throws FileNotFoundException, IOException
{
ArrayList<String> PalindromesSpotted = new ArrayList<String>();
File file = new File("IAV_PB2_32640.txt");
Scanner sc = new Scanner(file);
sc.useDelimiter(">");
//initializes the ArrayList
ArrayList<String> Gene1 = new ArrayList<String>();
//initializes the writer
FileWriter fileWriter = new FileWriter("PB2out");
PrintWriter printwriter = new PrintWriter(fileWriter);
//Loads the Array List
while(sc.hasNext()) Gene1.add(sc.next());
for(int i = 0; i < Gene1.size(); i++)
{
//Acc breaks down the title so the element:
//>AX225014 Equine influenza virus H3N8 // 1 (PB2)
//ATGAAGACAACCATTATTTTGATACTACTGACCCATTGGGTCTACAGTCAAAACCCAACCAGTGGCAACA
//GGCATGTCCGCAAACGATTTGCAGACCAAGAACTGGGTGATGCCCCATTCCTTGACCGGCTTCGCCGAGA
//comes out as AX225014
String Acc = Accession(Gene1.get(i));
//seq takes the same element as above and returns only
//ATGAAGACAACCATTATTTTGATACTACTGACCCATTGGGTCTACAGTCAAAACCCAACCAGTGGCAACA
//GGCATGTCCGCAAACGATTTGCAGACCAAGAACTGGGTGATGCCCCATTCCTTGACCGGCTTCGCCGAGA
String seq = trimHeader(Gene1.get(i));
for(int x = 0; x<Pal.size(); x++)
{
if(seq.contains(Pal.get(x))){
String match = (Pal.get(x) + " in organism: " + Acc + " Was found in position: "+ seq.indexOf(Pal.get(x)) + " and at " +seq.lastIndexOf(Pal.get(x)));
printwriter.println(match);
PalindromesSpotted.add(match);
}
}
}
Collections.sort(PalindromesSpotted);
return PalindromesSpotted;
}
Upvotes: 0
Views: 203
Reputation: 15028
You should probably create a Map<String, List<String>>
containing the genes as keys and the Pals that each gene contains as values.
Map<String, List<String>> result = new HashMap<>();
for (String gene : Gene1) {
List<String> list = new ArrayList<>();
result.put(gene, list);
for (String pal : Pal) {
if (trimHeader(gene).contains(pal)) {
list.add(pal);
}
}
}
Now you have a Map that you can query for the Pals every Gene contains:
List<String> containedPals = result.get(gene);
This is a very reasonable result for a function like this. What you do afterwards (i.e. writing to a file) is better done in another function that calls this one.
So, this is probably what you want to do:
List<String> genes = loadGenes(geneFile);
List<String> pals = loadPal(palFile);
Map<String, List<String>> genesToContainedPal = methodAbove(genes, pals);
switch (resultTyp) {
// ...
}
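To answer the original question ("which Pal was found in all of the Acc?"), the same map idea can be turned around so that each Pal maps to the set of Accs containing it. The following is only a minimal sketch: the class PalCoverage, the methods accsByPal() and foundInAll(), and the toy sequences are made up for illustration, and it assumes you have already built an Acc-to-sequence map (for example from Accession() and trimHeader()).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class PalCoverage {

    // Hypothetical helper: maps each pal to the set of Accs whose sequence contains it.
    // sequencesByAcc maps Acc -> sequence (assumed to be built by the caller).
    static Map<String, Set<String>> accsByPal(Map<String, String> sequencesByAcc, List<String> pals) {
        Map<String, Set<String>> result = new LinkedHashMap<>();
        for (String pal : pals) {
            Set<String> accs = new LinkedHashSet<>();
            for (Map.Entry<String, String> entry : sequencesByAcc.entrySet()) {
                if (entry.getValue().contains(pal)) {
                    accs.add(entry.getKey());
                }
            }
            result.put(pal, accs);
        }
        return result;
    }

    // Returns the pals that occur in every Acc (empty if there are no Accs at all).
    static List<String> foundInAll(Map<String, String> sequencesByAcc, List<String> pals) {
        List<String> common = new ArrayList<>();
        if (sequencesByAcc.isEmpty()) {
            return common;
        }
        for (Map.Entry<String, Set<String>> entry : accsByPal(sequencesByAcc, pals).entrySet()) {
            if (entry.getValue().size() == sequencesByAcc.size()) {
                common.add(entry.getKey());
            }
        }
        return common;
    }

    public static void main(String[] args) {
        Map<String, String> seqs = new LinkedHashMap<>();
        seqs.put("AX225014", "GGAATATATTCCAATATTGG"); // made-up toy sequence
        seqs.put("AX225016", "CCAATATTGGAATTCC");     // made-up toy sequence
        for (String pal : foundInAll(seqs, Arrays.asList("AATATATT", "AATATT", "AATT"))) {
            System.out.println(pal + " was found in all of the Acc.");
        }
        // prints: AATATT was found in all of the Acc.
    }
}
```

Using a Set for the values means an Acc is counted once per Pal even if the Pal occurs several times in its sequence, so comparing the set size with the number of Accs is enough.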
Upvotes: 1
Reputation: 9202
First off, your code may never write the results to the file, since you don't close your writers or at the very least flush the PrintWriter. As a matter of fact, you don't close your reader either. You really should close your Readers and Writers to free resources. Food for thought.
You can make your PB2Scan() method return a simple result list as it does now, a result list of just the Acc's that contain the same Pal(s), or both, where the simple result list is followed by the list of Acc's that contain the same Pal(s). Whichever list is returned is also the one that gets logged.
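One way to make sure readers and writers are always closed is try-with-resources, which closes (and therefore flushes) the resource when the block ends, even if an exception is thrown. A minimal sketch follows; the class name ResourceDemo and the methods readRecords() and writeLines() are made up for illustration and are not part of your code.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ResourceDemo {

    // Reads ">"-delimited records; the Scanner is closed automatically.
    static List<String> readRecords(File in) throws IOException {
        List<String> records = new ArrayList<>();
        try (Scanner sc = new Scanner(in)) {
            sc.useDelimiter(">");
            while (sc.hasNext()) {
                records.add(sc.next());
            }
        }
        return records;
    }

    // Writes one line per entry; closing the PrintWriter flushes it to disk.
    static void writeLines(File out, List<String> lines) throws IOException {
        try (PrintWriter pw = new PrintWriter(new FileWriter(out))) {
            for (String line : lines) {
                pw.println(line);
            }
        }
    }
}
```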
Some additional code and an additional integer parameter for the PB2Scan() method would do this. For the additional parameter you might want to add something like this:
public static ArrayList<String> PB2Scan(ArrayList<String> Pal, int resultType)
throws FileNotFoundException, IOException
{ .... }
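Where the integer resultType argument would take one of three integer values from 0 to 2:
0 - return (and log) the simple result list, as the method does now;
1 - return (and log) only the list of Acc's that contain the same Pal(s);
2 - return (and log) both: the simple result list followed by the list of Acc's that contain the same Pal(s).
You should also really pass the file to read as an argument to the PB2Scan() method, since the file could very easily have a different name the next go around. This makes the method more versatile than hard-coding the file name.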
public static ArrayList<String> PB2Scan(String filePath, ArrayList<String> Pal, int resultType)
throws FileNotFoundException, IOException { .... }
The method can always write to the same output file, since that file name best suits the method it came from.
Using the above concept, rather than writing to the output file (PB2Out.txt) while the PalindromesSpotted ArrayList is being created, I think it's best to write the file after your ArrayList or ArrayLists are complete. Another method (writeListToFile()) is best suited to carry out that task. To find out whether the same Pal matches other Acc's, it is again a good idea to have yet another method (getPalMatches()) do that task.
Since the index locations of more than one occurrence of a given Pal in any given Seq were not being reported properly either, I have provided yet another method (findSubstringIndexes()) to quickly take care of that task.
It should be noted that the code below assumes that the Seq acquired from the trimHeader() method is all one single String with no Line Break characters within it.
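If trimHeader() does leave line breaks in the sequence, a Pal that spans two lines would never be found by contains(). Since trimHeader() is not shown, here is only a rough sketch of how a record could be flattened; the class SeqUtil and the method flattenSequence() are made-up names, and it assumes the header is the first line of the record.

```java
class SeqUtil {

    // Hypothetical helper: drops the header (first line) of a FASTA-style record
    // and joins the remaining lines into one uninterrupted string.
    static String flattenSequence(String record) {
        int firstBreak = record.indexOf('\n');
        if (firstBreak < 0) {
            return ""; // header only, no sequence lines
        }
        // Remove every whitespace character, including \r and \n.
        return record.substring(firstBreak + 1).replaceAll("\\s+", "");
    }
}
```

For example, flattenSequence("AX225014 Equine influenza virus\nACGT\nTTAA\n") gives "ACGTTTAA".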
The reworked PB2Scan() method and the other above mentioned methods are listed below:
The PB2Scan() Method:
public static ArrayList<String> PB2Scan(String filePath, ArrayList<String> Pal, int resultType)
throws FileNotFoundException, IOException {
// Make sure the supplied result type is either
// 0, 1, or 2. If not then default to 0.
if (resultType < 0 || resultType > 2) {
resultType = 0;
}
ArrayList<String> PalindromesSpotted = new ArrayList<>();
File file = new File(filePath);
Scanner sc = new Scanner(file);
sc.useDelimiter(">");
//initializes the ArrayList
ArrayList<String> Gene1 = new ArrayList<>();
//Loads the Array List
while (sc.hasNext()) {
Gene1.add(sc.next());
}
sc.close(); // Close the read in text file.
for (int i = 0; i < Gene1.size(); i++) {
//Acc breaks down the title so the element:
//>AX225014 Equine influenza virus H3N8 // 1 (PB2)
//ATGAAGACAACCATTATTTTGATACTACTGACCCATTGGGTCTACAGTCAAAACCCAACCAGTGGCAACA
//GGCATGTCCGCAAACGATTTGCAGACCAAGAACTGGGTGATGCCCCATTCCTTGACCGGCTTCGCCGAGA
//comes out as AX225014
String Acc = Accession(Gene1.get(i));
//seq takes the same element as above and returns only
//ATGAAGACAACCATTATTTTGATACTACTGACCCATTGGGTCTACAGTCAAAACCCAACCAGTGGCAACA
//GGCATGTCCGCAAACGATTTGCAGACCAAGAACTGGGTGATGCCCCATTCCTTGACCGGCTTCGCCGAGA
String seq = trimHeader(Gene1.get(i));
for (int x = 0; x < Pal.size(); x++) {
if (seq.contains(Pal.get(x))) {
String match = Pal.get(x) + " in organism: " + Acc +
" Was found in position(s): " +
findSubstringIndexes(seq, Pal.get(x));
PalindromesSpotted.add(match);
}
}
}
// If there is nothing to work with get outta here.
if (PalindromesSpotted.isEmpty()) {
return PalindromesSpotted;
}
// Sort the ArrayList
Collections.sort(PalindromesSpotted);
// Another ArrayList for matching Pal's to Acc's
ArrayList<String> accMatchingPal = new ArrayList<>();
switch (resultType) {
case 0: // resultType 0: simple result list only
writeListToFile("PB2Out.txt", PalindromesSpotted);
return PalindromesSpotted;
case 1: // resultType 1: only the Acc's matching each Pal
accMatchingPal = getPalMatches(PalindromesSpotted);
writeListToFile("PB2Out.txt", accMatchingPal);
return accMatchingPal;
default: // resultType 2: both lists, separated by an underline
accMatchingPal = getPalMatches(PalindromesSpotted);
ArrayList<String> fullList = new ArrayList<>();
fullList.addAll(PalindromesSpotted);
// Create a Underline made of = signs in the list.
fullList.add(String.join("", Collections.nCopies(70, "=")));
fullList.addAll(accMatchingPal);
writeListToFile("PB2Out.txt", fullList);
return fullList;
}
}
The findSubstringIndexes() Method:
private static String findSubstringIndexes(String inputString, String stringToFind){
String indexes = "";
int index = inputString.indexOf(stringToFind);
while (index >= 0){
indexes+= (indexes.equals("")) ? String.valueOf(index) : ", " + String.valueOf(index);
index = inputString.indexOf(stringToFind, index + stringToFind.length()) ;
}
return indexes;
}
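Note that the method above resumes searching after the end of each match, so occurrences that overlap a previous match are skipped. If overlapping occurrences should be reported as well (this is an assumption about what you want), a variant could advance by one character instead. The class OverlapSearch and the method findAllIndexes() below are made-up names for this sketch:

```java
import java.util.ArrayList;
import java.util.List;

class OverlapSearch {

    // Variant of the search above: returns every start index of toFind in input,
    // including matches that overlap an earlier match.
    static List<Integer> findAllIndexes(String input, String toFind) {
        List<Integer> indexes = new ArrayList<>();
        if (toFind.isEmpty()) {
            return indexes; // nothing sensible to report for an empty search string
        }
        int index = input.indexOf(toFind);
        while (index >= 0) {
            indexes.add(index);
            // Step forward by one character only, so overlaps are not missed.
            index = input.indexOf(toFind, index + 1);
        }
        return indexes;
    }
}
```

For example, findAllIndexes("ATATAT", "ATAT") returns [0, 2], whereas stepping by the length of the search string would report only index 0.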
The getPalMatches() Method:
private static ArrayList<String> getPalMatches(ArrayList<String> Palindromes) {
ArrayList<String> accMatching = new ArrayList<>();
for (int i = 0; i < Palindromes.size(); i++) {
String matches = "";
String[] split1 = Palindromes.get(i).split("\\s+");
String pal1 = split1[0];
// Make sure the current Pal hasn't already been listed.
boolean alreadyListed = false;
for (int there = 0; there < accMatching.size(); there++) {
String[] th = accMatching.get(there).split("\\s+");
if (th[0].equals(pal1)) {
alreadyListed = true;
break;
}
}
if (alreadyListed) { continue; }
for (int j = 0; j < Palindromes.size(); j++) {
String[] split2 = Palindromes.get(j).split("\\s+");
String pal2 = split2[0];
if (pal1.equals(pal2)) {
// Using Ternary Operator to build the matches string
matches+= (matches.equals("")) ? pal1 + " was found in the following Accessions: "
+ split2[3] : ", " + split2[3];
}
}
if (!matches.equals("")) {
accMatching.add(matches);
}
}
return accMatching;
}
The writeListToFile() Method:
private static void writeListToFile(String filePath, ArrayList<String> list, boolean... appendToFile) {
boolean appendFile = false;
if (appendToFile.length > 0) { appendFile = appendToFile[0]; }
try (BufferedWriter bw = new BufferedWriter(new FileWriter(filePath, appendFile))) {
for (int i = 0; i < list.size(); i++) {
bw.append(list.get(i) + System.lineSeparator());
}
} catch (IOException ex) {
ex.printStackTrace();
}
}
Upvotes: 1