Brian Cheda

Reputation: 35

Java checking if an element from a list appears in all occurrences

I have a method that takes in an ArrayList of strings, where each element in the list is a variation of:

>AX018718 Equine influenza virus H3N8 // 4 (HA)
CAAAAGCAGGGTGACAAAAACATGATGGATTCCAACACTGTGTCAAGCTTTCAGGTAGACTGTTTTCTTT
GGCATGTCCGCAAACGATTTGCAGACCAAGAACTGGGTGATGCCCCATTCCTTGACCGGCTTCGCCGAGA

Each element is broken down into the Acc, which is AX018718 in this case, and the seq, which is the two lines following the Acc.

The seq is then checked against another ArrayList of strings called pal, e.g. [AAAATTTT, AAACGTTT, AAATATATTT], to see whether any of them occur in it as substrings.

I am able to get all of the matches for the different elements of the first list outputted as:

AATATATT in organism: AX225014 Was found in position: 15 and at 15
AATATT in organism: AX225014 Was found in position: 1432 and at 1432
AATATT in organism: AX225016 Was found in position: 1404 and at 1404
AATT in organism: AX225016 Was found in position: 169 and at 2205

Is it possible to check, across all of the output, whether one pal was found in every Acc?

In the case above, the wanted output would be:

AATATT was found in all of the Acc.

My working code:

public static ArrayList<String> PB2Scan(ArrayList<String> Pal) throws FileNotFoundException, IOException
{
    ArrayList<String> PalindromesSpotted  = new ArrayList<String>();

    File file = new File("IAV_PB2_32640.txt");
    Scanner sc = new Scanner(file);
    sc.useDelimiter(">");
    //initializes the ArrayList
    ArrayList<String> Gene1 = new ArrayList<String>();
    //initializes the writer
    FileWriter fileWriter = new FileWriter("PB2out");
    PrintWriter printwriter = new PrintWriter(fileWriter);
    //Loads the Array List
    while(sc.hasNext()) Gene1.add(sc.next());
    for(int i = 0; i < Gene1.size(); i++)
    {
        //Acc breaks down the title so the element:
        //>AX225014 Equine influenza virus H3N8 // 1 (PB2)
        //ATGAAGACAACCATTATTTTGATACTACTGACCCATTGGGTCTACAGTCAAAACCCAACCAGTGGCAACA
        //GGCATGTCCGCAAACGATTTGCAGACCAAGAACTGGGTGATGCCCCATTCCTTGACCGGCTTCGCCGAGA
        //comes out as AX225014
        String Acc = Accession(Gene1.get(i));
        //seq takes the same element as above and returns only
        //ATGAAGACAACCATTATTTTGATACTACTGACCCATTGGGTCTACAGTCAAAACCCAACCAGTGGCAACA
        //GGCATGTCCGCAAACGATTTGCAGACCAAGAACTGGGTGATGCCCCATTCCTTGACCGGCTTCGCCGAGA
        String seq = trimHeader(Gene1.get(i));
        for(int x = 0; x < Pal.size(); x++)
        {
            if(seq.contains(Pal.get(x))){
                String match = (Pal.get(x) + " in organism: " + Acc + " Was found in position: " + seq.indexOf(Pal.get(x)) + " and at " + seq.lastIndexOf(Pal.get(x)));
                printwriter.println(match);
                PalindromesSpotted.add(match);
            }
        }
    }
    Collections.sort(PalindromesSpotted);
    return PalindromesSpotted;
}

Upvotes: 0

Views: 203

Answers (2)

daniu

Reputation: 15028

You should probably create a Map<String, List<String>> containing the genes as keys and the Pals they contain as values.

Map<String, List<String>> result = new HashMap<>();
for (String gene : Gene1) {
    List<String> list = new ArrayList<>();
    result.put(gene, list);
    // Strip the header once per gene, then test each Pal against the sequence.
    String seq = trimHeader(gene);
    for (String pal : Pal) {
        if (seq.contains(pal)) {
            list.add(pal);
        }
    }
}

Now you have a Map that you can query for the Pals every Gene contains:

List<String> containedPals = result.get(gene);

This is a very reasonable result for a function like this. What you do afterwards (i.e. writing to a file) is better done in another function that calls this one.

So, this is probably what you want to do:

List<String> genes = loadGenes(geneFile);
List<String> pals = loadPal(palFile);
Map<String, List<String>> genesToContainedPal = methodAbove(genes, pals);
switch (resultType) {
    // ...
}
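
To get from a structure like this to the "found in all of the Acc" answer, it may help to flip the map around so that each Pal points at the set of Accs that contain it, then keep only the Pals whose set covers every Acc. This is only a minimal sketch: the names PalInAllAccs, palsFoundInAll and accToSeq are made up for illustration, the Acc and seq are assumed to have been extracted already, and the sequences in main are toy data, not real records.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class PalInAllAccs {

    // Returns the pals that occur in every sequence.
    // accToSeq (hypothetical) maps an accession to its sequence, assumed to be
    // already stripped of its header and of any line breaks.
    static List<String> palsFoundInAll(Map<String, String> accToSeq, List<String> pals) {
        // pal -> set of accessions whose sequence contains that pal
        Map<String, Set<String>> palToAccs = new LinkedHashMap<>();
        for (String pal : pals) {
            Set<String> accs = new HashSet<>();
            for (Map.Entry<String, String> e : accToSeq.entrySet()) {
                if (e.getValue().contains(pal)) {
                    accs.add(e.getKey());
                }
            }
            palToAccs.put(pal, accs);
        }

        // A pal is "in all Accs" when its set is as large as the set of all accessions.
        List<String> inAll = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : palToAccs.entrySet()) {
            if (!accToSeq.isEmpty() && e.getValue().size() == accToSeq.size()) {
                inAll.add(e.getKey());
            }
        }
        return inAll;
    }

    public static void main(String[] args) {
        // Toy sequences, made up for this sketch.
        Map<String, String> accToSeq = new LinkedHashMap<>();
        accToSeq.put("AX225014", "GGAATATTCC");
        accToSeq.put("AX225016", "TTAATATTGGAATT");

        List<String> pals = Arrays.asList("AATATT", "AATT");
        for (String pal : palsFoundInAll(accToSeq, pals)) {
            System.out.println(pal + " was found in all of the Acc.");
        }
    }
}
```

With the toy data above, only AATATT occurs in both sequences, so it is the only Pal reported.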

Upvotes: 1

DevilsHnd - 退した

Reputation: 9202

First off, your code won't write anything to the log file, since you never close your writers or, at the very least, flush the PrintWriter. As a matter of fact, you don't close your reader either. You really should close your Readers and Writers to free resources. Food for thought.
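
As a rough sketch of what that looks like in practice, try-with-resources closes the Scanner and the writer automatically, even if an exception is thrown part-way through. The class and method names here (CloseResourcesSketch, readRecords, writeLines) are made up for illustration and are not part of the original code.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class CloseResourcesSketch {

    // Reads ">"-delimited records; the Scanner is closed when the try block exits.
    static List<String> readRecords(Path in) throws IOException {
        List<String> records = new ArrayList<>();
        try (Scanner sc = new Scanner(in)) {
            sc.useDelimiter(">");
            while (sc.hasNext()) {
                records.add(sc.next());
            }
        }
        return records;
    }

    // Writes one line per entry; closing the PrintWriter also flushes it,
    // so the lines actually reach the file.
    static void writeLines(Path out, List<String> lines) throws IOException {
        try (PrintWriter pw = new PrintWriter(Files.newBufferedWriter(out))) {
            for (String line : lines) {
                pw.println(line);
            }
        }
    }
}
```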

You can make your PB2Scan() method return a simple result list as it does now, a list of just the Acc's which contain the same Pal(s), or both: the simple result list followed by the list of Acc's which contain the same Pal(s). Whichever list is returned is also logged.

Some additional code and an additional integer parameter for the PB2Scan() method would do this. For the additional parameter you might want to add something like this:

public static ArrayList<String> PB2Scan(ArrayList<String> Pal, int resultType) 
                                throws FileNotFoundException, IOException
{ .... }

Where the integer resultType argument would take one of three integer values from 0 to 2:

  • 0 - Simple result list, as the code does now;
  • 1 - Acc's that match Pal's;
  • 2 - Simple result list with the Acc's that match Pal's at the end of the list.

You should also really pass the file to read as an argument to the PB2Scan() method, since this file could very easily have a different name the next time around. This makes the method more versatile than having the file name hard-coded.

public static ArrayList<String> PB2Scan(String filePath, ArrayList<String> Pal, int resultType) 
                                throws FileNotFoundException, IOException { .... }

The method can always write to the same output file, since that file name is tied to the method that produced it.

Using the above concept, rather than writing to the output file (PB2Out.txt) while the PalindromesSpotted ArrayList is being built, I think it's best to write the file after your ArrayList or ArrayLists are complete. Another method (writeListToFile()) is best suited to carry out that task. To find out which Acc's share the same Pal, it is again a good idea to have yet another method (getPalMatches()) do that task.

Since the index locations of a Pal that occurs more than once in a given Seq were not being reported properly either, I have provided yet another method (findSubstringIndexes()) to take care of that task.

It should be noted that the code below assumes that the Seq acquired from the trimHeader() method is all one single String with no Line Break characters within it.
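
The reason this matters is that contains() will miss a Pal that straddles a line break in the file. The trimHeader() implementation isn't shown in the question, so the following is only a hypothetical stand-in (sequenceOnly is an invented name) that drops the header line and removes all whitespace, assuming each record is a header line followed by sequence lines:

```java
class SequenceJoinSketch {

    // Hypothetical stand-in for trimHeader(): drops the first (header) line
    // and strips all whitespace so the sequence becomes one continuous String.
    static String sequenceOnly(String record) {
        int firstBreak = record.indexOf('\n');
        if (firstBreak < 0) {
            return ""; // header only, no sequence lines
        }
        return record.substring(firstBreak + 1).replaceAll("\\s+", "");
    }

    public static void main(String[] args) {
        // Toy record, made up for this sketch: the Pal AATATT spans the line break.
        String record = "AX1 toy record\nAAAT\nATTT\n";
        System.out.println(record.contains("AATATT"));               // false
        System.out.println(sequenceOnly(record).contains("AATATT")); // true
    }
}
```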

The reworked PB2Scan() method and the other above mentioned methods are listed below:

The PB2Scan() Method:

public static ArrayList<String> PB2Scan(String filePath, ArrayList<String> Pal, int resultType) 
                                throws FileNotFoundException, IOException {
    // Make sure the supplied result type is either 
    // 0, 1, or 2. If not then default to 0.
    if (resultType < 0 || resultType > 2) {
        resultType = 0;
    }
    ArrayList<String> PalindromesSpotted = new ArrayList<>();

    File file = new File(filePath);
    Scanner sc = new Scanner(file);
    sc.useDelimiter(">");
    //initializes the ArrayList
    ArrayList<String> Gene1 = new ArrayList<>();
    //Loads the Array List
    while (sc.hasNext()) {
        Gene1.add(sc.next());
    }
    sc.close(); // Close the read in text file.

    for (int i = 0; i < Gene1.size(); i++) {
        //Acc breaks down the title so the element:
        //>AX225014 Equine influenza virus H3N8 // 1 (PB2)
        //ATGAAGACAACCATTATTTTGATACTACTGACCCATTGGGTCTACAGTCAAAACCCAACCAGTGGCAACA
        //GGCATGTCCGCAAACGATTTGCAGACCAAGAACTGGGTGATGCCCCATTCCTTGACCGGCTTCGCCGAGA
        //comes out as AX225014
        String Acc = Accession(Gene1.get(i));

        //seq takes the same element as above and returns only
        //ATGAAGACAACCATTATTTTGATACTACTGACCCATTGGGTCTACAGTCAAAACCCAACCAGTGGCAACA
        //GGCATGTCCGCAAACGATTTGCAGACCAAGAACTGGGTGATGCCCCATTCCTTGACCGGCTTCGCCGAGA
        String seq = trimHeader(Gene1.get(i));
        for (int x = 0; x < Pal.size(); x++) {
            if (seq.contains(Pal.get(x))) {
                String match = Pal.get(x) + " in organism: " + Acc + 
                                " Was found in position(s): " + 
                                findSubstringIndexes(seq, Pal.get(x));
                PalindromesSpotted.add(match);
            }
        }
    }

    // If there is nothing to work with get outta here.
    if (PalindromesSpotted.isEmpty()) {
        return PalindromesSpotted;
    }

    // Sort the ArrayList
    Collections.sort(PalindromesSpotted);
    // Another ArrayList for matching Pal's to Acc's
    ArrayList<String> accMatchingPal = new ArrayList<>();
    switch (resultType) {
        case 0: // resultType 0: simple result list
            writeListToFile("PB2Out.txt", PalindromesSpotted);
            return PalindromesSpotted;

        case 1: // resultType 1: Acc's that match Pal's
            accMatchingPal = getPalMatches(PalindromesSpotted);
            writeListToFile("PB2Out.txt", accMatchingPal);
            return accMatchingPal;

        default: // resultType 2: simple result list followed by the Acc matches
            accMatchingPal = getPalMatches(PalindromesSpotted);
            ArrayList<String> fullList = new ArrayList<>();
            fullList.addAll(PalindromesSpotted);
            // Create a Underline made of = signs in the list.
            fullList.add(String.join("", Collections.nCopies(70, "=")));
            fullList.addAll(accMatchingPal);
            writeListToFile("PB2Out.txt", fullList);
            return fullList;
    }
}   

The findSubstringIndexes() Method:

private static String findSubstringIndexes(String inputString, String stringToFind){
    String indexes = "";
    int index = inputString.indexOf(stringToFind);
    while (index >= 0){
        indexes+= (indexes.equals("")) ? String.valueOf(index) : ", " + String.valueOf(index);
        index = inputString.indexOf(stringToFind, index + stringToFind.length());
    }
    return indexes;
}
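
Note that the method above resumes searching after the end of each match, so occurrences that overlap a previous one are not reported. If overlapping occurrences should be listed as well, one possible variant is to resume one character after the start of each match. This is a sketch under that assumption; OverlappingIndexes and findAllIndexes are invented names, and it returns a List<Integer> rather than a formatted String:

```java
import java.util.ArrayList;
import java.util.List;

class OverlappingIndexes {

    // Collects the start index of every occurrence of toFind in input,
    // including occurrences that overlap an earlier one.
    static List<Integer> findAllIndexes(String input, String toFind) {
        List<Integer> indexes = new ArrayList<>();
        if (toFind.isEmpty()) {
            return indexes; // nothing meaningful to search for
        }
        int index = input.indexOf(toFind);
        while (index >= 0) {
            indexes.add(index);
            // Step forward by one character only, so overlaps are still found.
            index = input.indexOf(toFind, index + 1);
        }
        return indexes;
    }

    public static void main(String[] args) {
        System.out.println(findAllIndexes("GGAATTCCAATT", "AATT")); // [2, 8]
        System.out.println(findAllIndexes("AAAAA", "AAAA"));        // [0, 1]
    }
}
```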

The getPalMatches() Method:

private static ArrayList<String> getPalMatches(ArrayList<String> Palindromes) {
    ArrayList<String> accMatching = new ArrayList<>();
    for (int i = 0; i < Palindromes.size(); i++) {
        String matches = "";
        String[] split1 = Palindromes.get(i).split("\\s+");
        String pal1 = split1[0];
        // Make sure the current Pal hasn't already been listed.
        boolean alreadyListed = false;
        for (int there = 0; there < accMatching.size(); there++) {
            String[] th = accMatching.get(there).split("\\s+");
            if (th[0].equals(pal1)) {
                alreadyListed = true;
                break;
            }
        }
        if (alreadyListed) { continue; }
        for (int j = 0; j < Palindromes.size(); j++) {
            String[] split2 = Palindromes.get(j).split("\\s+");
            String pal2 = split2[0];
            if (pal1.equals(pal2)) {
                // Using Ternary Operator to build the matches string
                matches+= (matches.equals("")) ? pal1 + " was found in the following Accessions: "
                        + split2[3] : ", " + split2[3];
            }
        }
        if (!matches.equals("")) {
            accMatching.add(matches);
        }
    }
    return accMatching;
}

The writeListToFile() Method:

private static void writeListToFile(String filePath, ArrayList<String> list, boolean... appendToFile) {
    // Optional flag: pass true to append to the file instead of overwriting it.
    boolean appendFile = appendToFile.length > 0 && appendToFile[0];

    // try-with-resources closes (and therefore flushes) the writer automatically.
    try (BufferedWriter bw = new BufferedWriter(new FileWriter(filePath, appendFile))) {
        for (String line : list) {
            bw.append(line).append(System.lineSeparator());
        }
    } catch (IOException ex) {
        ex.printStackTrace();
    }
}

Upvotes: 1
