Grouping data into separate lists

Question

I am stuck on a bit of a logical conundrum right now. Here is the problem I am trying to solve.

The Problem

I am reading in a PDB file, and as my code goes through the file it creates a list of all the chains in it. The list looks something like this:

chainIdList = ["A", "E", "D", "F", "G", "H"];

The length can vary.

I also have a list with the chain ID of every residue in the sequence. Each entry is a dictionary that I build like this:

chainResidue = {"chainId" : chainId, "residueNumber" : residueNumber}
// for example
chainResidue = {"chainId" : "A", "residueNumber" : "4"}

What I would like to do is iterate through the list of chainResidue entries and check whether each entry's chainId is in chainIdList. If so, create a new list for the matched chainId and append all of its residueNumbers to that list.

If that makes sense?

So in the end it would look like

A = [ 4, 6, 7, 8, ... and so on];
E = [ 9, 10];

The code so far

for (var i = 0; i < chainResidue.length; ++i) {
    for (var j = 0; j < chainIdList.length; ++j) {
        if (chainResidue[i].chainId === chainIdList[j]) {
            // Append chainResidue[i].residueNumber to a list for chainIdList[j]; make a list of lists?
        }
    }
}

Sample Data

ATOM   3434  CA  LEU Y  17      -3.567   5.653  33.836  1.00 28.21           C  
ATOM   3435  C   LEU Y  17      -3.114   6.290  32.530  1.00 31.33           C  
ATOM   3436  O   LEU Y  17      -2.020   6.873  32.474  1.00 26.01           O  
ATOM   3437  CB  LEU Y  17      -2.620   4.575  34.233  1.00 29.46           C  
ATOM   3438  CG  LEU Y  17      -2.610   4.263  35.705  1.00 33.42           C  
ATOM   3439  CD1 LEU Y  17      -1.430   3.363  35.960  1.00 40.68           C  
ATOM   3440  CD2 LEU Y  17      -2.351   5.483  36.559  1.00 40.12           C  
ATOM   3441  N   ASP Y  18      -3.926   6.263  31.454  1.00 30.62           N  
ATOM   3442  CA  ASP Y  18      -3.487   6.866  30.205  1.00 31.46           C

I am just pulling in the chain ID ("Y") and the residue numbers that correspond to it, like 17 and 18.

trincot · Accepted Answer

You could use this ES6 script:

// Sample data
var chainIdList = ['A', 'E', 'D', 'F', 'G', 'H'];
var chainResidue = [
  {"chainId" : "A", "residueNumber" : 24},
  {"chainId" : "E", "residueNumber" : 18},
  {"chainId" : "A", "residueNumber" : 9},
  {"chainId" : "A", "residueNumber" : 15}
];

// Create the empty lists to start with, per letter
var chainIdObj = chainIdList.reduce( (obj, id) => (obj[id] = [], obj), {} );

// Populate those lists with residue numbers
var result = chainResidue.reduce( (res, obj) => (res[obj.chainId] ? res[obj.chainId].push(obj.residueNumber) : 0, res), chainIdObj); 

console.log(result);

Explanation of the code

There are two main phases:

Create an object that has a property for every letter in the input array. The property values are all set to empty arrays (since we have not processed anything yet).

chainIdList.reduce iterates over the input array and calls the provided function for each element. The first argument of that function is always the result of the previous call. The first time there is no previous call, so it starts with the empty object ({}) that we pass as the second argument to reduce.

The function passed to reduce looks like this:

(obj, id) => (obj[id] = [], obj)

This is the newer arrow function notation, introduced in ECMAScript 6 (ES2015). In the "older" syntax, it would look like this:

function (obj, id) { return obj[id] = [], obj; }

The function body uses the comma operator, and together with the return it is really equivalent to this code:

 obj[id] = [];
 return obj;

So, taking it all together, the value of obj starts with {} and then in each iteration a property is defined for it. After the first iteration it is

 { 'A': [] }

... and returned to the reduce internals, so that it is passed as argument in the next iteration, etc. The object that is returned in the last iteration will be returned as return value of the whole reduce call.
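To watch the accumulator being handed from one iteration to the next, here is a small sketch that records a snapshot after each call. It uses the explicit block body instead of the comma operator, and the shortened `chainIds` list and the `snapshots` array are names made up for this illustration only.

```javascript
const chainIds = ['A', 'E', 'D']; // shortened sample list (assumption)
const snapshots = [];             // hypothetical helper to record each step

const built = chainIds.reduce((obj, id) => {
  obj[id] = [];                        // define the property for this chain ID
  snapshots.push(JSON.stringify(obj)); // record what obj looks like right now
  return obj;                          // hand obj to the next iteration
}, {});

snapshots.forEach(s => console.log(s));
// {"A":[]}
// {"A":[],"E":[]}
// {"A":[],"E":[],"D":[]}
```

Note that the same object is passed along every time; each iteration only adds one property to it.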

So now we have chainIdObj equal to:

{
  "A": [],
  "E": [],
  "D": [],
  "F": [],
  "G": [],
  "H": []
}

The second phase populates the arrays in the above structure. Again, reduce does the iterating, this time over chainResidue. The function that is executed for each object in chainResidue is:
```
(res, obj) => (res[obj.chainId] ? res[obj.chainId].push(obj.residueNumber) : 0, res)
```

This time the initial value of the first argument (res) is not {} but the result of the previous phase: chainIdObj. The function checks whether the chainId of the object we are looking at matches an entry in res (i.e. in chainIdObj). If so (?), the corresponding residueNumber is pushed onto the array we just checked. Otherwise (:), nothing should happen. But since the ternary operator requires a third expression, we just put 0 there; the expression's value is ignored anyway, so it is only a syntax filler.
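If the ternary-plus-comma form reads as too dense, the same two phases can be sketched with plain block bodies and an if statement. This is a rewrite for readability, not a different algorithm. The entry with chainId "Z" is made-up sample data, added only to show that residues whose chain is not in chainIdList are skipped.

```javascript
const chainIdList = ['A', 'E'];
const chainResidue = [
  { chainId: 'A', residueNumber: 24 },
  { chainId: 'Z', residueNumber: 3 },  // hypothetical: chain not in chainIdList
  { chainId: 'E', residueNumber: 18 },
  { chainId: 'A', residueNumber: 9 }
];

// Phase 1: one empty array per known chain ID
const chainIdObj = chainIdList.reduce((obj, id) => {
  obj[id] = [];
  return obj;
}, {});

// Phase 2: push each residue number onto its chain's array, if that chain is known
const result = chainResidue.reduce((res, obj) => {
  if (res[obj.chainId]) {
    res[obj.chainId].push(obj.residueNumber);
  }
  return res;
}, chainIdObj);

console.log(result); // { A: [ 24, 9 ], E: [ 18 ] }
```

Here result and chainIdObj refer to the same object, because reduce returns the accumulator it was given and the callback only mutates it.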

Finally, the comma operator is again used to make sure the res object is returned to the reduce internals, so we get it again in the next iteration. The final result is the result of the last iteration, and it is returned by reduce. It is assigned to result.

That is what gets printed to the console.

Functional Code

Some like to avoid variable assignments where possible and restrict the use of variables to function parameters. With the above elements, you can write such code like this:

console.log( chainResidue.reduce(
    (res, obj) => (res[obj.chainId] ? res[obj.chainId].push(obj.residueNumber) : 0, res),
    chainIdList.reduce( (obj, id) => (obj[id] = [], obj), {} )
));
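If you need this in more than one place, the single expression can be wrapped in a function. A minimal sketch, where `groupResidues` is a hypothetical name chosen for this example and the sample data is the same as at the top of the answer, with a shorter chain list:

```javascript
// groupResidues is a hypothetical helper name, not an existing API
function groupResidues(chainIdList, chainResidue) {
  return chainResidue.reduce(
    // Push the residue number if its chain is known, then return the accumulator
    (res, obj) => (res[obj.chainId] ? res[obj.chainId].push(obj.residueNumber) : 0, res),
    // Initial value: an object with one empty array per chain ID
    chainIdList.reduce((obj, id) => (obj[id] = [], obj), {})
  );
}

const grouped = groupResidues(
  ['A', 'E', 'D'],
  [
    { chainId: 'A', residueNumber: 24 },
    { chainId: 'E', residueNumber: 18 },
    { chainId: 'A', residueNumber: 9 },
    { chainId: 'A', residueNumber: 15 }
  ]
);

console.log(grouped); // { A: [ 24, 9, 15 ], E: [ 18 ], D: [] }
```

Because the inner reduce builds a fresh object on every call, calling the function twice does not accumulate residues from the earlier call.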
