So I am experiencing a bit of a logical conundrum right now. Here is the problem I am trying to solve.
The Problem
I am reading in a PDB file, and as it goes through the file it builds a list of all the chains in the file. The list looks something like this:
var chainIdList = ['A', 'E', 'D', 'F', 'G', 'H'];
The length can vary.
I have another list with the chain ID of every residue in the sequence. Each entry is a dictionary (object) that I made, which looks like this:
chainResidue = {"chainId" : chainId, "residueNumber" : residueNumber}
chainResidue = { "chainId" : "A", "residueNumber" : 4 }
What I would like to do is iterate through the list of chainResidues and check whether each chainResidue.chainId is in chainIdList. If so, I want to create a new list for the matched chainId and append all of its residueNumbers to that list.
Does that make sense?
So in the end it would look like this:
A = [ 4, 6, 7, 8, ... and so on];
E = [ 9, 10];
The code so far
for (var i = 0; i < chainResidue.length; ++i) {
  for (var j = 0; j < chainIdList.length; ++j) {
    if (chainResidue[i].chainId === chainIdList[j]) {
      // Append chainResidue[i].residueNumber to a list for chainIdList[j]; make a list of lists?
    }
  }
}
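One way to fill in the placeholder while keeping this nested-loop structure is to collect the lists in an object keyed by chain ID. This is a minimal sketch, not taken from the answers below; the name residuesByChain and the sample values are assumptions for illustration, and it assumes each chain ID appears only once in chainIdList.

```javascript
// Hypothetical sample data in the shape described above
const chainIdList = ['A', 'E', 'D', 'F', 'G', 'H'];
const chainResidue = [
  { chainId: 'A', residueNumber: 4 },
  { chainId: 'E', residueNumber: 9 },
  { chainId: 'A', residueNumber: 6 },
  { chainId: 'E', residueNumber: 10 },
  { chainId: 'Z', residueNumber: 1 } // not in chainIdList, so it is skipped
];

// residuesByChain is a hypothetical name: one array of residue numbers per matched chain ID
const residuesByChain = {};

for (let i = 0; i < chainResidue.length; ++i) {
  for (let j = 0; j < chainIdList.length; ++j) {
    if (chainResidue[i].chainId === chainIdList[j]) {
      // Create the list for this chain the first time it is matched
      if (!residuesByChain[chainIdList[j]]) {
        residuesByChain[chainIdList[j]] = [];
      }
      residuesByChain[chainIdList[j]].push(chainResidue[i].residueNumber);
      break; // assumes chain IDs are unique in chainIdList
    }
  }
}

console.log(residuesByChain); // { A: [ 4, 6 ], E: [ 9, 10 ] }
```

The inner loop is only there to test membership; the answers below avoid it by looking the chain ID up directly in an object.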
Sample Data
ATOM 3434 CA LEU Y 17 -3.567 5.653 33.836 1.00 28.21 C
ATOM 3435 C LEU Y 17 -3.114 6.290 32.530 1.00 31.33 C
ATOM 3436 O LEU Y 17 -2.020 6.873 32.474 1.00 26.01 O
ATOM 3437 CB LEU Y 17 -2.620 4.575 34.233 1.00 29.46 C
ATOM 3438 CG LEU Y 17 -2.610 4.263 35.705 1.00 33.42 C
ATOM 3439 CD1 LEU Y 17 -1.430 3.363 35.960 1.00 40.68 C
ATOM 3440 CD2 LEU Y 17 -2.351 5.483 36.559 1.00 40.12 C
ATOM 3441 N ASP Y 18 -3.926 6.263 31.454 1.00 30.62 N
ATOM 3442 CA ASP Y 18 -3.487 6.866 30.205 1.00 31.46 C
I am only pulling in the chain ID ("Y") and the residue numbers that belong to it, like 17 and 18.
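To build chainResidue from lines like these, a whitespace split is enough for the sample as shown, where the chain ID is the fifth field and the residue number the sixth. This is a sketch under that assumption, and parseChainResidues is a hypothetical name. In real PDB files the ATOM record is fixed-width (chain ID in column 22, residue sequence number in columns 23-26), and neighbouring fields can run together, so slicing by column is more robust for production use.

```javascript
// Sketch: assumes whitespace-separated fields as in the sample above
// (record, serial, atom name, residue name, chain ID, residue number, ...).
function parseChainResidues(pdbText) {
  const chainResidue = [];
  const seen = new Set(); // one entry per (chain, residue) pair, not per atom

  for (const line of pdbText.split('\n')) {
    const fields = line.trim().split(/\s+/);
    if (fields[0] !== 'ATOM') continue; // skip HETATM, REMARK, blank lines, etc.

    const chainId = fields[4];
    const residueNumber = parseInt(fields[5], 10);
    const key = chainId + ':' + residueNumber;

    if (!seen.has(key)) {
      seen.add(key);
      chainResidue.push({ chainId, residueNumber });
    }
  }
  return chainResidue;
}

const sample = `ATOM 3434 CA LEU Y 17 -3.567 5.653 33.836 1.00 28.21 C
ATOM 3435 C LEU Y 17 -3.114 6.290 32.530 1.00 31.33 C
ATOM 3441 N ASP Y 18 -3.926 6.263 31.454 1.00 30.62 N`;

// Logs one entry for Y 17 and one for Y 18
console.log(parseChainResidues(sample));
```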
You could use this ES6 script:
// Sample data
var chainIdList = ['A', 'E', 'D', 'F', 'G', 'H'];
var chainResidue = [
{"chainId" : "A", "residueNumber" : 24},
{"chainId" : "E", "residueNumber" : 18},
{"chainId" : "A", "residueNumber" : 9},
{"chainId" : "A", "residueNumber" : 15}
];
// Create the empty lists to start with, per letter
var chainIdObj = chainIdList.reduce( (obj, id) => (obj[id] = [], obj), {} );
// Populate those lists with residue numbers
var result = chainResidue.reduce( (res, obj) => (res[obj.chainId] ? res[obj.chainId].push(obj.residueNumber) : 0, res), chainIdObj);
console.log(result);
There are two main phases:
First, chainIdList.reduce iterates over the input array and calls the provided function for each element. The first argument of that function is always the result of the previous call. On the first call there is no previous call, so it starts with the empty object ({}) that we provide as the second argument to reduce.
The function passed to reduce looks like this:
(obj, id) => (obj[id] = [], obj)
This is arrow-function notation, introduced in ECMAScript 2015 (ES6). In the older syntax, it would look like this:
function (obj, id) { return obj[id] = [], obj; }
The function body uses the comma operator, which evaluates both operands and yields the value of the last one, so together with the return it is equivalent to this code:
obj[id] = [];
return obj;
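A quick way to convince yourself that the comma form and the two-statement form behave the same is to run both side by side. The helper names withComma and withStatements are hypothetical, used only for this comparison.

```javascript
// Arrow function using the comma operator: evaluates obj[id] = [], then yields obj
const withComma = (obj, id) => (obj[id] = [], obj);

// Equivalent function written as two separate statements
function withStatements(obj, id) {
  obj[id] = [];
  return obj;
}

const a = withComma({}, 'A');
const b = withStatements({}, 'A');
console.log(a, b); // { A: [] } { A: [] }
```

The outer parentheses around the arrow body matter: without them, inside a call like reduce(..., {}), the comma would separate arguments instead of being part of the function body.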
So, taking it all together, obj starts as {} and each iteration adds a property to it. After the first iteration it is
{ 'A': [] }
... and it is returned to the reduce internals, so that it is passed as the first argument in the next iteration, and so on. The object returned by the last iteration becomes the return value of the whole reduce call.
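To see the accumulator being handed from one call to the next, the callback can record what it receives. This is a tracing sketch of the first phase with a shortened chain list; the steps array exists only for illustration.

```javascript
const chainIdList = ['A', 'E', 'D'];
const steps = []; // keys of the accumulator as seen at the start of each call

const chainIdObj = chainIdList.reduce((obj, id) => {
  steps.push(Object.keys(obj).join(',') || '(empty)');
  obj[id] = [];
  return obj; // becomes `obj` in the next call
}, {});

console.log(steps);      // [ '(empty)', 'A', 'A,E' ]
console.log(chainIdObj); // { A: [], E: [], D: [] }
```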
So now chainIdObj is equal to:
{
"A": [],
"E": [],
"D": [],
"F": [],
"G": [],
"H": []
}
The second phase populates the arrays in the above structure. Again, reduce is used to iterate, this time over chainResidue. The function that is executed for each object in chainResidue is:
(res, obj) => (res[obj.chainId] ? res[obj.chainId].push(obj.residueNumber) : 0, res)
This time the first argument (res) is not initialised with {}, but with the result of the previous phase: chainIdObj. The function checks whether the chainId property of the current object matches an entry in res (i.e. in chainIdObj). If so (?), the corresponding residueNumber is pushed to the array we just checked. Otherwise (:), nothing should happen. But because the conditional (ternary) operator requires a third expression, we just put 0 there: the expression's value is ignored anyway, so it is only a syntax filler.
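If the ternary with a dummy 0 feels too clever, the same second-phase callback can be written with an ordinary if statement. This sketch should behave the same as the one-liner; the sample values mirror the answer's, plus one unknown chain ID ('Z') added for illustration.

```javascript
const chainIdList = ['A', 'E', 'D', 'F', 'G', 'H'];
const chainResidue = [
  { chainId: 'A', residueNumber: 24 },
  { chainId: 'E', residueNumber: 18 },
  { chainId: 'A', residueNumber: 9 },
  { chainId: 'Z', residueNumber: 3 } // 'Z' is not in chainIdList
];

const chainIdObj = chainIdList.reduce((obj, id) => (obj[id] = [], obj), {});

const result = chainResidue.reduce((res, obj) => {
  // Only push when the chain ID already has a list; unknown IDs are ignored
  if (res[obj.chainId]) {
    res[obj.chainId].push(obj.residueNumber);
  }
  return res;
}, chainIdObj);

console.log(result);
// { A: [ 24, 9 ], E: [ 18 ], D: [], F: [], G: [], H: [] }
```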
Finally, the comma operator is used again to make sure the res object is returned to the reduce internals, so we get it again in the next iteration. The result of the last iteration is returned by reduce and assigned to result, which is what gets logged to the console.
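One consequence worth knowing: because chainIdObj is passed as the initial value and the callback always returns the object it received, reduce hands back that very object, filled in place. This small check uses the answer's own expressions with made-up sample values.

```javascript
const chainIdList = ['A', 'E'];
const chainResidue = [{ chainId: 'A', residueNumber: 24 }];

const chainIdObj = chainIdList.reduce((obj, id) => (obj[id] = [], obj), {});
const result = chainResidue.reduce(
  (res, obj) => (res[obj.chainId] ? res[obj.chainId].push(obj.residueNumber) : 0, res),
  chainIdObj
);

// Same reference: the second reduce filled chainIdObj's arrays in place
console.log(result === chainIdObj); // true
console.log(chainIdObj.A);          // [ 24 ]
```

If an untouched set of empty lists is needed again later, build a fresh one with the first reduce rather than reusing chainIdObj.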
Some people prefer to avoid variable assignments where possible and limit named values to function parameters. With the above elements, you can write the code like this:
console.log( chainResidue.reduce(
(res, obj) => (res[obj.chainId] ? res[obj.chainId].push(obj.residueNumber) : 0, res),
chainIdList.reduce( (obj, id) => (obj[id] = [], obj), {} )));
Although you already have an answer, I suggest not using Array#reduce here: the callback always returns the same object, so reduce is not necessary, and forEach does the job.
// Sample data
var chainIdList = ['A', 'E', 'D', 'F', 'G', 'H'],
chainResidue = [{ chainId: "A", residueNumber: 24 }, { chainId: "E", residueNumber: 18}, { chainId: "A", residueNumber: 9 }, { chainId: "A", residueNumber: 15 }],
result = Object.create(null);
chainIdList.forEach(a => result[a] = []);
chainResidue.forEach(a => result[a.chainId] && result[a.chainId].push(a.residueNumber));
console.log(result);
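This answer creates result with Object.create(null) rather than {}. One reason that can matter: an object made with {} inherits properties such as toString from Object.prototype, so a truthiness check like result[a.chainId] would treat those names as existing chains. The sketch below shows the difference using 'toString' as a hypothetical key; real PDB chain IDs are single characters, so in practice this is a safeguard rather than a fix.

```javascript
const plain = {};
const bare = Object.create(null); // no prototype, so nothing is inherited

// An inherited method is truthy on the plain object...
console.log(Boolean(plain['toString'])); // true
// ...but absent on the prototype-less one
console.log(Boolean(bare['toString']));  // false

// So a guard like result[a.chainId] && ... only passes for keys added explicitly
bare['A'] = [];
console.log(Boolean(bare['A']));         // true
```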