Reputation: 65
Is there an equivalent of TTree::AddFriend() in uproot? I have 2 parallel trees in 2 different files, which I need to read with uproot.iterate using interpretations (set through the 'branches' option of uproot.iterate).
Maybe I can do that by manually obtaining several iterators from iterate() calls on the files and then calling next() on each iterator... but maybe there's a simpler way akin to AddFriend?
Thanks for any hint!
edit: I'm not sure I've been clear, so here are a few more details. My question is not about how to use the arrays, but about how to read them from different files. Here's a mockup of what I'm doing:
# I will fill this array and give it as input to my DNN
# it's very big so I will fill it in place
bigarray = ndarray( (2,numentries),...)
# get a handle on a tree, just to be able to build interpretations :
t0 = .. first tree in input_files
interpretations = dict(
    a=t0['a'].interpretation.toarray(bigarray[0]),
    b=t0['b'].interpretation.toarray(bigarray[1]),
)
# iterate with :
uproot.iterate(input_files, treename,
               branches=interpretations)
So what if a and b belong to 2 trees in 2 different files?
Upvotes: 1
Views: 774
Reputation: 514
This is how I have "friended" (befriended?) two TTrees in different files with uproot/awkward.
import awkward
import uproot
iterate1 = uproot.iterate(["file_with_a.root"])  # has branch "a"
iterate2 = uproot.iterate(["file_with_b.root"])  # has branch "b"
for array1, array2 in zip(iterate1, iterate2):
    # join arrays
    for field in array2.fields:
        array1 = awkward.with_field(array1, getattr(array2, field), where=field)
    # array1 now has branch "a" and "b"
    print(array1.a)
    print(array1.b)
Alternatively, if it is acceptable to "name" the trees,
import awkward
import uproot
iterate1 = uproot.iterate(["file_with_a.root"])  # has branch "a"
iterate2 = uproot.iterate(["file_with_b.root"])  # has branch "b"
for array1, array2 in zip(iterate1, iterate2):
    # join arrays
    zippedArray = awkward.zip({"tree1": array1, "tree2": array2})
    # zippedArray now has branches "tree1.a" and "tree2.b"
    print(zippedArray.tree1.a)
    print(zippedArray.tree2.b)
Of course you can use array1 and array2 together without merging them like this. But if you have already written code that expects only 1 Array, this can be useful.
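One thing to watch with the zip-of-two-iterators approach is that it only "friends" the trees correctly if both iterators yield chunks covering the same entries. Below is a minimal sketch of a guard for that, in plain Python. The `chunks` generator is a hypothetical stand-in for `uproot.iterate` (it is not part of uproot), and `iterate_friends` is an illustrative helper name. With real files, one way to keep chunk boundaries aligned should be to pass the same integer `step_size` to both `uproot.iterate` calls, since a memory-based step size can translate into different entry counts for different branches.

```python
from itertools import zip_longest


def chunks(values, step):
    """Hypothetical stand-in for an iterator of array chunks (not uproot API)."""
    for start in range(0, len(values), step):
        yield values[start:start + step]


def iterate_friends(iter_a, iter_b):
    """Yield (chunk_a, chunk_b) pairs, failing loudly if the streams drift apart."""
    missing = object()
    for chunk_a, chunk_b in zip_longest(iter_a, iter_b, fillvalue=missing):
        if chunk_a is missing or chunk_b is missing:
            raise ValueError("one iterator ran out before the other")
        if len(chunk_a) != len(chunk_b):
            raise ValueError(
                f"chunk lengths differ: {len(chunk_a)} vs {len(chunk_b)}"
            )
        yield chunk_a, chunk_b


a_values = [1.0, 2.0, 3.0, 4.0, 5.0]       # stand-in for branch "a"
b_values = [10.0, 20.0, 30.0, 40.0, 50.0]  # stand-in for branch "b"

# Same step on both sides, so the chunks line up entry by entry.
pairs = list(iterate_friends(chunks(a_values, 2), chunks(b_values, 2)))
sums = [[x + y for x, y in zip(ca, cb)] for ca, cb in pairs]
print(sums)
```

A plain `zip` would silently stop at the shorter iterator and pair up misaligned chunks, which is why the sketch uses `zip_longest` plus an explicit length check.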
Upvotes: 0
Reputation: 5974
In array-based programming, friends are implicit: you can JOIN any two columns after the fact, so you don't have to declare them as friends ahead of time.
In the simplest case, if your arrays a and b have the same length and the same order, you can just use them together, like a + b. It doesn't matter whether a and b came from the same file or not. Even if one of these is jagged (like jets.phi) and the other is not (like met.phi), you're still fine because the non-jagged array will be broadcast to match the jagged one.
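To make the broadcasting rule concrete, here is a plain-Python sketch of what happens when a flat per-event array meets a jagged one. It uses nested lists rather than Awkward arrays, and `broadcast_subtract` is an illustrative helper, not a library function; with real jagged arrays the same result should come from simply writing `jets_phi - met_phi`.

```python
# One MET phi per event, and a variable number of jet phis per event.
met_phi = [0.5, -1.0, 2.0]
jets_phi = [[1.0, 0.25], [], [1.5]]


def broadcast_subtract(jagged, flat):
    """Subtract flat[i] from every element of jagged[i], keeping the jagged shape."""
    if len(jagged) != len(flat):
        raise ValueError("arrays must have the same number of events")
    return [[x - f for x in inner] for inner, f in zip(jagged, flat)]


dphi = broadcast_subtract(jets_phi, met_phi)
print(dphi)
```

The only requirement is that both arrays have one entry per event, in the same event order. Which file each one came from plays no role.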
Note that awkward.Table and awkward.JaggedArray.zip can combine arrays into a single Table or jagged Table for bookkeeping.
If the two arrays are not in the same order, possibly because each writer was individually parallelized, then you'll need some column to act as the key associating rows of one array with different rows of the other. This is a classic database-style JOIN and although Uproot and Awkward don't provide routines for it, Pandas does. (Look up "merging, joining, and concatenating" in the Pandas documentation; there's a lot!) You can maintain an array's jaggedness in Pandas by preparing the column with the awkward.topandas function.
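As a rough illustration of the key-based case, here is a minimal sketch in plain Python of a lookup-style join. The event-number key and the `align_by_key` helper are assumptions made up for this example. In practice, something like `pandas.merge(df_a, df_b, on="event")` on two DataFrames sharing a hypothetical `event` column would do the equivalent job with far more options.

```python
# Two "trees" written in different orders, each carrying an event-number key.
events_a = [101, 102, 103, 104]
a = [1.0, 2.0, 3.0, 4.0]

events_b = [103, 101, 104, 102]
b = [30.0, 10.0, 40.0, 20.0]


def align_by_key(keys_left, keys_right, values_right):
    """Reorder values_right so that row i corresponds to keys_left[i].

    Requires unique keys on the right and every left key to be present there.
    """
    index = {key: i for i, key in enumerate(keys_right)}
    if len(index) != len(keys_right):
        raise ValueError("keys on the right side are not unique")
    return [values_right[index[key]] for key in keys_left]


b_aligned = align_by_key(events_a, events_b, b)
sums = [x + y for x, y in zip(a, b_aligned)]
print(b_aligned)
print(sums)
```

Once b has been reordered to follow a's event order, the two columns can be used together exactly as in the same-order case.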
The following issue talks about a lot of these things, though the users in the issue below had to join sets of files, rather than just a single tree. (In principle, a process would have to look ahead to all the files to see which contain which keys: a distributed database problem.) Even if that's not your case, you might find more hints there to see how to get started.
https://github.com/scikit-hep/uproot/issues/314
Upvotes: 1