frza

Reputation: 58

OpenMDAO PetscTgtVecWrapper TypeError

I'm trying to get a parallel workflow to run in which I'm evaluating over 1000 parallel cases inside a ParallelGroup. If I run on a small number of cores it doesn't crash, but increasing the number of nodes eventually raises an error, which suggests it is related to how the problem is partitioned.

I'm getting an error from the deep dungeons of OpenMDAO and PETSc which, as far as I can tell, relates to the target indices used when setting up the communication tables. The traceback is printed below:

File "/home/frza/git/OpenMDAO/openmdao/core/group.py", line 454, in _setup_vectors
    impl=self._impl, alloc_derivs=alloc_derivs)
File "/home/frza/git/OpenMDAO/openmdao/core/group.py", line 1456, in _setup_data_transfer
    self._setup_data_transfer(my_params, None, alloc_derivs)
File "/home/frza/git/OpenMDAO/openmdao/core/petsc_impl.py", line 125, in create_data_xfer
File "/home/frza/git/OpenMDAO/openmdao/core/petsc_impl.py", line 397, in __init__
    tgt_idx_set = PETSc.IS().createGeneral(tgt_idxs, comm=comm)
File "PETSc/IS.pyx", line 74, in petsc4py.PETSc.IS.createGeneral (src/petsc4py.PETSc.c:74696)
File "PETSc/arraynpy.pxi", line 121, in petsc4py.PETSc.iarray (src/petsc4py.PETSc.c:8230)
TypeError: Cannot cast array data from dtype('int64') to dtype('int32') according to the rule 'safe'

This answer:

https://scicomp.stackexchange.com/questions/2355/32bit-64bit-issue-when-working-with-numpy-and-petsc4py/2356#2356

led me to look for where the tgt_idxs vector is set up, to check whether it's created with the correct dtype, PETSc.IntType. But so far I only get "Petsc has generated inconsistent data" errors when I try to set the dtype of the arrays I think may be causing the error.
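To see what's failing, note that petsc4py converts index arrays using NumPy's 'safe' casting rule when PETSc was built with 32-bit indices. A minimal demonstration with plain NumPy (no PETSc needed; the int64 default is an assumption about a 64-bit Linux build):

```python
import numpy as np

# Index arrays default to int64 on most 64-bit Linux builds.
tgt_idxs = np.arange(5)

# int64 -> int32 is rejected under the 'safe' casting rule,
# which is exactly the TypeError in the traceback above.
print(np.can_cast(np.int64, np.int32, casting='safe'))  # False

# An explicit astype() succeeds because it casts unsafely by default,
# which is why forcing the dtype upstream can silently truncate indices.
print(tgt_idxs.astype(np.int32).dtype)  # int32
```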

I've not yet tried to reinstall PETSc with --with-64-bit-indices, as suggested in the answer linked above. Do you run PETSc configured this way?
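For reference, rebuilding PETSc with 64-bit indices would look roughly like this. Only --with-64-bit-indices comes from the linked answer; every path and extra flag below is an illustrative placeholder for whatever an existing build already uses:

```shell
# Sketch only: re-run configure with your existing options plus
# --with-64-bit-indices, then rebuild PETSc and petsc4py against it.
cd $PETSC_DIR
./configure --with-64-bit-indices   # plus your usual MPI/BLAS options
make all
pip install --force-reinstall petsc4py   # rebuild the Python bindings
```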

Edit: I've now set up a stripped-down version of the problem that replicates the error I get:

import numpy as np

from openmdao.api import Component, Group, Problem, IndepVarComp, \
                         ParallelGroup


class Model(Component):

    def __init__(self, nsec, nx, nch):
        super(Model, self).__init__()

        self.add_output('outputs', shape=[nx+1, nch*6*3*nsec])

    def solve_nonlinear(self, params, unknowns, resids):

        pass

class Aggregate(Component):

    def __init__(self, nsec, ncase, nx, nch, nsec_env=12):
        super(Aggregate, self).__init__()

        self.ncase = ncase

        for i in range(ncase):
            self.add_param('outputs_sec%03d' % i, shape=[nx+1, nch*6*3*nsec])

        for i in range(nsec):
            self.add_output('aoutput_sec%03d' % i, shape=[nsec_env, 6])


    def solve_nonlinear(self, params, unknowns, resids):

        pass


class ParModel(Group):

    def __init__(self, nsec, ncase, nx, nch, nsec_env=12):
        super(ParModel, self).__init__()

        pg = self.add('pg', ParallelGroup())

        promotes = ['aoutput_sec%03d' % i for i in range(nsec)]
        self.add('agg', Aggregate(nsec, ncase, nx, nch, nsec_env), promotes=promotes)

        for i in range(ncase):
            pg.add('case%03d' % i, Model(nsec, nx, nch))
            self.connect('pg.case%03d.outputs' % i, 'agg.outputs_sec%03d' % i)

if __name__ == '__main__':

    from openmdao.core.mpi_wrap import MPI

    if MPI:
        from openmdao.core.petsc_impl import PetscImpl as impl
    else:
        from openmdao.core.basic_impl import BasicImpl as impl

    p = Problem(impl=impl, root=Group())
    root = p.root

    root.add('dlb', ParModel(20, 1084, 36, 6))
    import time
    t0 = time.time()
    p.setup()
    print('setup time', time.time() - t0)

Having done that, I can also see that the data size ends up enormous due to the many cases we evaluate; I'll see if we can somehow reduce it. I can't actually get this to run at all now, since it either crashes with an error:

petsc4py.PETSc.Error: error code 75
[77] VecCreateMPIWithArray() line 320 in /home/MET/Python-2.7.10_Intel/opt/petsc-3.6.2/src/vec/vec/impls/mpi/pbvec.c
[77] VecSetSizes() line 1374 in /home/MET/Python-2.7.10_Intel/opt/petsc-3.6.2/src/vec/vec/interface/vector.c
[77] Arguments are incompatible
[77] Local size 86633280 cannot be larger than global size 73393408

or the TypeError.
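For what it's worth, the "Local size" in that error matches the stripped-down model's output array sizes exactly. A quick back-of-the-envelope check using the arguments passed to ParModel above:

```python
# Sizes from the ParModel(20, 1084, 36, 6) call in the script above.
nsec, ncase, nx, nch = 20, 1084, 36, 6

# Each Model's 'outputs' has shape [nx+1, nch*6*3*nsec].
per_case = (nx + 1) * (nch * 6 * 3 * nsec)  # 37 * 2160 = 79,920 elements
total = per_case * ncase

print(total)  # 86633280 -- the "Local size" in the error above
# At 8 bytes per double, that's roughly 661 MB per full copy.
```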

Upvotes: 0

Views: 119

Answers (2)

Bret Naylor

Reputation: 754

The data sizes that you're running with are definitely larger than can be expressed by 32-bit indices, so recompiling with --with-64-bit-indices makes sense if you're not able to decrease your data size. OpenMDAO uses PETSc.IntType internally for our indices, so they should become 64-bit if you recompile.
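As a rough sanity check of that limit (the factor of 25 below is purely illustrative, not a measured count of OpenMDAO's internal vectors):

```python
import numpy as np

int32_max = np.iinfo(np.int32).max  # 2,147,483,647
one_vector = 86_633_280             # the local size from the error above

# One full-size vector still fits in 32-bit indexing...
print(one_vector < int32_max)       # True
# ...but the global index space spans params, unknowns, and derivative
# storage across all ranks; a few dozen such copies (25 here, purely
# illustrative) already exceed what int32 can address.
print(one_vector * 25 > int32_max)  # True
```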

Upvotes: 1

Justin Gray

Reputation: 5710

I've never used that option with PETSc. A while back we did have some problems scaling up to larger numbers of cores, but we determined that the problem for us was with how OpenMPI was compiled. Re-compiling OpenMDAO fixed our issues.

Since this error shows up during setup, we don't need to run the model to test the code. If you can provide us with the model that shows the problem and we can run it, then we can at least verify whether the same problem happens on our clusters.

It would also be good to know how many cores you can successfully run on, and at what point it breaks down.

Upvotes: 0
