zemirco
zemirco

Reputation: 16395

How to implement @ManyToMany with unique constraint?

I'd like to model npm package versions and their maintainers. Have a look at the following API response.

https://registry.npmjs.org/react

The package react with version 17.0.2 has multiple maintainers.

    "maintainers": [
        {
            "name": "sebmarkbage",
            "email": "[email protected]"
        },
        {
            "name": "gaearon",
            "email": "[email protected]"
        },
        {
            "name": "acdlite",
            "email": "[email protected]"
        },
        {
            "name": "brianvaughn",
            "email": "[email protected]"
        },
        {
            "name": "fb",
            "email": "[email protected]"
        },
        {
            "name": "trueadm",
            "email": "[email protected]"
        },
        {
            "name": "sophiebits",
            "email": "[email protected]"
        },
        {
            "name": "lunaruan",
            "email": "[email protected]"
        }
    ],

A maintainer can have multiple packages, e.g. gaearon can also be the maintainer of another package.

Here is how I'm currently doing it. This my NpmPackageVersion.java.

@Entity
public class NpmPackageVersion {

  public NpmPackageVersion() {}

  @Id
  @GeneratedValue(strategy = GenerationType.IDENTITY)
  private Long id;

  @ManyToMany(cascade = {CascadeType.PERSIST, CascadeType.MERGE})
  @JsonManagedReference
  private Set<Maintainer> maintainers = new HashSet<>();

  public void addMaintainer(Maintainer maintainer) {
    this.maintainers.add(maintainer);
    maintainer.getNpmPackageVersions().add(this);
  }

  public void removeMaintainer(Maintainer maintainer) {
    this.maintainers.remove(maintainer);
    maintainer.getNpmPackageVersions().remove(this);
  }
}

Here is my Maintainer.java.

@Entity
public class Maintainer {

  @Id
  @GeneratedValue(strategy = GenerationType.IDENTITY)
  private Integer id;

  @ManyToMany(fetch = FetchType.LAZY, mappedBy = "maintainers")
  @JsonBackReference
  private Set<NpmPackageVersion> npmPackageVersions = new HashSet<>();

  private String name;

  private String email;

  private String url;
}

Here are my database tables.

CREATE TABLE IF NOT EXISTS npm_package_version (
    id BIGSERIAL PRIMARY KEY,
    version TEXT NOT NULL
)

CREATE TABLE IF NOT EXISTS maintainer (
    id SERIAL PRIMARY KEY,
    name TEXT,
    email TEXT UNIQUE,
    url TEXT
)

CREATE TABLE IF NOT EXISTS npm_package_version_maintainers (
    npm_package_versions_id BIGINT NOT NULL REFERENCES npm_package_version,
    maintainers_id INT NOT NULL REFERENCES maintainer
)

As you can see I have a unique constraint for the maintainer email.

I'm now querying the npm registry and trying to store data in my database.

// do the http request and use a DTO, then create a new entity
var npmPackageVersion = new NpmPackageVersion();
npmPackageVersion.setVersion(version);
// add maintainers
if (value.maintainers() != null) {
  value
  .maintainers()
  .forEach(m -> {
    var maintainer = maintainerRepository.findByEmailIgnoreCase(m.email()).orElseGet(() -> {
      var inner = new Maintainer();
      inner.setEmail(m.email());
      inner.setName(m.name());
      inner.setUrl(m.url());
      return inner;
    });
    npmPackageVersion.addMaintainer(maintainer);
  });
}

// then save the new version

I'm checking if a maintainer is already in the database by using the email field from the API response. If the maintainer is already present I use it and if not I a create a new one. I associate every maintainer with the package.

When I'm trying to save the package I'm getting the following error

Caused by: org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint "maintainer_email_key"

I think I know why I'm getting this error. Java saves the new npm package version and at the same times tries to save all maintainers. If an already existing maintainer is part of the maintainers Set Postgres throws the error.

So my question is: How can I prevent this error? How can I tell hiberante to only save NEW maintainers and for EXISTING maintainers only add the association to the npm_package_version_maintainers table? I should not try to save an existing maintainer again to the database.


Edit 2021/06/14

I've created a demo repo to reproduce my problem.

https://github.com/zemirco/hibernate-issue

Although I added hashCode and equals to all my entities I'm still getting the same error. I think we're close but I'm still missing something. The relation is

Package -> Version -> Maintainer

So for every package has multiple versions. Every version has multiple maintainers. The same maintainer can be in multiple versions. So in the end when I try to save the Package Hibernate tries to save all versions and all maintainers. Then I'm getting the error that a maintainer might already exist in the database.

Upvotes: 0

Views: 1184

Answers (4)

mommcilo
mommcilo

Reputation: 946

Your problem is that

maintainerRepository.findByEmail(dto.email())

will return null many times for same email because your transaction is not done yet.

After that you are adding many items

var inner = new Maintainer();
inner.setEmail(dto.email());
inner.setName(dto.name());
inner.setUrl(dto.url());
maintainerRepository.save(inner);
return inner;

with same email. But this is not called before transaction is finished. And on the end of Transaction you have many calls of maintainerRepository.save(inner); which throws Unique Constraint Ex.

You face the same problem in the example from Github. You are adding same Maintener many times which will cause underlaying Hibernate mechanism to call save many times.

One solution that should work is to have separate method where you will first save Maintainers filtered by email. You should annotate that method with @Transactional(propagation = Propagation.REQUIRES_NEW)

Second method also annotated with @Transactional(propagation = Propagation.REQUIRES_NEW) will always call just findByEmail and put objects inside npmPackageVersion or whatever other object. Here all maintainers will already be saved in database.

(propagation = Propagation.REQUIRES_NEW) have some drawbacks and possible issues but what you want here as a batch job can not be easy achieved without separate transactions. Or I can not see it right now :)

@Transactional(propagation = Propagation.REQUIRES_NEW)
public void saveMainteners(Dto dto){
    if(maintenerRepository.findByEmail(dto.getEmail) == null){
       maintenerRepository.save(new Maintener(dto));
    }
}

@Transactional(propagation = Propagation.REQUIRES_NEW)
public void addMaintener(Dto dto){
    var npmPackageVersion = new NpmPackageVersion();
    npmPackageVersion.setVersion(version);
    npmPackageVersion.setDescription(value.description());
    npmPackageVersion.setHomepage(value.homepage());
    Maintainer maintainer = maintenerRepository.findByEmail(dto.getEmail);
    npmPackageVersion.addMaintainer(maintainer);
}

Upvotes: 2

jccampanero
jccampanero

Reputation: 53391

Your problem may have to do with the fact that in the Runner class you are inserting duplicates values for the maintainers when saving your NpmPackage information:

public void run(String... args) throws Exception {
  var pkg = registryService.get("json2csv");
  logger.info("pkg: {}", pkg.maintainers());


  var npmPackage = new NpmPackage();
  npmPackage.setName(pkg.name());


  var versions = pkg.versions();
  if (versions != null) {
    versions.forEach(
        (version, value) -> {
          // logger.info("value: {}", value.maintainers());


          var npmPackageVersion = new NpmPackageVersion();
          npmPackageVersion.setVersion(version);
          npmPackageVersion.setDescription(value.description());
          npmPackageVersion.setHomepage(value.homepage());
          if (value.maintainers() != null) {
            value
                .maintainers()
                .forEach(
                    dto -> {
                      // You are inserting a new maintainer for every
                      // version, an it will cause duplicates emails
                      var maintainer = new Maintainer();
                      maintainer.setEmail(dto.email());
                      maintainer.setName(dto.name());
                      maintainer.setUrl(dto.url());
                      npmPackageVersion.addMaintainer(maintainer);
                    });
          }


          npmPackage.addVersion(npmPackageVersion);
        });
  }


  npmPackageRepository.save(npmPackage);
}

As indicated in your question, you tried using a repository in order to verify if the maintainer already exists:

@Override
public void run(String... args) throws Exception {
  var pkg = registryService.get("json2csv");
  logger.info("pkg: {}", pkg.maintainers());

  var npmPackage = new NpmPackage();
  npmPackage.setName(pkg.name());

  var versions = pkg.versions();

  var maintainers = new HashMap<String, Maintainer>();

  if (versions != null) {
    versions.forEach(
        (version, value) -> {
          // logger.info("value: {}", value.maintainers());

          var npmPackageVersion = new NpmPackageVersion();
          npmPackageVersion.setVersion(version);
          npmPackageVersion.setDescription(value.description());
          npmPackageVersion.setHomepage(value.homepage());
          if (value.maintainers() != null) {
            value
                .maintainers()
                .forEach(
                    dto -> {
                      var maintainer = maintainerRepository.findByEmail(dto.email()).orElseGet(() -> {                         
                        var inner = new Maintainer();
                        inner.setEmail(dto.email());
                        inner.setName(dto.name());
                        inner.setUrl(dto.url());
                        return inner;
                      });
                      npmPackageVersion.addMaintainer(maintainer);
                    });
          }

          npmPackage.addVersion(npmPackageVersion);
        });
  }

  npmPackageRepository.save(npmPackage);
}

But it will not work: please, be aware that, when running the program, the maintainers are not still in the database, they will be persisted later when cascading de NpmPackage save operation.

I think you can achieve the desired result using a Map as a closure to see if the Maintainer has been already processed. Please, see the following, for example:

package com.example.demo;

import com.example.demo.Maintainer.Maintainer;
import com.example.demo.Maintainer.MaintainerRepository;
import com.example.demo.NpmPackage.NpmPackage;
import com.example.demo.NpmPackage.NpmPackageRepository;
import com.example.demo.NpmPackageVersion.NpmPackageVersion;
import com.example.demo.Registry.RegistryService;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.CommandLineRunner;
import org.springframework.stereotype.Component;
import java.util.HashMap;

@Component
class Runner implements CommandLineRunner {

  private static final Logger logger = LoggerFactory.getLogger(DemoApplication.class);

  @Autowired private RegistryService registryService;

  @Autowired private NpmPackageRepository npmPackageRepository;

  @Override
  public void run(String... args) throws Exception {
    var pkg = registryService.get("json2csv");
    logger.info("pkg: {}", pkg.maintainers());

    var npmPackage = new NpmPackage();
    npmPackage.setName(pkg.name());

    var versions = pkg.versions();

    var maintainers = new HashMap<String, Maintainer>();

    if (versions != null) {
      versions.forEach(
          (version, value) -> {
            // logger.info("value: {}", value.maintainers());

            var npmPackageVersion = new NpmPackageVersion();
            npmPackageVersion.setVersion(version);
            npmPackageVersion.setDescription(value.description());
            npmPackageVersion.setHomepage(value.homepage());
            if (value.maintainers() != null) {
              value
                  .maintainers()
                  .forEach(
                      dto -> {
                        var maintainer = maintainers.computeIfAbsent(dto.email(), (email) -> {                        
                          var inner = new Maintainer();
                          inner.setEmail(dto.email());
                          inner.setName(dto.name());
                          inner.setUrl(dto.url());
                          return inner;
                        });
                        npmPackageVersion.addMaintainer(maintainer);
                      });
            }

            npmPackage.addVersion(npmPackageVersion);
          });
    }

    npmPackageRepository.save(npmPackage);
  }
}

Or you can save the Maintainer in the database the first time he/she is processed:

@Component
class Runner implements CommandLineRunner {

  private static final Logger logger = LoggerFactory.getLogger(DemoApplication.class);

  @Autowired private RegistryService registryService;

  @Autowired private NpmPackageRepository npmPackageRepository;
  
  @Autowired private MaintainerRepository maintainerRepository;

  @Override
  public void run(String... args) throws Exception {
    var pkg = registryService.get("json2csv");
    logger.info("pkg: {}", pkg.maintainers());

    var npmPackage = new NpmPackage();
    npmPackage.setName(pkg.name());

    var versions = pkg.versions();

    if (versions != null) {
      versions.forEach(
          (version, value) -> {
            // logger.info("value: {}", value.maintainers());

            var npmPackageVersion = new NpmPackageVersion();
            npmPackageVersion.setVersion(version);
            npmPackageVersion.setDescription(value.description());
            npmPackageVersion.setHomepage(value.homepage());
            if (value.maintainers() != null) {
              value
                  .maintainers()
                  .forEach(
                      dto -> {
                        var maintainer = maintainerRepository.findByEmail(dto.email()).orElseGet(() -> {                         
                          var inner = new Maintainer();
                          inner.setEmail(dto.email());
                          inner.setName(dto.name());
                          inner.setUrl(dto.url());

                          maintainerRepository.save(inner);

                          return inner;
                        });
                        npmPackageVersion.addMaintainer(maintainer);
                      });
            }

            npmPackage.addVersion(npmPackageVersion);
          });
    }

    npmPackageRepository.save(npmPackage);
  }
}

According to your comments, saving the Maintainers introduced a new problem related to lazy collection initialization: sorry, I tested your code before but I was unable to identify that error, perhaps you are inserting the information several times,... I do not know.

Any way, this new problem has to do with the fact that Hibernate is trying to load the npmPackageVersions collection when there is not an active transaction in place.

To solve the problem, you can define your transaction boundaries properly: with the current code, Spring Data is opening a new transaction when necessary, once per every operation performed by your repositories. As a consequence, when you fetch an existing Maintainer, it will load the entity from the database but, as the npmPackageVersions collection is lazy loaded, when you addMaintainer and you tried to include the new version among the ones associated with the maintainer, the exception is raised, because Hibernate tries loading the association without an active transaction.

First, enable Spring declarative transaction management including the appropriate annotation @EnableTransactionManagement:

package com.example.demo;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.transaction.annotation.EnableTransactionManagement;

@SpringBootApplication
@EnableTransactionManagement
public class DemoApplication {

  public static void main(String[] args) {
    SpringApplication.run(DemoApplication.class, args);
  }
}

Then, mark your Runner run method as `Transactional:

import org.springframework.transaction.annotation.Transactional;

//...

@Override
@Transactional
public void run(String... args) throws Exception {
  var pkg = registryService.get("json2csv");
  logger.info("pkg: {}", pkg.maintainers());

  var npmPackage = new NpmPackage();
  npmPackage.setName(pkg.name());

  var versions = pkg.versions();

  if (versions != null) {
    versions.forEach(
        (version, value) -> {
          // logger.info("value: {}", value.maintainers());

          var npmPackageVersion = new NpmPackageVersion();
          npmPackageVersion.setVersion(version);
          npmPackageVersion.setDescription(value.description());
          npmPackageVersion.setHomepage(value.homepage());
          if (value.maintainers() != null) {
            value
                .maintainers()
                .forEach(
                    dto -> {
                      var maintainer = maintainerRepository.findByEmail(dto.email()).orElseGet(() -> {                         
                        var inner = new Maintainer();
                        inner.setEmail(dto.email());
                        inner.setName(dto.name());
                        inner.setUrl(dto.url());

                        var persisted = maintainerRepository.save(inner);

                        return persisted;
                      });
                      npmPackageVersion.addMaintainer(maintainer);
                    });
          }

          npmPackage.addVersion(npmPackageVersion);
        });
  }

  npmPackageRepository.save(npmPackage);
}

Please, consider as well the improvement of returning the returned Maintainer instead of the one initialized when saving the data.

Upvotes: 1

Andronicus
Andronicus

Reputation: 26046

As it is usually with jpa and hibernate, the problem is the identity here. npmPackageVersion already has an instance of the type Maintainer, but when you call npmPackageVersion.addMaintainer(maintainer);, you're adding yet another instance, which is structurally equal (has the same fields), but the reference is different.

So in other words the object found in this line:

maintainerRepository.findByEmailIgnoreCase(m.email())

is already a child of npmPackageVersion, but those are two different instances, because hibernate initializes them separately.

To solve the problem you need to define a notion of identity for Maintainer (and preferably for all the entities):

@Entity
public class Maintainer {

    // fields

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        Maintainer maintainer = (Maintainer) o;
        return email.equals(maintainer.getEmail());
    }

    @Override
    public int hashCode() {
        return Objects.hash(email);
    }

}

P.S.: For performance reasons it is advisable to define @JoiningTable on the entities, especially if you're managing the schema manually.

Upvotes: 1

tremendous7
tremendous7

Reputation: 751

you loop over value.maintainers() and probably create 2 new Maintainer with the same email because you check only the database. you should check for unique also the previous maintainers email in npmPackageVersion.

Upvotes: 0

Related Questions