Reputation: 16395
I'd like to model npm package versions and their maintainers. Have a look at the following API response.
https://registry.npmjs.org/react
The package react with version 17.0.2
has multiple maintainers.
"maintainers": [
{
"name": "sebmarkbage",
"email": "[email protected]"
},
{
"name": "gaearon",
"email": "[email protected]"
},
{
"name": "acdlite",
"email": "[email protected]"
},
{
"name": "brianvaughn",
"email": "[email protected]"
},
{
"name": "fb",
"email": "[email protected]"
},
{
"name": "trueadm",
"email": "[email protected]"
},
{
"name": "sophiebits",
"email": "[email protected]"
},
{
"name": "lunaruan",
"email": "[email protected]"
}
],
A maintainer can have multiple packages, e.g. gaearon
can also be the maintainer of another package.
Here is how I'm currently doing it. This my NpmPackageVersion.java
.
@Entity
public class NpmPackageVersion {
public NpmPackageVersion() {}
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Long id;
@ManyToMany(cascade = {CascadeType.PERSIST, CascadeType.MERGE})
@JsonManagedReference
private Set<Maintainer> maintainers = new HashSet<>();
public void addMaintainer(Maintainer maintainer) {
this.maintainers.add(maintainer);
maintainer.getNpmPackageVersions().add(this);
}
public void removeMaintainer(Maintainer maintainer) {
this.maintainers.remove(maintainer);
maintainer.getNpmPackageVersions().remove(this);
}
}
Here is my Maintainer.java
.
@Entity
public class Maintainer {
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Integer id;
@ManyToMany(fetch = FetchType.LAZY, mappedBy = "maintainers")
@JsonBackReference
private Set<NpmPackageVersion> npmPackageVersions = new HashSet<>();
private String name;
private String email;
private String url;
}
Here are my database tables.
CREATE TABLE IF NOT EXISTS npm_package_version (
id BIGSERIAL PRIMARY KEY,
version TEXT NOT NULL
)
CREATE TABLE IF NOT EXISTS maintainer (
id SERIAL PRIMARY KEY,
name TEXT,
email TEXT UNIQUE,
url TEXT
)
CREATE TABLE IF NOT EXISTS npm_package_version_maintainers (
npm_package_versions_id BIGINT NOT NULL REFERENCES npm_package_version,
maintainers_id INT NOT NULL REFERENCES maintainer
)
As you can see I have a unique constraint for the maintainer email.
I'm now querying the npm registry and trying to store data in my database.
// do the http request and use a DTO, then create a new entity
var npmPackageVersion = new NpmPackageVersion();
npmPackageVersion.setVersion(version);
// add maintainers
if (value.maintainers() != null) {
value
.maintainers()
.forEach(m -> {
var maintainer = maintainerRepository.findByEmailIgnoreCase(m.email()).orElseGet(() -> {
var inner = new Maintainer();
inner.setEmail(m.email());
inner.setName(m.name());
inner.setUrl(m.url());
return inner;
});
npmPackageVersion.addMaintainer(maintainer);
});
}
// then save the new version
I'm checking if a maintainer is already in the database by using the email
field from the API response. If the maintainer is already present I use it and if not I a create a new one. I associate every maintainer with the package.
When I'm trying to save the package I'm getting the following error
Caused by: org.postgresql.util.PSQLException: ERROR: duplicate key value violates unique constraint "maintainer_email_key"
I think I know why I'm getting this error. Java saves the new npm package version and at the same times tries to save all maintainers. If an already existing maintainer is part of the maintainers
Set
Postgres throws the error.
So my question is:
How can I prevent this error? How can I tell hiberante to only save NEW maintainers and for EXISTING maintainers only add the association to the npm_package_version_maintainers
table? I should not try to save an existing maintainer again to the database.
Edit 2021/06/14
I've created a demo repo to reproduce my problem.
https://github.com/zemirco/hibernate-issue
Although I added hashCode
and equals
to all my entities I'm still getting the same error. I think we're close but I'm still missing something. The relation is
Package -> Version -> Maintainer
So for every package has multiple versions. Every version has multiple maintainers. The same maintainer can be in multiple versions. So in the end when I try to save the Package Hibernate tries to save all versions and all maintainers. Then I'm getting the error that a maintainer might already exist in the database.
Upvotes: 0
Views: 1184
Reputation: 946
Your problem is that
maintainerRepository.findByEmail(dto.email())
will return null many times for same email because your transaction is not done yet.
After that you are adding many items
var inner = new Maintainer();
inner.setEmail(dto.email());
inner.setName(dto.name());
inner.setUrl(dto.url());
maintainerRepository.save(inner);
return inner;
with same email. But this is not called before transaction is finished. And on the end of Transaction you have many calls of maintainerRepository.save(inner);
which throws Unique Constraint Ex.
You face the same problem in the example from Github. You are adding same Maintener many times which will cause underlaying Hibernate mechanism to call save many times.
One solution that should work is to have separate method where you will first save Maintainers filtered by email. You should annotate that method with @Transactional(propagation = Propagation.REQUIRES_NEW)
Second method also annotated with @Transactional(propagation = Propagation.REQUIRES_NEW)
will always call just findByEmail
and put objects inside npmPackageVersion
or whatever other object. Here all maintainers will already be saved in database.
(propagation = Propagation.REQUIRES_NEW)
have some drawbacks and possible issues but what you want here as a batch job can not be easy achieved without separate transactions. Or I can not see it right now :)
@Transactional(propagation = Propagation.REQUIRES_NEW)
public void saveMainteners(Dto dto){
if(maintenerRepository.findByEmail(dto.getEmail) == null){
maintenerRepository.save(new Maintener(dto));
}
}
@Transactional(propagation = Propagation.REQUIRES_NEW)
public void addMaintener(Dto dto){
var npmPackageVersion = new NpmPackageVersion();
npmPackageVersion.setVersion(version);
npmPackageVersion.setDescription(value.description());
npmPackageVersion.setHomepage(value.homepage());
Maintainer maintainer = maintenerRepository.findByEmail(dto.getEmail);
npmPackageVersion.addMaintainer(maintainer);
}
Upvotes: 2
Reputation: 53391
Your problem may have to do with the fact that in the Runner
class you are inserting duplicates values for the maintainers when saving your NpmPackage
information:
public void run(String... args) throws Exception {
var pkg = registryService.get("json2csv");
logger.info("pkg: {}", pkg.maintainers());
var npmPackage = new NpmPackage();
npmPackage.setName(pkg.name());
var versions = pkg.versions();
if (versions != null) {
versions.forEach(
(version, value) -> {
// logger.info("value: {}", value.maintainers());
var npmPackageVersion = new NpmPackageVersion();
npmPackageVersion.setVersion(version);
npmPackageVersion.setDescription(value.description());
npmPackageVersion.setHomepage(value.homepage());
if (value.maintainers() != null) {
value
.maintainers()
.forEach(
dto -> {
// You are inserting a new maintainer for every
// version, an it will cause duplicates emails
var maintainer = new Maintainer();
maintainer.setEmail(dto.email());
maintainer.setName(dto.name());
maintainer.setUrl(dto.url());
npmPackageVersion.addMaintainer(maintainer);
});
}
npmPackage.addVersion(npmPackageVersion);
});
}
npmPackageRepository.save(npmPackage);
}
As indicated in your question, you tried using a repository in order to verify if the maintainer already exists:
@Override
public void run(String... args) throws Exception {
var pkg = registryService.get("json2csv");
logger.info("pkg: {}", pkg.maintainers());
var npmPackage = new NpmPackage();
npmPackage.setName(pkg.name());
var versions = pkg.versions();
var maintainers = new HashMap<String, Maintainer>();
if (versions != null) {
versions.forEach(
(version, value) -> {
// logger.info("value: {}", value.maintainers());
var npmPackageVersion = new NpmPackageVersion();
npmPackageVersion.setVersion(version);
npmPackageVersion.setDescription(value.description());
npmPackageVersion.setHomepage(value.homepage());
if (value.maintainers() != null) {
value
.maintainers()
.forEach(
dto -> {
var maintainer = maintainerRepository.findByEmail(dto.email()).orElseGet(() -> {
var inner = new Maintainer();
inner.setEmail(dto.email());
inner.setName(dto.name());
inner.setUrl(dto.url());
return inner;
});
npmPackageVersion.addMaintainer(maintainer);
});
}
npmPackage.addVersion(npmPackageVersion);
});
}
npmPackageRepository.save(npmPackage);
}
But it will not work: please, be aware that, when running the program, the maintainers are not still in the database, they will be persisted later when cascading de NpmPackage
save
operation.
I think you can achieve the desired result using a Map
as a closure to see if the Maintainer
has been already processed. Please, see the following, for example:
package com.example.demo;
import com.example.demo.Maintainer.Maintainer;
import com.example.demo.Maintainer.MaintainerRepository;
import com.example.demo.NpmPackage.NpmPackage;
import com.example.demo.NpmPackage.NpmPackageRepository;
import com.example.demo.NpmPackageVersion.NpmPackageVersion;
import com.example.demo.Registry.RegistryService;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.CommandLineRunner;
import org.springframework.stereotype.Component;
import java.util.HashMap;
@Component
class Runner implements CommandLineRunner {
private static final Logger logger = LoggerFactory.getLogger(DemoApplication.class);
@Autowired private RegistryService registryService;
@Autowired private NpmPackageRepository npmPackageRepository;
@Override
public void run(String... args) throws Exception {
var pkg = registryService.get("json2csv");
logger.info("pkg: {}", pkg.maintainers());
var npmPackage = new NpmPackage();
npmPackage.setName(pkg.name());
var versions = pkg.versions();
var maintainers = new HashMap<String, Maintainer>();
if (versions != null) {
versions.forEach(
(version, value) -> {
// logger.info("value: {}", value.maintainers());
var npmPackageVersion = new NpmPackageVersion();
npmPackageVersion.setVersion(version);
npmPackageVersion.setDescription(value.description());
npmPackageVersion.setHomepage(value.homepage());
if (value.maintainers() != null) {
value
.maintainers()
.forEach(
dto -> {
var maintainer = maintainers.computeIfAbsent(dto.email(), (email) -> {
var inner = new Maintainer();
inner.setEmail(dto.email());
inner.setName(dto.name());
inner.setUrl(dto.url());
return inner;
});
npmPackageVersion.addMaintainer(maintainer);
});
}
npmPackage.addVersion(npmPackageVersion);
});
}
npmPackageRepository.save(npmPackage);
}
}
Or you can save
the Maintainer
in the database the first time he/she is processed:
@Component
class Runner implements CommandLineRunner {
private static final Logger logger = LoggerFactory.getLogger(DemoApplication.class);
@Autowired private RegistryService registryService;
@Autowired private NpmPackageRepository npmPackageRepository;
@Autowired private MaintainerRepository maintainerRepository;
@Override
public void run(String... args) throws Exception {
var pkg = registryService.get("json2csv");
logger.info("pkg: {}", pkg.maintainers());
var npmPackage = new NpmPackage();
npmPackage.setName(pkg.name());
var versions = pkg.versions();
if (versions != null) {
versions.forEach(
(version, value) -> {
// logger.info("value: {}", value.maintainers());
var npmPackageVersion = new NpmPackageVersion();
npmPackageVersion.setVersion(version);
npmPackageVersion.setDescription(value.description());
npmPackageVersion.setHomepage(value.homepage());
if (value.maintainers() != null) {
value
.maintainers()
.forEach(
dto -> {
var maintainer = maintainerRepository.findByEmail(dto.email()).orElseGet(() -> {
var inner = new Maintainer();
inner.setEmail(dto.email());
inner.setName(dto.name());
inner.setUrl(dto.url());
maintainerRepository.save(inner);
return inner;
});
npmPackageVersion.addMaintainer(maintainer);
});
}
npmPackage.addVersion(npmPackageVersion);
});
}
npmPackageRepository.save(npmPackage);
}
}
According to your comments, saving the Maintainer
s introduced a new problem related to lazy collection initialization: sorry, I tested your code before but I was unable to identify that error, perhaps you are inserting the information several times,... I do not know.
Any way, this new problem has to do with the fact that Hibernate is trying to load the npmPackageVersions
collection when there is not an active transaction in place.
To solve the problem, you can define your transaction boundaries properly: with the current code, Spring Data is opening a new transaction when necessary, once per every operation performed by your repositories. As a consequence, when you fetch an existing Maintainer
, it will load the entity from the database but, as the npmPackageVersions
collection is lazy loaded, when you addMaintainer
and you tried to include the new version among the ones associated with the maintainer, the exception is raised, because Hibernate tries loading the association without an active transaction.
First, enable Spring declarative transaction management including the appropriate annotation @EnableTransactionManagement
:
package com.example.demo;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.transaction.annotation.EnableTransactionManagement;
@SpringBootApplication
@EnableTransactionManagement
public class DemoApplication {
public static void main(String[] args) {
SpringApplication.run(DemoApplication.class, args);
}
}
Then, mark your Runner
run
method as `Transactional:
import org.springframework.transaction.annotation.Transactional;
//...
@Override
@Transactional
public void run(String... args) throws Exception {
var pkg = registryService.get("json2csv");
logger.info("pkg: {}", pkg.maintainers());
var npmPackage = new NpmPackage();
npmPackage.setName(pkg.name());
var versions = pkg.versions();
if (versions != null) {
versions.forEach(
(version, value) -> {
// logger.info("value: {}", value.maintainers());
var npmPackageVersion = new NpmPackageVersion();
npmPackageVersion.setVersion(version);
npmPackageVersion.setDescription(value.description());
npmPackageVersion.setHomepage(value.homepage());
if (value.maintainers() != null) {
value
.maintainers()
.forEach(
dto -> {
var maintainer = maintainerRepository.findByEmail(dto.email()).orElseGet(() -> {
var inner = new Maintainer();
inner.setEmail(dto.email());
inner.setName(dto.name());
inner.setUrl(dto.url());
var persisted = maintainerRepository.save(inner);
return persisted;
});
npmPackageVersion.addMaintainer(maintainer);
});
}
npmPackage.addVersion(npmPackageVersion);
});
}
npmPackageRepository.save(npmPackage);
}
Please, consider as well the improvement of returning the returned Maintainer
instead of the one initialized when saving the data.
Upvotes: 1
Reputation: 26046
As it is usually with jpa and hibernate, the problem is the identity here. npmPackageVersion
already has an instance of the type Maintainer
, but when you call npmPackageVersion.addMaintainer(maintainer);
, you're adding yet another instance, which is structurally equal (has the same fields), but the reference is different.
So in other words the object found in this line:
maintainerRepository.findByEmailIgnoreCase(m.email())
is already a child of npmPackageVersion
, but those are two different instances, because hibernate initializes them separately.
To solve the problem you need to define a notion of identity for Maintainer
(and preferably for all the entities):
@Entity
public class Maintainer {
// fields
@Override
public boolean equals(Object o) {
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
Maintainer maintainer = (Maintainer) o;
return email.equals(maintainer.getEmail());
}
@Override
public int hashCode() {
return Objects.hash(email);
}
}
P.S.: For performance reasons it is advisable to define @JoiningTable
on the entities, especially if you're managing the schema manually.
Upvotes: 1
Reputation: 751
you loop over value.maintainers() and probably create 2 new Maintainer with the same email because you check only the database. you should check for unique also the previous maintainers email in npmPackageVersion.
Upvotes: 0