https://projectnessie.org/nessie-0-99-0/gc/
I am attempting to use this tool (Nessie GC) to expire older snapshots of our Apache Iceberg tables. We use Nessie as our catalog.
java -jar nessie-gc.jar gc --cutoff reference-name-regex=P15D -u ${nessie_url_with_api_v2} --jdbc-url jdbc:postgresql://xxxx:5432/nessie_gc --jdbc-user xxx --jdbc-password xxxxx
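To make the shape of the `--cutoff` value in the command above concrete, here is a hypothetical sketch (not Nessie GC's implementation) that splits a `<regex>=<spec>` argument and interprets each half. It assumes the part before `=` is a Java regular expression matched against the whole reference name, and the part after `=` is an ISO-8601 duration subtracted from "now"; the class and method names are made up for illustration, and the real CLI's parsing may differ.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.regex.Pattern;

// Hypothetical illustration only: NOT Nessie GC's implementation.
// Assumes a cutoff argument of the form "<ref-name-regex>=<ISO-8601 duration>",
// mirroring the "--cutoff reference-name-regex=P15D" value in the command above.
public class CutoffSketch {

    // Splits "regex=spec" at the first '=' into its two parts.
    static String[] split(String arg) {
        int eq = arg.indexOf('=');
        if (eq < 0) {
            throw new IllegalArgumentException("Expected <regex>=<spec>: " + arg);
        }
        return new String[] {arg.substring(0, eq), arg.substring(eq + 1)};
    }

    // Assumption: the regex must match the entire reference (branch/tag) name.
    static boolean appliesTo(String arg, String refName) {
        return Pattern.compile(split(arg)[0]).matcher(refName).matches();
    }

    // Converts the duration part (e.g. P15D) into an absolute cutoff instant relative to 'now'.
    static Instant cutoffInstant(String arg, Instant now) {
        return now.minus(Duration.parse(split(arg)[1]));
    }

    public static void main(String[] args) {
        String arg = "reference-name-regex=P15D";
        Instant now = Instant.parse("2024-10-10T20:48:24Z"); // fixed "now" taken from the log date
        System.out.println("applies to 'main': " + appliesTo(arg, "main"));
        System.out.println("cutoff instant: " + cutoffInstant(arg, now));
    }
}
```

Under these assumptions, the literal text `reference-name-regex` only matches a reference with exactly that name, while `P15D` yields a cutoff 15 days before the chosen "now".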
The tool reports:
2024-10-10 20:48:24,922 [ForkJoinPool-3-worker-1] INFO o.p.g.expire.PerContentDeleteExpired - live-set#d947f132-6ada-4357-beba-a6d6c3a01f85 content#04234342-2242-47ff-8fa0-8e68a74cfbf4: Found 1995 total files in base location s3://s3.sbox.us-east-1.source.jsc.jamf/jsc/request_logs/, 0 files considered expired, 1995 files considered live, 0 files are newer than max-file-modification-time.
2024-10-10 20:48:24,950 [ForkJoinPool-3-worker-1] INFO o.p.gc.iceberg.IcebergContentToFiles - Table metadata s3://s3.sbox.us-east-1.dms.jamf/dms_source_db/demo_accounts/metadata/00000-b97d2dec-26d9-4cad-9fcf-a0895567421a.metadata.json for snapshot ID 1989948165265808265 for content-key dms_source_db.demo_accounts at Nessie commit 60782962a0454ca860040b481f51c2abb540c2e9ab50803ca5ab2ec270aff5b0 does not exist, probably already deleted, assuming no files
2024-10-10 20:48:24,972 [ForkJoinPool-3-worker-1] INFO o.p.gc.iceberg.IcebergContentToFiles - Table metadata s3://s3.sbox.us-east-1.dms.jamf/dms_source_db/demo_accounts/metadata/00001-a2285c43-c27b-4882-92d2-82e4c3d33e6b.metadata.json for snapshot ID 3291950617892779911 for content-key dms_source_db.demo_accounts at Nessie commit 9be72874509fd7e7139ece9c467b91ceae5dc878737285b744be4a43e08ad4fe does not exist, probably already deleted, assuming no files
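When scanning a long GC log for the per-content summary lines like the first one above, a small helper can pull out the file counts. This is a hypothetical utility, not part of Nessie GC, and it assumes the message wording stays exactly as it appears in this log.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nessie GC: extracts the file counts from a
// PerContentDeleteExpired summary line. Assumes the wording matches the log above.
public class GcSummaryParser {

    private static final Pattern SUMMARY = Pattern.compile(
        "Found (\\d+) total files in base location \\S+, "
            + "(\\d+) files considered expired, "
            + "(\\d+) files considered live, "
            + "(\\d+) files are newer than max-file-modification-time");

    // Returns the counts keyed by name, or an empty map if the line is not a summary line.
    static Map<String, Long> parse(String line) {
        Map<String, Long> counts = new LinkedHashMap<>();
        Matcher m = SUMMARY.matcher(line);
        if (m.find()) {
            counts.put("total", Long.parseLong(m.group(1)));
            counts.put("expired", Long.parseLong(m.group(2)));
            counts.put("live", Long.parseLong(m.group(3)));
            counts.put("newer", Long.parseLong(m.group(4)));
        }
        return counts;
    }

    public static void main(String[] args) {
        String line = "Found 1995 total files in base location "
            + "s3://s3.sbox.us-east-1.source.jsc.jamf/jsc/request_logs/, "
            + "0 files considered expired, 1995 files considered live, "
            + "0 files are newer than max-file-modification-time.";
        System.out.println(parse(line));
    }
}
```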
I'm not sure why it is not detecting any files to delete. We have a few snapshots older than 9 months (and hence S3 metadata/Parquet files older than one year).
In the end, the tool dies with the following exception:
2024-10-10 20:29:27,402 [idle-connection-reaper] DEBUG s.a.a.h.a.i.c.IdleConnectionReaper - Shutting down reaper thread.
java.lang.NullPointerException
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77)
    at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.base/java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:500)
    at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:481)
    at java.base/java.util.concurrent.ForkJoinTask.getThrowableException(ForkJoinTask.java:564)
    at java.base/java.util.concurrent.ForkJoinTask.reportException(ForkJoinTask.java:591)
    at java.base/java.util.concurrent.ForkJoinTask.joinForPoolInvoke(ForkJoinTask.java:1042)
    at java.base/java.util.concurrent.ForkJoinPool.invoke(ForkJoinPool.java:2639)
    at org.projectnessie.gc.expire.local.DefaultLocalExpire.expire(DefaultLocalExpire.java:73)
    at org.projectnessie.gc.tool.cli.commands.BaseRepositoryCommand.expire(BaseRepositoryCommand.java:250)
    at org.projectnessie.gc.tool.cli.commands.MarkAndSweep.call(MarkAndSweep.java:57)
    at org.projectnessie.gc.tool.cli.commands.BaseRepositoryCommand.call(BaseRepositoryCommand.java:84)
    at org.projectnessie.gc.tool.cli.commands.BaseCommand.call(BaseCommand.java:26)
    at org.projectnessie.gc.tool.cli.commands.BaseCommand.call(BaseCommand.java:21)
    at picocli.CommandLine.executeUserObject(CommandLine.java:2041)
    at picocli.CommandLine.access$1500(CommandLine.java:148)
    at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2461)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2453)
    at picocli.CommandLine$RunLast.handle(CommandLine.java:2415)
    at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2273)
    at picocli.CommandLine$RunLast.execute(CommandLine.java:2417)
    at picocli.CommandLine.execute(CommandLine.java:2170)
    at org.projectnessie.gc.tool.cli.CLI.runMain(CLI.java:96)
    at org.projectnessie.gc.tool.cli.CLI.runMain(CLI.java:68)
    at org.projectnessie.gc.tool.cli.CLI.main(CLI.java:63)
Caused by: java.lang.NullPointerException: Cannot invoke "Object.getClass()" because "c" is null
    at org.projectnessie.gc.iceberg.IcebergContentToFiles.extractFiles(IcebergContentToFiles.java:90)
    at org.projectnessie.gc.expire.PerContentDeleteExpired.lambda$identifyLiveFiles$2(PerContentDeleteExpired.java:125)
    at java.base/java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:273)
    at org.projectnessie.gc.contents.jdbc.JdbcHelper$ResultSetSplit.tryAdvance(JdbcHelper.java:143)
    at java.base/java.util.Spliterator.forEachRemaining(Spliterator.java:332)
    at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
    at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
    at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:921)
    at java.base/java.util.stream.ReduceOps$5.evaluateSequential(ReduceOps.java:258)
    at java.base/java.util.stream.ReduceOps$5.evaluateSequential(ReduceOps.java:248)
    at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.base/java.util.stream.ReferencePipeline.count(ReferencePipeline.java:709)
    at org.projectnessie.gc.expire.PerContentDeleteExpired.identifyLiveFiles(PerContentDeleteExpired.java:133)
    at org.projectnessie.gc.expire.PerContentDeleteExpired.expire(PerContentDeleteExpired.java:73)
    at org.projectnessie.gc.expire.local.DefaultLocalExpire.expireSingleContent(DefaultLocalExpire.java:104)
    at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:197)
    at java.base/java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:992)
    at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:509)
    at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:499)
    at java.base/java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:960)
    at java.base/java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:934)
    at java.base/java.util.stream.AbstractTask.compute(AbstractTask.java:327)
    at java.base/java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:754)
    at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373)
    at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.helpComplete(ForkJoinPool.java:1223)
    at java.base/java.util.concurrent.ForkJoinPool.helpComplete(ForkJoinPool.java:1915)
    at java.base/java.util.concurrent.ForkJoinTask.awaitDone(ForkJoinTask.java:433)
    at java.base/java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:687)
    at java.base/java.util.stream.ReduceOps$ReduceOp.evaluateParallel(ReduceOps.java:927)
    at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
    at java.base/java.util.stream.ReferencePipeline.reduce(ReferencePipeline.java:657)
    at org.projectnessie.gc.expire.local.DefaultLocalExpire.expireInForkJoinPool(DefaultLocalExpire.java:91)
    at java.base/java.util.concurrent.ForkJoinTask$AdaptedCallable.exec(ForkJoinTask.java:1428)
    at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373)
    at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182)
    at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1655)
    at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1622)
    at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:165)
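The "Caused by" line says `extractFiles(...)` called `getClass()` on a null value `c` while a stream of live-set rows was being consumed by `count()`. The sketch below is a hypothetical reproduction, not Nessie GC source code: the class, methods and data are invented to show how a single null element in such a stream produces this kind of `NullPointerException`, plus an illustrative null guard.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Stream;

// Hypothetical reproduction, NOT Nessie GC source code: shows how one null
// element reaching a per-element step that calls getClass() aborts a count().
public class NullContentSketch {

    // Mimics a step that dispatches on the runtime class of its input.
    static Stream<String> extractFiles(Object c) {
        String type = c.getClass().getSimpleName(); // throws NullPointerException when c is null
        return Stream.of(type + "-file");
    }

    // No guard: the first null element makes the whole count() fail.
    static long countUnguarded(List<Object> contents) {
        return contents.stream().flatMap(NullContentSketch::extractFiles).count();
    }

    // Illustrative guard only: drops null elements before dispatching.
    static long countGuarded(List<Object> contents) {
        return contents.stream()
            .filter(Objects::nonNull)
            .flatMap(NullContentSketch::extractFiles)
            .count();
    }

    public static void main(String[] args) {
        List<Object> contents = Arrays.asList("table-a", null, "table-b"); // Arrays.asList permits nulls
        System.out.println("guarded count: " + countGuarded(contents));
        try {
            countUnguarded(contents);
        } catch (NullPointerException e) {
            System.out.println("unguarded: " + e.getClass().getName());
        }
    }
}
```

Skipping nulls only hides the symptom in this toy; it says nothing about why a null content object reached the real pipeline.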
I am checking the table's snapshot history using the following query:
select * from TABLE(table_history('xxxxx')) order by made_current_at desc
I am still seeing snapshots with made_current_at '2024-03-22 15:12:20.373'. I am running Nessie GC with P15D as the cutoff policy, so I expected only the current snapshot and snapshots from the last 15 days to be visible.
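To make the gap concrete, the arithmetic below checks whether a `made_current_at` value in the `yyyy-MM-dd HH:mm:ss.SSS` form returned by the query falls outside a P15D window. It is a sketch under stated assumptions: time zones are ignored (plain `LocalDateTime`), and "now" is fixed to the date of the GC log so the result is deterministic; the class and method names are hypothetical.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Illustrative arithmetic only. Assumptions: timestamps are compared without a
// time zone, and "now" is fixed to the GC log date for a deterministic result.
public class SnapshotAgeSketch {

    private static final DateTimeFormatter FMT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");

    // True if the snapshot became current before (now - window).
    static boolean olderThan(String madeCurrentAt, LocalDateTime now, Duration window) {
        LocalDateTime ts = LocalDateTime.parse(madeCurrentAt, FMT);
        return ts.isBefore(now.minus(window));
    }

    // Whole days elapsed between the snapshot and 'now'.
    static long ageInDays(String madeCurrentAt, LocalDateTime now) {
        return Duration.between(LocalDateTime.parse(madeCurrentAt, FMT), now).toDays();
    }

    public static void main(String[] args) {
        LocalDateTime now = LocalDateTime.of(2024, 10, 10, 20, 48, 24);
        String snapshot = "2024-03-22 15:12:20.373";
        System.out.println("older than P15D: " + olderThan(snapshot, now, Duration.parse("P15D")));
        System.out.println("age in days: " + ageInDays(snapshot, now));
    }
}
```

With these inputs the snapshot is 202 days old, far outside the 15-day window.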