I am using Tesseract for Java for OCR text extraction, which makes native calls (through JNA) to the Leptonica system library.
When the Quarkus application runs in JVM mode on a Linux host, the native method calls resolve correctly and everything works as expected. But when it runs as a native image, an `UnsatisfiedLinkError` is thrown on the call to the `pixReadMem` method, which is not found.
The stack trace when running the native image is:
```
2022-12-16 08:20:29,736 ERROR [org.jbo.res.rea.com.cor.AbstractResteasyReactiveContext] (executor-thread-0) Request failed: java.lang.UnsatisfiedLinkError: Error looking up function 'pixReadMem': com.sun.jna.Native.findSymbol(JLjava/lang/String;)J [symbol: Java_com_sun_jna_Native_findSymbol or Java_com_sun_jna_Native_findSymbol__JLjava_lang_String_2]
at com.sun.jna.Function.<init>(Function.java:252)
at com.sun.jna.NativeLibrary.getFunction(NativeLibrary.java:600)
at com.sun.jna.NativeLibrary.getFunction(NativeLibrary.java:576)
at com.sun.jna.NativeLibrary.getFunction(NativeLibrary.java:562)
at com.sun.jna.Library$Handler.invoke(Library.java:243)
at com.sun.proxy.$Proxy107.pixReadMem(Unknown Source)
at com.itextpdf.pdfocr.tesseract4.TesseractOcrUtil.readPix(TesseractOcrUtil.java:662)
at com.itextpdf.pdfocr.tesseract4.TesseractOcrUtil.readPix(TesseractOcrUtil.java:641)
at com.itextpdf.pdfocr.tesseract4.ImagePreprocessingUtil.preprocessImage(ImagePreprocessingUtil.java:166)
at com.itextpdf.pdfocr.tesseract4.Tesseract4LibOcrEngine.getOcrResultForSinglePage(Tesseract4LibOcrEngine.java:314)
at com.itextpdf.pdfocr.tesseract4.Tesseract4LibOcrEngine.doTesseractOcr(Tesseract4LibOcrEngine.java:189)
at com.itextpdf.pdfocr.tesseract4.AbstractTesseract4OcrEngine.processInputFiles(AbstractTesseract4OcrEngine.java:494)
at com.itextpdf.pdfocr.tesseract4.AbstractTesseract4OcrEngine.doImageOcr(AbstractTesseract4OcrEngine.java:232)
at es.gen.quarkus.ocr.service.TesseractService.extratTextFromJPG(TesseractService.java:42)
at es.gen.quarkus.ocr.service.TesseractService_ClientProxy.extratTextFromJPG(Unknown Source)
at es.gen.quarkus.ocr.service.OCRService.extractTextFromPDF(OCRService.java:34)
at es.gen.quarkus.ocr.service.OCRService_ClientProxy.extractTextFromPDF(Unknown Source)
at es.gen.quarkus.ocr.resource.OCRResource.extractTextFromPDF(OCRResource.java:46)
at es.gen.quarkus.ocr.resource.OCRResource$quarkusrestinvoker$extractTextFromPDF_21209ffb320090ffe15503899b13ececc74cb601.invoke(Unknown Source)
at org.jboss.resteasy.reactive.server.handlers.InvocationHandler.handle(InvocationHandler.java:29)
at io.quarkus.resteasy.reactive.server.runtime.QuarkusResteasyReactiveRequestContext.invokeHandler(QuarkusResteasyReactiveRequestContext.java:114)
at org.jboss.resteasy.reactive.common.core.AbstractResteasyReactiveContext.run(AbstractResteasyReactiveContext.java:145)
at io.quarkus.vertx.core.runtime.VertxCoreRecorder$14.runWith(VertxCoreRecorder.java:576)
at org.jboss.threads.EnhancedQueueExecutor$Task.run(EnhancedQueueExecutor.java:2449)
at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1478)
at org.jboss.threads.DelegatingRunnable.run(DelegatingRunnable.java:29)
at org.jboss.threads.ThreadLocalResettingRunnable.run(ThreadLocalResettingRunnable.java:29)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at [email protected]/java.lang.Thread.run(Thread.java:829)
at org.graalvm.nativeimage.builder/com.oracle.svm.core.thread.PlatformThreads.threadStartRoutine(PlatformThreads.java:775)
at org.graalvm.nativeimage.builder/com.oracle.svm.core.posix.thread.PosixPlatformThreads.pthreadStartRoutine(PosixPlatformThreads.java:203)
```
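The frames `Library$Handler.invoke` → `NativeLibrary.getFunction` → `Function.<init>` show that JNA looks `pixReadMem` up by name only when the proxied interface method is first called. Note also that the nested message names JNA's own native method `com.sun.jna.Native.findSymbol` (JNI symbol `Java_com_sun_jna_Native_findSymbol`), so the lookup that failed may be that JNA method rather than Leptonica's `pixReadMem` itself. The stdlib-only sketch below models the lazy, by-name binding with `java.lang.reflect.Proxy`; `LazyBindingSketch`, the `Leptonica` interface, its simplified `pixReadMem(byte[], long)` signature and the symbol map are hypothetical stand-ins, not JNA's API.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public final class LazyBindingSketch {

    /** Hypothetical, simplified binding; the real C function takes a pointer and a size_t. */
    public interface Leptonica {
        long pixReadMem(byte[] data, long size);
    }

    /**
     * Returns a proxy whose methods are resolved by name against a symbol table
     * on first call and then cached, roughly like JNA deferring the function
     * lookup until a method is invoked. Object methods are not special-cased.
     */
    public static <T> T bind(Class<T> iface, Map<String, Function<Object[], Object>> symbols) {
        Map<Method, Function<Object[], Object>> cache = new ConcurrentHashMap<>();
        InvocationHandler handler = (proxy, method, args) -> {
            Function<Object[], Object> fn = cache.get(method);
            if (fn == null) {
                fn = symbols.get(method.getName());
                if (fn == null) {
                    // Same prefix as the message in the trace above.
                    throw new UnsatisfiedLinkError("Error looking up function '" + method.getName() + "'");
                }
                cache.put(method, fn);
            }
            return fn.apply(args);
        };
        return iface.cast(Proxy.newProxyInstance(iface.getClassLoader(), new Class<?>[] {iface}, handler));
    }

    public static void main(String[] args) {
        Leptonica missing = bind(Leptonica.class, Map.of());
        try {
            missing.pixReadMem(new byte[0], 0L);
        } catch (UnsatisfiedLinkError e) {
            System.out.println(e.getMessage()); // Error looking up function 'pixReadMem'
        }
    }
}
```

In this model, as in the trace, the interface itself loads fine and the failure only surfaces on the first call.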
I have tried to set Java system properties with no success:
- `java.library.path` (the current value includes the Linux system folders).
- `jna.debug_load=true` does not log any information.
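Before tuning these properties, it can help to confirm which directories on `java.library.path` (and JNA's own `jna.library.path`) actually contain a Leptonica library. The stdlib-only sketch below does that; `NativeLibraryLocator` is a hypothetical helper, and the default library name `lept` is an assumption (Debian/Ubuntu ship it as `liblept.so.5`; other distributions or newer Leptonica releases may use a different name such as `libleptonica`).

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public final class NativeLibraryLocator {

    /**
     * Returns every file in the path-separated directory list whose name is the
     * platform-mapped library name (e.g. "liblept.so" on Linux) or a versioned
     * variant of it (e.g. "liblept.so.5").
     */
    public static List<File> find(String searchPath, String libName) {
        String mapped = System.mapLibraryName(libName);
        List<File> hits = new ArrayList<>();
        if (searchPath == null || searchPath.isEmpty()) {
            return hits;
        }
        for (String dir : searchPath.split(File.pathSeparator)) {
            File[] files = new File(dir).listFiles();
            if (files == null) {
                continue; // missing or unreadable directory
            }
            for (File f : files) {
                String name = f.getName();
                if (name.equals(mapped) || name.startsWith(mapped + ".")) {
                    hits.add(f);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String lib = args.length > 0 ? args[0] : "lept"; // assumed library name
        for (String prop : new String[] {"java.library.path", "jna.library.path"}) {
            String value = System.getProperty(prop);
            System.out.println(prop + " = " + value);
            System.out.println("  matches: " + find(value, lib));
        }
    }
}
```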
Add this to `application.properties`:
```
quarkus.native.additional-build-args=--initialize-at-build-time=java.awt.image.DirectColorModel --trace-class-initialization=sun.awt.image.IntegerInterleavedRaster\\,java.awt.image.DataBufferByte\\,java.awt.Rectangle\\,java.awt.image.SinglePixelPackedSampleModel\\,sun.java2d.StateTrackableDelegate$2\\,java.awt.image.PackedColorModel\\,java.awt.image.DirectColorModel\\,java.awt.image.BufferedImage\\,java.awt.image.DataBuffer\\,java.awt.Toolkit\\,java.awt.color.ColorSpace$BuiltInSpace\\,sun.awt.image.IntegerComponentRaster\\,sun.awt.image.SunWritableRaster\\,java.awt.image.ComponentSampleModel\\,java.awt.image.ColorModel\\,java.awt.image.BandedSampleModel\\,java.awt.image.WritableRaster\\,java.awt.Image\\,java.awt.image.SampleModel\\,java.awt.image.Raster\\,java.awt.image.DataBufferInt\\,sun.java2d.StateTrackableDelegate\\,sun.awt.image.ByteBandedRaster
```
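The `\\,` sequences matter because `quarkus.native.additional-build-args` is a list property, so an unescaped comma starts a new list element. Roughly: properties-file parsing turns `\\` into `\`, and the remaining `\,` is then read as a literal comma, which is why the generated `native-image` command in the build log shows plain commas inside `--trace-class-initialization=...`. The sketch below checks the first step with `java.util.Properties`; `BuildArgsEscaping` is hypothetical, and `splitList` is a simplified model of list splitting, not SmallRye Config's actual code.

```java
import java.util.ArrayList;
import java.util.List;

public final class BuildArgsEscaping {

    /** Joins values with "\\," exactly as it must appear in the properties file text. */
    public static String escapedForPropertiesFile(List<String> values) {
        return String.join("\\\\,", values);
    }

    /**
     * Simplified model (an assumption) of list-property splitting: unescaped
     * commas separate elements, "\," becomes a literal comma in an element.
     */
    public static List<String> splitList(String value) {
        List<String> out = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == '\\' && i + 1 < value.length() && value.charAt(i + 1) == ',') {
                cur.append(',');
                i++; // consume the escaped comma
            } else if (c == ',') {
                out.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        out.add(cur.toString());
        return out;
    }
}
```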
Install `libtesseract5`, because the latest code uses it. This is how I installed it:
```
sudo add-apt-repository ppa:alex-p/tesseract-ocr5
sudo apt update
sudo apt install libtesseract5
```
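To confirm which Tesseract major version is actually installed, you can look for versioned shared objects such as `libtesseract.so.5`. The sketch below is a hypothetical helper (`SonameVersion`), and it assumes the Debian/Ubuntu multiarch directory `/usr/lib/x86_64-linux-gnu` as its default; pass another directory as the first argument if your distribution differs.

```java
import java.io.File;
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class SonameVersion {

    /** Highest N among files named lib<name>.so.N or lib<name>.so.N.x.y in dir, if any. */
    public static OptionalInt highestMajor(File dir, String name) {
        Pattern p = Pattern.compile("lib" + Pattern.quote(name) + "\\.so\\.(\\d+)(\\..*)?");
        File[] files = dir.listFiles();
        if (files == null) {
            return OptionalInt.empty(); // directory missing or unreadable
        }
        int best = -1;
        for (File f : files) {
            Matcher m = p.matcher(f.getName());
            if (m.matches()) {
                best = Math.max(best, Integer.parseInt(m.group(1)));
            }
        }
        return best < 0 ? OptionalInt.empty() : OptionalInt.of(best);
    }

    public static void main(String[] args) {
        // Assumed default location; adjust for your distribution.
        File dir = new File(args.length > 0 ? args[0] : "/usr/lib/x86_64-linux-gnu");
        System.out.println("libtesseract major version: " + highestMajor(dir, "tesseract"));
    }
}
```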
Then I managed to build the runner executable. There is no need to run `mvn install`; `mvn package -Pnative` is enough. When I ran it locally and sent a PDF, I got a different exception:
```
/quarkus-ocr$ ./target/quarkus-ocr-1.0.0-SNAPSHOT-runner
__ ____ __ _____ ___ __ ____ ______
--/ __ \/ / / / _ | / _ \/ //_/ / / / __/
-/ /_/ / /_/ / __ |/ , _/ ,< / /_/ /\ \
--\___\_\____/_/ |_/_/|_/_/|_|\____/___/
2022-12-20 19:09:30,419 INFO [com.zhe.qua.ocr.res.OCRResource] (executor-thread-0) Extracting text from PDF: Profile.pdf (58739 bytes)
java.io.IOException: Error: Could not find referenced cmap stream Identity-H
at org.apache.fontbox.cmap.CMapParser.getExternalCMap(CMapParser.java:492)
at org.apache.fontbox.cmap.CMapParser.parsePredefined(CMapParser.java:99)
at org.apache.pdfbox.pdmodel.font.CMapManager.getPredefinedCMap(CMapManager.java:55)
at org.apache.pdfbox.pdmodel.font.PDType0Font.readEncoding(PDType0Font.java:287)
at org.apache.pdfbox.pdmodel.font.PDType0Font.<init>(PDType0Font.java:204)
at org.apache.pdfbox.pdmodel.font.PDFontFactory.createFont(PDFontFactory.java:97)
at org.apache.pdfbox.pdmodel.PDResources.getFont(PDResources.java:146)
at org.apache.pdfbox.contentstream.operator.text.SetFontAndSize.process(SetFontAndSize.java:66)
at org.apache.pdfbox.contentstream.PDFStreamEngine.processOperator(PDFStreamEngine.java:966)
at org.apache.pdfbox.contentstream.PDFStreamEngine.processStreamOperators(PDFStreamEngine.java:541)
at org.apache.pdfbox.contentstream.PDFStreamEngine.processStream(PDFStreamEngine.java:516)
at org.apache.pdfbox.contentstream.PDFStreamEngine.processPage(PDFStreamEngine.java:155)
at org.apache.pdfbox.text.LegacyPDFStreamEngine.processPage(LegacyPDFStreamEngine.java:155)
at org.apache.pdfbox.text.PDFTextStripper.processPage(PDFTextStripper.java:363)
at org.apache.pdfbox.text.PDFTextStripper.processPages(PDFTextStripper.java:291)
at org.apache.pdfbox.text.PDFTextStripper.writeText(PDFTextStripper.java:238)
at org.apache.pdfbox.text.PDFTextStripper.getText(PDFTextStripper.java:202)
at com.zheta.quarkus.ocr.service.OCRService.extractTextFromPDF(OCRService.java:30)
at com.zheta.quarkus.ocr.service.OCRService_ClientProxy.extractTextFromPDF(Unknown Source)
at com.zheta.quarkus.ocr.resource.OCRResource.extractTextFromPDF(OCRResource.java:43)
at com.zheta.quarkus.ocr.resource.OCRResource$quarkusrestinvoker$extractTextFromPDF_924fb8ae275125151f52704028fdf25205ebf12f.invoke(Unknown Source)
at org.jboss.resteasy.reactive.server.handlers.InvocationHandler.handle(InvocationHandler.java:29)
at io.quarkus.resteasy.reactive.server.runtime.QuarkusResteasyReactiveRequestContext.invokeHandler(QuarkusResteasyReactiveRequestContext.java:114)
at org.jboss.resteasy.reactive.common.core.AbstractResteasyReactiveContext.run(AbstractResteasyReactiveContext.java:145)
at io.quarkus.vertx.core.runtime.VertxCoreRecorder$14.runWith(VertxCoreRecorder.java:576)
at org.jboss.threads.EnhancedQueueExecutor$Task.run(EnhancedQueueExecutor.java:2449)
at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1478)
at org.jboss.threads.DelegatingRunnable.run(DelegatingRunnable.java:29)
at org.jboss.threads.ThreadLocalResettingRunnable.run(ThreadLocalResettingRunnable.java:29)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at [email protected]/java.lang.Thread.run(Thread.java:833)
at org.graalvm.nativeimage.builder/com.oracle.svm.core.thread.PlatformThreads.threadStartRoutine(PlatformThreads.java:775)
at org.graalvm.nativeimage.builder/com.oracle.svm.core.posix.thread.PosixPlatformThreads.pthreadStartRoutine(PosixPlatformThreads.java:203)
2022-12-20 19:09:30,429 INFO [com.zhe.qua.ocr.res.OCRResource] (executor-thread-0) Extracted text from PDF: Profile.pdf (58739 bytes) finished in 10 ms
```
Build log, to show the versions used:
mintozzy@laptop:~/tmp/quarkus-ocr$ mvn clean package -Pnative
[INFO] Scanning for projects...
[INFO]
[INFO] -------------------< com.zheta.quarkus:quarkus-ocr >--------------------
[INFO] Building quarkus-ocr 1.0.0-SNAPSHOT
[INFO] --------------------------------[ jar ]---------------------------------
[INFO]
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ quarkus-ocr ---
[INFO] Deleting /home/mintozzy/tmp/quarkus-ocr/target
[INFO]
[INFO] --- maven-resources-plugin:2.6:resources (default-resources) @ quarkus-ocr ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 3 resources
[INFO]
[INFO] --- quarkus-maven-plugin:2.14.3.Final:generate-code (default) @ quarkus-ocr ---
[INFO]
[INFO] --- maven-compiler-plugin:3.8.1:compile (default-compile) @ quarkus-ocr ---
[INFO] Changes detected - recompiling the module!
[INFO] Compiling 7 source files to /home/mintozzy/tmp/quarkus-ocr/target/classes
[INFO]
[INFO] --- quarkus-maven-plugin:2.14.3.Final:generate-code-tests (default) @ quarkus-ocr ---
[INFO]
[INFO] --- maven-resources-plugin:2.6:testResources (default-testResources) @ quarkus-ocr ---
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory /home/mintozzy/tmp/quarkus-ocr/src/test/resources
[INFO]
[INFO] --- maven-compiler-plugin:3.8.1:testCompile (default-testCompile) @ quarkus-ocr ---
[INFO] No sources to compile
[INFO]
[INFO] --- maven-surefire-plugin:3.0.0-M7:test (default-test) @ quarkus-ocr ---
[INFO] No tests to run.
[INFO]
[INFO] --- maven-jar-plugin:2.4:jar (default-jar) @ quarkus-ocr ---
[INFO] Building jar: /home/mintozzy/tmp/quarkus-ocr/target/quarkus-ocr-1.0.0-SNAPSHOT.jar
[INFO]
[INFO] --- quarkus-maven-plugin:2.14.3.Final:build (default) @ quarkus-ocr ---
[INFO] [io.quarkus.deployment.pkg.steps.JarResultBuildStep] Building native image source jar: /home/mintozzy/tmp/quarkus-ocr/target/quarkus-ocr-1.0.0-SNAPSHOT-native-image-source-jar/quarkus-ocr-1.0.0-SNAPSHOT-runner.jar
[INFO] [io.quarkus.deployment.pkg.steps.NativeImageBuildStep] Building native image from /home/mintozzy/tmp/quarkus-ocr/target/quarkus-ocr-1.0.0-SNAPSHOT-native-image-source-jar/quarkus-ocr-1.0.0-SNAPSHOT-runner.jar
[INFO] [io.quarkus.deployment.pkg.steps.NativeImageBuildStep] Running Quarkus native-image plugin on GraalVM 22.3.0 Java 17 EE (Java Version 17.0.5+9-LTS-jvmci-22.3-b07)
[INFO] [io.quarkus.deployment.pkg.steps.NativeImageBuildRunner] /home/mintozzy/.sdkman/candidates/java/graalvm-ee-java17-22.3.0/bin/native-image -J-Djava.util.logging.manager=org.jboss.logmanager.LogManager -J-Dlogging.initial-configurator.min-level=500 -J-Dsun.nio.ch.maxUpdateArraySize=100 -J-Dvertx.logger-delegate-factory-class-name=io.quarkus.vertx.core.runtime.VertxLogDelegateFactory -J-Dvertx.disableDnsResolver=true -J-Dio.netty.leakDetection.level=DISABLED -J-Dio.netty.allocator.maxOrder=3 -J-Duser.language=en -J-Duser.country=GB -J-Dfile.encoding=UTF-8 --features=io.quarkus.runner.Feature,io.quarkus.runtime.graal.ResourcesFeature,io.quarkus.runtime.graal.DisableLoggingFeature,io.quarkus.awt.runtime.graal.AwtFeature,io.quarkus.awt.runtime.graal.DarwinAwtFeature -J--add-exports=java.security.jgss/sun.security.krb5=ALL-UNNAMED -J--add-opens=java.base/java.text=ALL-UNNAMED -J--add-opens=java.base/java.io=ALL-UNNAMED -J--add-opens=java.base/java.lang.invoke=ALL-UNNAMED -J--add-opens=java.base/java.util=ALL-UNNAMED -H:+CollectImageBuildStatistics -H:ImageBuildStatisticsFile=quarkus-ocr-1.0.0-SNAPSHOT-runner-timing-stats.json -H:BuildOutputJSONFile=quarkus-ocr-1.0.0-SNAPSHOT-runner-build-output-stats.json --initialize-at-build-time=java.awt.image.DirectColorModel 
--trace-class-initialization=sun.awt.image.IntegerInterleavedRaster,java.awt.image.DataBufferByte,java.awt.Rectangle,java.awt.image.SinglePixelPackedSampleModel,sun.java2d.StateTrackableDelegate\$2,java.awt.image.PackedColorModel,java.awt.image.DirectColorModel,java.awt.image.BufferedImage,java.awt.image.DataBuffer,java.awt.Toolkit,java.awt.color.ColorSpace\$BuiltInSpace,sun.awt.image.IntegerComponentRaster,sun.awt.image.SunWritableRaster,java.awt.image.ComponentSampleModel,java.awt.image.ColorModel,java.awt.image.BandedSampleModel,java.awt.image.WritableRaster,java.awt.Image,java.awt.image.SampleModel,java.awt.image.Raster,java.awt.image.DataBufferInt,sun.java2d.StateTrackableDelegate,sun.awt.image.ByteBandedRaster -H:+AllowFoldMethods -J-Djava.awt.headless=true --no-fallback --link-at-build-time -H:+ReportExceptionStackTraces -H:-AddAllCharsets --enable-url-protocols=http -H:NativeLinkerOption=-no-pie -H:-UseServiceLoaderFeature -H:+StackTrace -J--add-exports=org.graalvm.sdk/org.graalvm.nativeimage.impl=ALL-UNNAMED -J--add-exports=org.graalvm.nativeimage.builder/com.oracle.svm.core.jdk=ALL-UNNAMED quarkus-ocr-1.0.0-SNAPSHOT-runner -jar quarkus-ocr-1.0.0-SNAPSHOT-runner.jar
========================================================================================================================
GraalVM Native Image: Generating 'quarkus-ocr-1.0.0-SNAPSHOT-runner' (executable)...
========================================================================================================================
[1/7] Initializing... (11.3s @ 0.19GB)
Version info: 'GraalVM 22.3.0 Java 17 EE'
Java version info: '17.0.5+9-LTS-jvmci-22.3-b07'
C compiler: gcc (linux, x86_64, 9.4.0)
Garbage collector: Serial GC
5 user-specific feature(s)
- io.quarkus.awt.runtime.graal.AwtFeature
- io.quarkus.awt.runtime.graal.DarwinAwtFeature
- io.quarkus.runner.Feature: Auto-generated class by Quarkus from the existing extensions
- io.quarkus.runtime.graal.DisableLoggingFeature: Disables INFO logging during the analysis phase for the [org.jboss.threads] categories
- io.quarkus.runtime.graal.ResourcesFeature: Register each line in META-INF/quarkus-native-resources.txt as a resource on Substrate VM
# Printing class initialization configuration to: /home/mintozzy/tmp/quarkus-ocr/target/quarkus-ocr-1.0.0-SNAPSHOT-native-image-source-jar/reports/class_initialization_configuration_20221220_193027.csv
# Printing class initialization configuration to: /home/mintozzy/tmp/quarkus-ocr/target/quarkus-ocr-1.0.0-SNAPSHOT-native-image-source-jar/reports/class_initialization_configuration_20221220_193129.csv
# Printing class initialization dependencies to: /home/mintozzy/tmp/quarkus-ocr/target/quarkus-ocr-1.0.0-SNAPSHOT-native-image-source-jar/reports/class_initialization_dependencies_20221220_193205.dot
# Printing class initialization report to: /home/mintozzy/tmp/quarkus-ocr/target/quarkus-ocr-1.0.0-SNAPSHOT-native-image-source-jar/reports/class_initialization_report_20221220_193205.csv
[2/7] Performing analysis... [*********] (98.7s @ 2.15GB)
15,243 (88.71%) of 17,182 classes reachable
25,842 (62.65%) of 41,250 fields reachable
88,340 (59.44%) of 148,629 methods reachable
516 classes, 118 fields, and 2,372 methods registered for reflection
179 classes, 1,537 fields, and 2,092 methods registered for JNI access
7 native libraries: dl, freetype, m, pthread, rt, stdc++, z
[3/7] Building universe... (9.6s @ 2.71GB)
[4/7] Parsing methods... [***] (11.5s @ 2.63GB)
[5/7] Inlining methods... [***] (4.4s @ 1.94GB)
[6/7] Compiling methods... [**************] (214.3s @ 1.47GB)
[7/7] Creating image... (13.4s @ 2.06GB)
46.92MB (44.56%) for code area: 51,972 compilation units
55.45MB (52.66%) for image heap:1,114,462 objects and 302 resources
2.93MB ( 2.78%) for other data
105.29MB in total
------------------------------------------------------------------------------------------------------------------------
Top 10 packages in code area: Top 10 object types in image heap:
3.17MB com.oracle.svm.core.code 12.28MB o.a.c.imaging.common.itu_t4.HuffmanTree$Node
1.66MB sun.security.ssl 9.53MB byte[] for code metadata
1.37MB java.util 4.56MB byte[] for general heap data
1.12MB com.oracle.svm.core.jni 4.53MB byte[] for embedded resources
986.77KB sun.font 3.77MB byte[] for java.lang.String
971.69KB io.netty.buffer 3.75MB java.lang.String
914.40KB com.sun.crypto.provider 2.74MB java.lang.Class
881.87KB java.lang.invoke 2.48MB java.lang.Object[]
718.86KB java.lang 1.60MB int[]
641.56KB io.vertx.core.http.impl 1.58MB java.util.HashMap$Node
34.23MB for 623 more packages 7.67MB for 3524 more object types
------------------------------------------------------------------------------------------------------------------------
16.3s (4.3% of total time) in 146 GCs | Peak RSS: 4.75GB | CPU load: 6.64
------------------------------------------------------------------------------------------------------------------------
Produced artifacts:
/home/mintozzy/tmp/quarkus-ocr/target/quarkus-ocr-1.0.0-SNAPSHOT-native-image-source-jar/quarkus-ocr-1.0.0-SNAPSHOT-runner (executable)
/home/mintozzy/tmp/quarkus-ocr/target/quarkus-ocr-1.0.0-SNAPSHOT-native-image-source-jar/quarkus-ocr-1.0.0-SNAPSHOT-runner-build-output-stats.json (json)
/home/mintozzy/tmp/quarkus-ocr/target/quarkus-ocr-1.0.0-SNAPSHOT-native-image-source-jar/quarkus-ocr-1.0.0-SNAPSHOT-runner-timing-stats.json (raw)
/home/mintozzy/tmp/quarkus-ocr/target/quarkus-ocr-1.0.0-SNAPSHOT-native-image-source-jar/quarkus-ocr-1.0.0-SNAPSHOT-runner.build_artifacts.txt (txt)
========================================================================================================================
Finished generating 'quarkus-ocr-1.0.0-SNAPSHOT-runner' in 6m 14s.
[INFO] [io.quarkus.deployment.pkg.steps.NativeImageBuildRunner] objcopy --strip-debug quarkus-ocr-1.0.0-SNAPSHOT-runner
[INFO] [io.quarkus.deployment.QuarkusAugmentor] Quarkus augmentation completed in 381602ms
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 06:25 min
[INFO] Finished at: 2022-12-20T19:36:34Z
[INFO] ------------------------------------------------------------------------
Libraries the executable depends on:
```
mintozzy@laptop:~/tmp/quarkus-ocr$ ldd ./target/quarkus-ocr-1.0.0-SNAPSHOT-runner
linux-vdso.so.1 (0x00007fff0f5ff000)
libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007fdbd2522000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007fdbd23d3000)
libfreetype.so.6 => /lib/x86_64-linux-gnu/libfreetype.so.6 (0x00007fdbd2314000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007fdbd22f8000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007fdbd22d5000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007fdbd22cf000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007fdbd22b2000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fdbd20c0000)
/lib64/ld-linux-x86-64.so.2 (0x00007fdbd2720000)
libpng16.so.16 => /lib/x86_64-linux-gnu/libpng16.so.16 (0x00007fdbd2088000)
```
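Neither `liblept` nor `libtesseract` appears in this list (nor among the 7 native libraries reported by the build), which is expected if they are loaded at runtime through JNA, which opens them with `dlopen` rather than linking them into the executable. A small stdlib-only sketch for checking this mechanically; `LddParser` and its methods are hypothetical helpers, and the parsing assumes the `name => path (address)` line shape shown above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public final class LddParser {

    /**
     * Maps each library in `ldd` output to its resolved path ("not found" if ldd
     * says so), or to "" when no "=>" is printed (vdso, dynamic loader).
     */
    public static Map<String, String> parse(String lddOutput) {
        Map<String, String> libs = new LinkedHashMap<>();
        for (String raw : lddOutput.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            // Drop the trailing load address, e.g. " (0x00007fdbd2522000)".
            int paren = line.lastIndexOf(" (");
            if (paren >= 0) {
                line = line.substring(0, paren).trim();
            }
            int arrow = line.indexOf("=>");
            if (arrow >= 0) {
                libs.put(line.substring(0, arrow).trim(), line.substring(arrow + 2).trim());
            } else {
                libs.put(line, "");
            }
        }
        return libs;
    }

    /** True if any entry is named lib<stem>.<something>, e.g. stem "lept" matches "liblept.so.5". */
    public static boolean mentions(Map<String, String> libs, String stem) {
        return libs.keySet().stream().anyMatch(n -> n.startsWith("lib" + stem + "."));
    }
}
```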