Sarv Shakti Singh

Reputation: 21

Need a head start in reading AFP files and extracting the content and metadata

I have been assigned to a project in which we will receive AFP (Advanced Function Presentation) files and need to extract the documents, i.e. the content and the corresponding metadata. I have been looking into the AFP file format but haven't found any useful resources on how to proceed.

I have found almost no information so far and don't know where to go next. I looked into some open source projects and found this: https://github.com/yan74/afplib

I tried running it, but it does not work on the sample AFP file I have.

I would really appreciate some pointers to the resources I should go through to be able to do this project.

I need to write the code in Java. I have also looked at some licensed software that does the same thing, such as PROARCHIVER and PAPYRUS.

Thanks in advance

Upvotes: 2

Views: 2699

Answers (1)

owairc

Reputation: 1978

AFP is an easy format: it is composed of structured fields, and your first step is decoding them. Download the "Mixed Object Document Content Architecture Reference", read the first 50 pages, and write code to split the AFP file into structured fields so that you can create a simple dump of your file.
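
As a rough starting point, the splitting step could look something like the Java sketch below. It assumes the usual file layout in which every structured field is preceded by a 0x5A carriage control byte, followed by an 8-byte introducer (2-byte big-endian length that counts the introducer but not the 0x5A, 3-byte identifier, 1-byte flags, 2 reserved bytes) and then the data. The class and method names (`AfpFieldDump`, `Field`, `split`) are made up for illustration, and the flag byte is not interpreted, so any introducer extension or padding it signals stays inside `data`. Check the details against the MO:DCA reference before relying on it.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits an AFP byte array into structured fields.
class AfpFieldDump {

    /** One structured field: 3-byte identifier, flag byte, and raw payload. */
    static final class Field {
        final int id;       // e.g. 0xD3A8A8 (Begin Document), 0xD3EE9B (Presentation Text Data)
        final int flags;    // not interpreted in this sketch
        final byte[] data;  // everything after the 8-byte introducer

        Field(int id, int flags, byte[] data) {
            this.id = id;
            this.flags = flags;
            this.data = data;
        }

        @Override
        public String toString() {
            return String.format("%06X flags=%02X dataLength=%d", id, flags, data.length);
        }
    }

    static List<Field> split(byte[] afp) {
        ByteBuffer buf = ByteBuffer.wrap(afp); // ByteBuffer is big-endian by default
        List<Field> fields = new ArrayList<>();
        while (buf.hasRemaining()) {
            int start = buf.position();
            // Assumption: each field is prefixed with the 0x5A carriage control byte.
            if ((buf.get() & 0xFF) != 0x5A) {
                throw new IllegalArgumentException("Expected 0x5A at offset " + start);
            }
            if (buf.remaining() < 8) {
                throw new IllegalArgumentException("Truncated introducer at offset " + start);
            }
            int length = buf.getShort() & 0xFFFF;                            // includes the 8 introducer bytes
            int id = ((buf.get() & 0xFF) << 16) | (buf.getShort() & 0xFFFF); // class, type, category
            int flags = buf.get() & 0xFF;
            buf.getShort();                                                  // reserved bytes, skipped
            int dataLength = length - 8;
            if (dataLength < 0 || dataLength > buf.remaining()) {
                throw new IllegalArgumentException("Bad length " + length + " at offset " + start);
            }
            byte[] data = new byte[dataLength];
            buf.get(data);
            fields.add(new Field(id, flags, data));
        }
        return fields;
    }
}
```

To get a dump, read the file with `Files.readAllBytes(...)`, pass the bytes to `AfpFieldDump.split`, and print each returned `Field`. If the very first byte is not 0x5A, the file probably uses a different record framing than this sketch assumes.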

After that, if you want to extract images, which the AFP world calls IOCA, you need the Image Object Content Architecture Reference.

If you want to extract text (called PTX), you need the Presentation Text Object Content Architecture Reference.
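
To give an idea of what text extraction involves, here is a minimal Java sketch that pulls the Transparent Data (TRN) bytes out of the payload of one PTX structured field. It rests on my understanding of the control sequence layout: an unchained sequence starts with the prefix 0x2B 0xD3, followed by a length byte (counting itself, the function type, and the parameters) and a function type byte; an odd function type means the next sequence follows directly without the prefix; TRN is function type 0xDA or 0xDB. The name `PtxText` is hypothetical, bytes outside control sequences are simply skipped, and all positioning controls are ignored, so verify it against the reference.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper: collects TRN (Transparent Data) bytes from one PTX payload.
class PtxText {

    static byte[] transparentData(byte[] ptx) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int pos = 0;
        boolean chained = false; // true when the previous function type was odd
        while (pos < ptx.length) {
            if (!chained) {
                // An unchained control sequence must start with the 2B D3 prefix.
                boolean hasPrefix = pos + 1 < ptx.length
                        && (ptx[pos] & 0xFF) == 0x2B
                        && (ptx[pos + 1] & 0xFF) == 0xD3;
                if (!hasPrefix) {
                    pos++; // simplification: skip bytes outside control sequences
                    continue;
                }
                pos += 2;
            }
            if (pos + 2 > ptx.length) {
                break; // truncated control sequence
            }
            int length = ptx[pos] & 0xFF;   // counts the length byte, type byte, and parameters
            int type = ptx[pos + 1] & 0xFF;
            if (length < 2 || pos + length > ptx.length) {
                break; // malformed control sequence
            }
            if (type == 0xDA || type == 0xDB) {
                out.write(ptx, pos + 2, length - 2); // the TRN parameters are the text code points
            }
            chained = (type & 1) == 1;
            pos += length;
        }
        return out.toByteArray();
    }
}
```

The returned bytes are code points, not Java characters. They are usually EBCDIC, so something like `new String(bytes, Charset.forName("IBM500"))` may give readable text, but the right code page depends on the font the document references, so treat IBM500 as a guess.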

Good luck.

Upvotes: 2
