Downloading a PDF on Dropbox to a phone from a given URI in Base64 gives broken unreadable PDF

Question

I'm getting the URI of PDFs from different sources (local on the phone, Google Drive, etc.) and for Dropbox I can read a byte array using the URI as input. But the PDF that I'm getting is not a valid PDF. Base64 is also not correct.

This is my URI:

content://com.dropbox.android.FileCache/filecache/a54cc030-e2e0-4ef5-8e72-0ac3269a16e1

val inputStream = context.contentResolver.openInputStream(Uri.parse(uri))
val allText = inputStream.bufferedReader().use(BufferedReader::readText)
val base64Image = Base64.encodeToString(allText.toByteArray(), Base64.DEFAULT)

allText content (snippet):

%PDF-1.3
%���������
4 0 obj
<< /Length 5 0 R /Filter /FlateDecode >>
.
.
.
13025
%%EOF

Storing the allText content with a .PDF extension doesn't work.

The format looks good, but when I paste base64Image into https://base64.guru/converter/decode/pdf it reports that it's not correct.

Original PDF content (snippet):

2550 4446 2d31 2e33 0a25 c4e5 f2e5 eba7
f3a0 d0c4 c60a 3420 3020 6f62 6a0a 3c3c
.
.
.
.
0a73 7461 7274 7872 6566 0a31 3330 3235
0a25 2545 4f46 0a

VC.One · Accepted Answer

"I can read a byte array using the URI as input. But the PDF that I'm getting is not a valid PDF."

"When storing the allText content with .PDF extension doesn't work."

You're reading the PDF's input bytes but storing them in the wrong format (text). A PDF is binary data, and readText decodes those bytes as if they were characters.
For example, all valid PDF files are expected to begin with the bytes 25 50 44 46. Your allText content snippet starts with %PDF, which is the ASCII/UTF-8 text representation of those bytes.
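As a quick sanity check on that header, here is a minimal sketch that tests whether a byte array starts with 25 50 44 46. The function name startsWithPdfMagic is made up for illustration and is not part of any library.

```kotlin
// Hypothetical helper: true when the data starts with the bytes 25 50 44 46 ("%PDF").
fun startsWithPdfMagic(bytes: ByteArray): Boolean {
    val magic = byteArrayOf(0x25, 0x50, 0x44, 0x46)
    return bytes.size >= magic.size &&
        bytes.copyOfRange(0, magic.size).contentEquals(magic)
}
```

Note that the damaged output in the question would still pass this check, because the header is plain ASCII. The corruption only shows up in the binary bytes that follow, so this is a first check and not a full validation.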

Problem:
All this is fine because we can just convert the text characters back into their respective byte values, right? Nope, not all byte values can be correctly recovered back from text format.

example #1: can convert...

input bytes : 25 50 44 46
as text     : %  P  D  F
into bytes  : 25 50 44 46

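example #2: cannot convert (original data is not recovered, because these bytes are not valid text, so the decoder substitutes the replacement character �)...

input bytes : 25 C4 E5 F2 E5 EB A7 F3 A0 D0
as text     : %  � � � �  � � � � �
into bytes  : 25 EF BF BD EF BF BD ... (with UTF-8, each � is written back as EF BF BD, never as the original byte)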
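The two examples can be reproduced with a short sketch, assuming the text is decoded and re-encoded as UTF-8 (the default for readText and toByteArray in the question's code). The helper names roundTripThroughText and toHex are invented for this illustration.

```kotlin
// Hypothetical helper: decode the bytes as UTF-8 text, then encode that text back to bytes.
// This mirrors what readText() followed by toByteArray() does in the question.
fun roundTripThroughText(input: ByteArray): ByteArray =
    String(input, Charsets.UTF_8).toByteArray(Charsets.UTF_8)

// Hypothetical helper: format bytes as space-separated hex so two arrays are easy to compare.
fun toHex(bytes: ByteArray): String =
    bytes.joinToString(" ") { "%02X".format(it) }
```

Passing byteArrayOf(0x25, 0x50, 0x44, 0x46) through roundTripThroughText gives the same four bytes back, while an input such as byteArrayOf(0x25, 0xC4.toByte(), 0xE5.toByte()) comes back as a different, longer array. Comparing toHex of the input and the output makes the damage visible.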

Solution:

Try something like below. You want the logic as explained within the code comments...

import java.io.File
import java.io.InputStream
import java.util.Base64 //# on Android below API 26, use android.util.Base64 instead

fun main()
{
    //# setup access to your file...
    val inFile = File("your-file-path-here.pdf")
    val fileSize: Long = inFile.length() //# size in bytes

    //# read file bytes into a bytes Array (raw bytes, never decoded as text)...
    val inStream: InputStream = inFile.inputStream()
    val inBytes: ByteArray = inStream.use { it.readBytes() }

    //# Make as String (of hex values)...
    val hexString = inBytes.joinToString(" ") { String.format("%02X", it) }

    //# check values as hex... should start with: 25 50 44 46
    //print(hexString) //could be long print-out for a big file

    //# Make Base64 string from the raw bytes...
    val base64 = Base64.getEncoder().encodeToString(inBytes)
}

"Base64 is also not correct."

(option 1)

Encode the raw inBytes array from the example code above to Base64 (note: now added as val base64). Do not encode a text version of the data such as allText or hexString.

(option 2)

Read the file bytes directly and encode them as a Base64 string with simply...

val bytes = File(filePath).readBytes()
val base64 = Base64.getEncoder().encodeToString(bytes)
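For the content URI in the question there is no file path, only an InputStream. A minimal sketch of the same idea for any stream follows. The function name streamToBase64 is hypothetical, and the code uses java.util.Base64, which on Android is only available from API 26.

```kotlin
import java.io.InputStream
import java.util.Base64

// Hypothetical helper: read every byte of the stream as-is (no text decoding)
// and return the Base64 form of those bytes. The stream is closed afterwards.
// On Android the stream could be the one returned by
// context.contentResolver.openInputStream(Uri.parse(uri)) in the question.
fun streamToBase64(input: InputStream): String {
    val bytes: ByteArray = input.use { it.readBytes() }
    return Base64.getEncoder().encodeToString(bytes)
}
```

On older Android versions, android.util.Base64.encodeToString(bytes, Base64.NO_WRAP) should be an equivalent call. The Base64.DEFAULT flag used in the question inserts line breaks into the output, which some decoders may not accept.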
