Reputation: 4408
Below is my text
Test[LF]
[LF]
Test[LF]
[LF]
Test[LF]
Test[LF]
In Notepad++, after enabling "Show Symbol", each line ending is displayed as the [LF] symbol, as shown above.
When I URL-encode the above text, I get the following:
Test%0D%0A%0D%0ATest%0D%0A%0D%0ATest%0D%0ATest
Each [LF] is encoded as %0D%0A.
My question is: why is it encoded as %0D%0A? [LF] should encode as %0A, whereas [CR] encodes as %0D, but I have not used any [CR] character in the above text.
Upvotes: 0
Views: 306
Reputation: 13690
You can use this Java class to inspect each byte of your input file:

    package example;

    import java.io.File;
    import java.nio.file.Files;
    import java.util.Arrays;

    public class FileBytes {
        public static void main( String[] args ) throws Exception {
            if ( args.length != 1 ) {
                throw new IllegalArgumentException( "Please provide one argument" );
            }
            // Read the whole file and print every byte as a decimal value
            File f = new File( args[ 0 ] );
            System.out.println( Arrays.toString( Files.readAllBytes( f.toPath() ) ) );
        }
    }
You'll see something like this:
    [84, 101, 115, 116, 10, 84, 101, 115, 116, 10]
You can look up each value in an ASCII table, provided your file is encoded as UTF-8 or ASCII and contains only ASCII characters. If it does not, translating bytes to characters is more involved and depends on the particular encoding you're using.
For example, 84 == T and 10 == LF (Line Feed), so you could translate the above to Test(LF)Test(LF).
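The byte-to-character translation described above can be sketched in a few lines. This is only an illustration under the same assumption (the input contains nothing but ASCII); the class name `ByteNames` and the methods `label` and `describe` are hypothetical names, not part of any library.

```java
import java.nio.charset.StandardCharsets;

public class ByteNames {
    // Returns a readable label for a single ASCII byte.
    // Only LF and CR get a symbolic name here; everything else is printed as-is.
    public static String label(byte b) {
        switch (b) {
            case 10: return "(LF)";
            case 13: return "(CR)";
            default: return String.valueOf((char) (b & 0xFF)); // assumes ASCII input
        }
    }

    // Concatenates the labels of all bytes in the array.
    public static String describe(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(label(b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] unix = "Test\nTest\n".getBytes(StandardCharsets.US_ASCII);
        byte[] windows = "Test\r\nTest\r\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(describe(unix));    // Test(LF)Test(LF)
        System.out.println(describe(windows)); // Test(CR)(LF)Test(CR)(LF)
    }
}
```

Running `describe` on the bytes of your own file makes any hidden 13 (CR) visible next to each 10 (LF).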
To escape the whole String in the file so it's safe to use in a URL, use URLEncoder as in this example:

    package example;

    import java.io.File;
    import java.net.URLEncoder;
    import java.nio.file.Files;
    import java.util.Arrays;

    public class FileBytes {
        public static void main( String[] args ) throws Exception {
            if ( args.length != 1 ) {
                throw new IllegalArgumentException( "Please provide one argument" );
            }
            File f = new File( args[ 0 ] );
            byte[] bytes = Files.readAllBytes( f.toPath() );
            // Decode the file as UTF-8, then percent-encode the resulting text
            String rawText = new String( bytes, "UTF-8" );
            String encodedText = URLEncoder.encode( rawText, "UTF-8" );
            System.out.println( "Raw text: " + rawText );
            System.out.println( "Encoded text: " + encodedText );
            System.out.println( "Raw bytes: " + Arrays.toString( bytes ) );
            // Use an explicit charset rather than the platform default
            System.out.println( "Encoded bytes: " + Arrays.toString( encodedText.getBytes( "UTF-8" ) ) );
        }
    }

Which prints:
    Raw text: Test
    Test
    Encoded text: Test%0ATest%0A
    Raw bytes: [84, 101, 115, 116, 10, 84, 101, 115, 116, 10]
    Encoded bytes: [84, 101, 115, 116, 37, 48, 65, 84, 101, 115, 116, 37, 48, 65]
Which clearly shows that the line feed (10) is encoded as %0A (37, 48, 65).
If you still see %0D (carriage return, byte value 13) in the encoded text, your editor is adjusting line endings automatically to match the Windows convention (CR followed by LF). Notepad++ has an option to select line endings explicitly (Edit > EOL Conversion).
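If you cannot control how the text was saved, one possible workaround is to normalize the line endings in code before encoding. The sketch below shows what happens when the input contains Windows line endings, and how replacing CR LF with LF changes the result. The class name `NormalizeThenEncode` and the helper methods `encode` and `encodeWithLf` are hypothetical names chosen for this illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class NormalizeThenEncode {
    // Percent-encodes the text exactly as given.
    public static String encode(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java runtime must support
            throw new IllegalStateException(e);
        }
    }

    // Replaces each CR LF pair with a single LF, then percent-encodes.
    public static String encodeWithLf(String text) {
        return encode(text.replace("\r\n", "\n"));
    }

    public static void main(String[] args) {
        String windowsText = "Test\r\n\r\nTest";
        System.out.println(encode(windowsText));       // Test%0D%0A%0D%0ATest
        System.out.println(encodeWithLf(windowsText)); // Test%0A%0ATest
    }
}
```

The first line of output matches the pattern from the question, which suggests the text that reached the encoder contained CR LF pairs even though the editor displayed only LF.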
Upvotes: 1