Reputation:
I want to truncate an error string so that it is guaranteed to fit into an Oracle table column declared as VARCHAR2(2000 BYTE).
Design forces:
The main goal is to fit the table column.
90-95% of the string is exception messages and stack traces. It could contain a customer name with French or Turkish characters, but I am willing to lose those and see them as ? or whatever.
I want the code to be dead simple. The database encoding can change, and Chinese characters may be introduced, but I want the code to work anyway.
It should be "dead simple", but it got me pondering for a while.
What are your suggestions?
Probably the best option is to convert to ASCII. I came up with a variant that is not nice but probably works:
import java.nio.charset.StandardCharsets;

public static String trimStringToBytes(StringBuilder builder, int maximumBytes)
{
    // ASCII output is at most one byte per char, so cutting at maximumBytes chars bounds the byte count.
    String truncatedString = builder.length() > maximumBytes ? builder.substring(0, maximumBytes) : builder.toString();
    // Non-ASCII characters are replaced with '?'. Using StandardCharsets avoids the
    // checked UnsupportedEncodingException and the fallback branch it required.
    byte[] bytes = truncatedString.getBytes(StandardCharsets.US_ASCII);
    // Decode with the same charset instead of the platform default.
    return new String(bytes, StandardCharsets.US_ASCII);
}
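To illustrate why the ASCII route is independent of the database encoding, here is a minimal, self-contained sketch. The class name, the helper `toAsciiTruncated`, and the sample customer name are made up for illustration. It relies on the fact that an ASCII-only string takes one byte per character in ASCII-compatible encodings such as UTF-8 or ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;

public class AsciiTruncateSketch {
    // Hypothetical helper: keep at most maxBytes chars, then replace non-ASCII chars with '?'.
    static String toAsciiTruncated(String text, int maxBytes) {
        String cut = text.length() > maxBytes ? text.substring(0, maxBytes) : text;
        byte[] ascii = cut.getBytes(StandardCharsets.US_ASCII); // unmappable chars become '?'
        return new String(ascii, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String result = toAsciiTruncated("Erreur pour le client Fran\u00e7ois \u00d6zt\u00fcrk", 30);
        System.out.println(result);
        // Both lengths equal result.length(), because every remaining char is ASCII.
        System.out.println(result.getBytes(StandardCharsets.UTF_8).length);
        System.out.println(result.getBytes(StandardCharsets.ISO_8859_1).length);
    }
}
```

The price is the one already accepted in the question: accented and non-Latin characters are lost.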
Upvotes: 3
Views: 3673
Reputation: 403461
I would recommend not doing this in Java, but instead in SQL when you perform the INSERT.
For example, in Oracle you could use the SUBSTR function to trim, using connection.prepareStatement:
insert into mytable (col1, col2) values (?, substr(?, 1, 2000));
Then set your col1 and col2 values on the PreparedStatement, and Oracle will keep only the first 2000 characters of the value and store that. Note that SUBSTR counts characters; since the column is sized in bytes, SUBSTRB, which counts bytes, is the closer match.
It might even be worth doing this with a stored procedure, passing in the entire String as a VARCHAR2 argument to the procedure, which then trims it and inserts the row. No need for the application to get involved with the underlying storage semantics.
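A rough sketch of the prepared-statement approach from the Java side could look like the following. The class and method names are hypothetical, `mytable`, `col1` and `col2` are the placeholder names used above, and SUBSTRB is used here instead of SUBSTR because the column is declared in bytes. It assumes an open JDBC `Connection` to the Oracle database. Oracle's SQL VARCHAR2 limit (4000 bytes by default) may also apply to the bound value, so very long stack traces might still need some pre-trimming; that is not handled here.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class ErrorLogInsertSketch {
    // Placeholder table and column names; SUBSTRB trims the second bind value to 2000 bytes.
    static final String INSERT_SQL =
        "insert into mytable (col1, col2) values (?, substrb(?, 1, 2000))";

    // Hypothetical helper: inserts one row and lets the database do the trimming.
    static void insertError(Connection connection, String key, String errorText) throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(INSERT_SQL)) {
            statement.setString(1, key);
            statement.setString(2, errorText);
            statement.executeUpdate();
        }
    }
}
```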
Upvotes: 2
Reputation:
Is it possible to change the column to VARCHAR2(2000 CHAR)? That would eliminate the encoding issue entirely.
Upvotes: 0
Reputation: 346260
I think your method should work, but intentionally losing all non-ASCII characters is pretty nasty. If you ever have messages in Chinese, they will be replaced completely with ???
IMO the best thing would be to use SQL functions in the insert query to do the trimming. That makes sure that you never exceed the column size AND lose as little data as possible. It's also much less error-prone than trying to do encoding-aware trimming in the Java code.
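For comparison, encoding-aware trimming in Java might look roughly like the sketch below. The class and method names are hypothetical. It encodes into a fixed-size buffer with a `CharsetEncoder`, which stops before a character that no longer fits, so no character is cut in half. It only gives the right answer if the `Charset` passed in really matches the database character set (for example UTF-8 for AL32UTF8) and the charset is stateless, and that dependency is exactly the fragile part.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class ByteLimitSketch {
    // Hypothetical helper: returns the longest prefix of text whose encoding fits in maxBytes.
    static String truncateToBytes(String text, Charset charset, int maxBytes) {
        CharsetEncoder encoder = charset.newEncoder()
            .onMalformedInput(CodingErrorAction.REPLACE)
            .onUnmappableCharacter(CodingErrorAction.REPLACE);
        ByteBuffer out = ByteBuffer.allocate(maxBytes);
        // The encoder reports overflow once the next character does not fit,
        // leaving only whole characters in the buffer.
        encoder.encode(CharBuffer.wrap(text), out, true);
        // Decode the bytes that were actually written.
        return new String(out.array(), 0, out.position(), charset);
    }

    public static void main(String[] args) {
        // 'h' is 1 byte and 'é' is 2 bytes in UTF-8, so only "h" fits into 2 bytes.
        System.out.println(truncateToBytes("h\u00e9llo", StandardCharsets.UTF_8, 2));
    }
}
```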
Upvotes: 0