Reputation: 6474
While issuing a GET request using Apache HTTP Client v4, how do I obtain the response media type (formerly MIME type)?
Using Apache HTTP Client v3, the MIME type was obtained with:
String mimeType = response.getMimeType();
How do I get the media type using Apache HTTP Client v4?
Upvotes: 12
Views: 35734
Reputation: 31161
Note that Apache's ContentType treats SVG as application/svg+xml, rather than the IANA-defined image/svg+xml, which appears to be an incorrect categorization.
Although this answer doesn't answer the question directly, it provides an alternative to Apache's HTTP Client by using Java's own HTTP Client. Additionally, the example code:
- issues a HEAD request, rather than a GET request, which is a lighter operation;
- returns a MediaType enum instead of a string; and
- does not depend on Apache's ContentType.
Without further ado, here are a few Java source files that may prove helpful.
The base enumeration encodes IANA media types. If you add more official encodings, please update this answer so that all may benefit. Note that R Markdown, R XML, and YAML are not defined, officially, so you may wish to remove them.
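import java.io.File;
import static org.apache.commons.io.FilenameUtils.getExtension;
// Static imports from your own package are also required for the TypeName
// constants (APPLICATION, IMAGE, TEXT) and MediaTypeExtensions.getMediaType.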
public enum MediaType {
APP_JAVA_OBJECT(
APPLICATION, "x-java-serialized-object"
),
FONT_OTF( "otf" ),
FONT_TTF( "ttf" ),
IMAGE_APNG( "apng" ),
IMAGE_ACES( "aces" ),
IMAGE_AVCI( "avci" ),
IMAGE_AVCS( "avcs" ),
IMAGE_BMP( "bmp" ),
IMAGE_CGM( "cgm" ),
IMAGE_DICOM_RLE( "dicom_rle" ),
IMAGE_EMF( "emf" ),
IMAGE_EXAMPLE( "example" ),
IMAGE_FITS( "fits" ),
IMAGE_G3FAX( "g3fax" ),
IMAGE_GIF( "gif" ),
IMAGE_HEIC( "heic" ),
IMAGE_HEIF( "heif" ),
IMAGE_HEJ2K( "hej2k" ),
IMAGE_HSJ2( "hsj2" ),
IMAGE_X_ICON( "x-icon" ),
IMAGE_JLS( "jls" ),
IMAGE_JP2( "jp2" ),
IMAGE_JPEG( "jpeg" ),
IMAGE_JPH( "jph" ),
IMAGE_JPHC( "jphc" ),
IMAGE_JPM( "jpm" ),
IMAGE_JPX( "jpx" ),
IMAGE_JXR( "jxr" ),
IMAGE_JXRA( "jxrA" ),
IMAGE_JXRS( "jxrS" ),
IMAGE_JXS( "jxs" ),
IMAGE_JXSC( "jxsc" ),
IMAGE_JXSI( "jxsi" ),
IMAGE_JXSS( "jxss" ),
IMAGE_KTX( "ktx" ),
IMAGE_KTX2( "ktx2" ),
IMAGE_NAPLPS( "naplps" ),
IMAGE_PNG( "png" ),
IMAGE_SVG_XML( "svg+xml" ),
IMAGE_T38( "t38" ),
IMAGE_TIFF( "tiff" ),
IMAGE_WEBP( "webp" ),
IMAGE_WMF( "wmf" ),
TEXT_HTML( TEXT, "html" ),
TEXT_MARKDOWN( TEXT, "markdown" ),
TEXT_PLAIN( TEXT, "plain" ),
TEXT_R_MARKDOWN( TEXT, "R+markdown" ),
TEXT_R_XML( TEXT, "R+xml" ),
TEXT_YAML( TEXT, "yaml" ),
UNDEFINED( TypeName.UNDEFINED, "undefined" );
/**
* The IANA-defined types.
*/
public enum TypeName {
APPLICATION,
IMAGE,
TEXT,
UNDEFINED
}
/**
* The fully qualified IANA-defined media type.
*/
private final String mMediaType;
/**
* The IANA-defined type name.
*/
private final TypeName mTypeName;
/**
* The IANA-defined subtype name.
*/
private final String mSubtype;
/**
* Constructs an instance using the default type name of "image".
*
* @param subtype The image subtype name.
*/
MediaType( final String subtype ) {
this( IMAGE, subtype );
}
/**
* Constructs an instance using an IANA-defined type and subtype pair.
*
* @param typeName The media type's type name.
* @param subtype The media type's subtype name.
*/
MediaType( final TypeName typeName, final String subtype ) {
mTypeName = typeName;
mSubtype = subtype;
mMediaType = typeName.toString().toLowerCase() + '/' + subtype;
}
/**
* Returns the {@link MediaType} associated with the given file.
*
* @param file Has a file name that may contain an extension associated with
* a known {@link MediaType}.
* @return {@link MediaType#UNDEFINED} if the extension has not been
* assigned, otherwise the {@link MediaType} associated with this
* {@link File}'s file name extension.
*/
public static MediaType valueFrom( final File file ) {
return valueFrom( file.getName() );
}
/**
* Returns the {@link MediaType} associated with the given file name.
*
* @param filename The file name that may contain an extension associated
* with a known {@link MediaType}.
* @return {@link MediaType#UNDEFINED} if the extension has not been
* assigned, otherwise the {@link MediaType} associated with this
* URL's file name extension.
*/
public static MediaType valueFrom( final String filename ) {
return getMediaType( getExtension( filename ) );
}
/**
* Returns the {@link MediaType} for the given type and subtype names.
*
* @param type The IANA-defined type name.
* @param subtype The IANA-defined subtype name.
* @return {@link MediaType#UNDEFINED} if there is no {@link MediaType} that
* matches the given type and subtype names.
*/
public static MediaType valueFrom(
final String type, final String subtype ) {
for( final var mediaType : MediaType.values() ) {
if( mediaType.equals( type, subtype ) ) {
return mediaType;
}
}
return UNDEFINED;
}
/**
* Answers whether the given type and subtype names equal this enumerated
* value. This performs a case-insensitive comparison.
*
* @param type The type name to compare against this {@link MediaType}.
* @param subtype The subtype name to compare against this {@link MediaType}.
* @return {@code true} when the type and subtype name match.
*/
public boolean equals( final String type, final String subtype ) {
return mTypeName.name().equalsIgnoreCase( type ) &&
mSubtype.equalsIgnoreCase( subtype );
}
/**
* Answers whether the given {@link TypeName} matches this type name.
*
* @param typeName The {@link TypeName} to compare against the internal value.
* @return {@code true} if the given value is the same IANA-defined type name.
*/
public boolean isType( final TypeName typeName ) {
return mTypeName == typeName;
}
/**
* Returns the IANA-defined type and sub-type.
*
* @return The unique media type identifier.
*/
public String toString() {
return mMediaType;
}
/**
* Used by {@link MediaTypeExtensions} to initialize associations where the
* subtype name and the file name extension have a 1:1 mapping.
*
* @return The IANA subtype value.
*/
String getSubtype() {
return mSubtype;
}
}
Different file name extensions map to various media types. The mapping of extensions to MediaType does not necessarily mean that the content matches the expected media type. Applications must take care to read the file headers to determine the actual media type.
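import java.util.Set;
import static java.util.Set.of;
// A static import of the MediaType constants (for example, FONT_OTF and
// UNDEFINED) from your own package is also required.
enum MediaTypeExtensions {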
MEDIA_FONT_OTF( FONT_OTF ),
MEDIA_FONT_TTF( FONT_TTF ),
MEDIA_IMAGE_APNG( IMAGE_APNG ),
MEDIA_IMAGE_BMP( IMAGE_BMP ),
MEDIA_IMAGE_GIF( IMAGE_GIF ),
MEDIA_IMAGE_ICO( IMAGE_X_ICON, of( "ico", "cur" ) ),
MEDIA_IMAGE_JPEG( IMAGE_JPEG, of( "jpg", "jpeg", "jfif", "pjpeg", "pjp" ) ),
MEDIA_IMAGE_PNG( IMAGE_PNG ),
MEDIA_IMAGE_SVG( IMAGE_SVG_XML, of( "svg" ) ),
MEDIA_IMAGE_TIFF( IMAGE_TIFF, of( "tif", "tiff" ) ),
MEDIA_IMAGE_WEBP( IMAGE_WEBP ),
MEDIA_TEXT_MARKDOWN( TEXT_MARKDOWN, of(
"md", "markdown", "mdown", "mdtxt", "mdtext", "mdwn", "mkd", "mkdown",
"mkdn" ) ),
MEDIA_TEXT_PLAIN( TEXT_PLAIN, of( "asc", "ascii", "txt", "text", "utxt" ) ),
MEDIA_TEXT_R_MARKDOWN( TEXT_R_MARKDOWN, of( "Rmd" ) ),
MEDIA_TEXT_R_XML( TEXT_R_XML, of( "Rxml" ) ),
MEDIA_TEXT_YAML( TEXT_YAML, of( "yaml", "yml" ) );
private final MediaType mMediaType;
private final Set<String> mExtensions;
MediaTypeExtensions( final MediaType mediaType ) {
this( mediaType, of( mediaType.getSubtype() ) );
}
MediaTypeExtensions(
final MediaType mediaType, final Set<String> extensions ) {
assert mediaType != null;
assert extensions != null;
assert !extensions.isEmpty();
mMediaType = mediaType;
mExtensions = extensions;
}
static MediaType getMediaType( final String extension ) {
final var sanitized = sanitize( extension );
for( final var mediaType : MediaTypeExtensions.values() ) {
if( mediaType.isType( sanitized ) ) {
return mediaType.getMediaType();
}
}
return UNDEFINED;
}
private boolean isType( final String sanitized ) {
for( final var extension : mExtensions ) {
if( extension.equalsIgnoreCase( sanitized ) ) {
return true;
}
}
return false;
}
private static String sanitize( final String extension ) {
return extension == null ? "" : extension.toLowerCase();
}
private MediaType getMediaType() {
return mMediaType;
}
}
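As noted above, an extension is only a hint, so checking the leading bytes of the content gives a stronger signal. The following is a minimal sketch of that idea, not part of the original answer: the class name MediaTypeSniffer and the method sniff are hypothetical, it recognizes only the well-known PNG, GIF, and JPEG signatures, and it returns plain strings rather than the MediaType enum so that it stays self-contained.

```java
import java.util.Arrays;

// Hypothetical helper: guesses a media type from the first bytes of a file.
public class MediaTypeSniffer {
  // Well-known file signatures ("magic numbers").
  private static final byte[] PNG =
    {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};
  private static final byte[] GIF = {'G', 'I', 'F', '8'};
  private static final byte[] JPEG = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF};

  /**
   * Returns a media type string for the given leading bytes, or
   * "undefined" when no known signature matches.
   */
  public static String sniff( final byte[] header ) {
    if( startsWith( header, PNG ) ) return "image/png";
    if( startsWith( header, GIF ) ) return "image/gif";
    if( startsWith( header, JPEG ) ) return "image/jpeg";
    return "undefined";
  }

  // Answers whether data begins with the given prefix (requires Java 9+).
  private static boolean startsWith( final byte[] data, final byte[] prefix ) {
    return data != null
      && data.length >= prefix.length
      && Arrays.equals( data, 0, prefix.length, prefix, 0, prefix.length );
  }
}
```

In practice the leading bytes could be read with something like InputStream.readNBytes( 8 ) and the result compared against the extension-based guess; a production version would need many more signatures.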
Finally, we can write a tiny parser that converts the Content-Type header into a MediaType value. The code scans the header map with a case-insensitive comparison so that it matches whether the server returns "Content-Type" or "content-type". Note that the JDK documentation for HttpHeaders states that lookups such as firstValue and allValues already operate without regard to case, in line with RFC 2616, which states that message header names are not case-sensitive, so the manual scan is a defensive choice rather than a requirement.
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import static java.net.http.HttpClient.Redirect.NORMAL;
import static java.net.http.HttpRequest.BodyPublishers.noBody;
import static java.net.http.HttpResponse.BodyHandlers.discarding;
import static java.time.Duration.ofSeconds;
// MediaType.UNDEFINED must also be statically imported from your own package.
public class HttpMediaType {
  private final static HttpClient HTTP_CLIENT = HttpClient
    .newBuilder()
    .connectTimeout( ofSeconds( 5 ) )
    .followRedirects( NORMAL )
    .build();
  /**
   * Performs an HTTP HEAD request to determine the media type based on the
   * Content-Type header returned from the server.
   *
   * @param uri Determine the media type for this resource.
   * @return The data type for the resource or {@link MediaType#UNDEFINED} if
   * unmapped or if the request fails.
   */
  public static MediaType valueFrom( final URI uri ) {
    final var mediaType = new MediaType[]{UNDEFINED};
    try {
      final var request = HttpRequest
        .newBuilder( uri )
        .method( "HEAD", noBody() )
        .build();
      final var response = HTTP_CLIENT.send( request, discarding() );
      final var headers = response.headers();
      final var map = headers.map();
      map.forEach( ( key, values ) -> {
        if( "Content-Type".equalsIgnoreCase( key ) && !values.isEmpty() ) {
          var header = values.get( 0 );
          // Trim off the parameters, such as the character encoding.
          var i = header.indexOf( ';' );
          header = header.substring( 0, i == -1 ? header.length() : i ).trim();
          // Split the type and subtype; skip values that lack a subtype.
          i = header.indexOf( '/' );
          if( i != -1 ) {
            final var type = header.substring( 0, i );
            final var subtype = header.substring( i + 1 );
            mediaType[ 0 ] = MediaType.valueFrom( type, subtype );
          }
        }
      } );
    } catch( final InterruptedException ex ) {
      // Preserve the interrupt status for callers.
      Thread.currentThread().interrupt();
    } catch( final Exception ex ) {
      // TODO: Inform the user?
    }
    return mediaType[ 0 ];
  }
}
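For comparison, the header lookup can be tried in isolation without a network call. This is a small sketch, assuming Java 11 or later; the class name HeaderLookup is hypothetical, and the headers are built by hand with HttpHeaders.of purely for illustration. As far as the JDK documentation describes it, firstValue retrieves values without regard to the case of the header name.

```java
import java.net.http.HttpHeaders;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: reads the Content-Type value from a set of headers.
public class HeaderLookup {
  /**
   * Returns the first Content-Type value, if any. The lookup name uses
   * canonical capitalization regardless of how the server spelled it.
   */
  public static Optional<String> contentType( final HttpHeaders headers ) {
    return headers.firstValue( "Content-Type" );
  }

  public static void main( final String[] args ) {
    // Simulate a server that sent the header name in lowercase.
    final var headers = HttpHeaders.of(
      Map.of( "content-type", List.of( "image/svg+xml" ) ),
      ( name, value ) -> true );

    System.out.println( contentType( headers ).orElse( "undefined" ) );
  }
}
```

The returned string would still need to be split into a type and subtype before being passed to a lookup such as MediaType.valueFrom( type, subtype ).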
Upvotes: 0
Reputation: 698
To get the content type from the response, you can use the ContentType class.
HttpEntity entity = response.getEntity();
// ContentType.get returns null when the entity specifies no Content-Type.
ContentType contentType = entity != null ? ContentType.get(entity) : null;
Using this class, you can easily extract the MIME type:
String mimeType = contentType.getMimeType();
or the charset:
Charset charset = contentType.getCharset();
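A Content-Type header may omit the charset parameter, in which case a fallback is usually needed. The sketch below shows one way to handle that using only the JDK; it parses the raw header value rather than using Apache's ContentType, and the class name CharsetParser and method charsetOf are hypothetical names chosen for illustration.

```java
import java.nio.charset.Charset;

// Hypothetical helper: extracts the charset parameter from a raw
// Content-Type header value such as "text/html; charset=UTF-8".
public class CharsetParser {
  private static final String KEY = "charset=";

  /**
   * Returns the charset named in the header value, or the fallback when
   * the value is null, has no charset parameter, or names a charset this
   * JVM does not recognize.
   */
  public static Charset charsetOf(
    final String contentType, final Charset fallback ) {
    if( contentType == null ) {
      return fallback;
    }

    final String[] parts = contentType.split( ";" );

    // Skip parts[0], which holds the media type itself.
    for( int i = 1; i < parts.length; i++ ) {
      final String param = parts[ i ].trim();

      if( param.regionMatches( true, 0, KEY, 0, KEY.length() ) ) {
        // Parameter values may be quoted, so strip any double quotes.
        final String name =
          param.substring( KEY.length() ).trim().replace( "\"", "" );

        try {
          return Charset.forName( name );
        } catch( final IllegalArgumentException ex ) {
          // Covers both illegal and unsupported charset names.
          return fallback;
        }
      }
    }

    return fallback;
  }
}
```

For example, charsetOf( "text/html", StandardCharsets.UTF_8 ) falls back to UTF-8 because no charset parameter is present.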
Upvotes: 34
Reputation: 1763
The "Content-Type" HTTP header should give you the MIME type information:
Header contentType = response.getFirstHeader("Content-Type");
or, from the entity:
Header contentType = response.getEntity().getContentType();
Then you can extract the MIME type itself, since the Content-Type value may also include an encoding.
String mimeType = contentType.getValue().split(";")[0].trim();
Of course, don't forget to null-check the header before getting its value (in case the server does not send a Content-Type header).
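The null check and the split can be folded into one small helper. This is a minimal sketch using only the JDK; the class name MimeTypes and method mimeTypeOf are hypothetical, and the method takes the raw header value as a string so that it does not depend on any HTTP client library.

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper: extracts the bare MIME type from a Content-Type
// header value, dropping parameters such as the charset.
public class MimeTypes {
  /**
   * Returns the lowercase MIME type, or an empty Optional when the header
   * value is null or contains no media type before the first semicolon.
   */
  public static Optional<String> mimeTypeOf( final String headerValue ) {
    if( headerValue == null ) {
      return Optional.empty();
    }

    // Everything before the first ';' is the media type.
    final String mime = headerValue
      .split( ";", 2 )[ 0 ]
      .trim()
      .toLowerCase( Locale.ROOT );

    return mime.isEmpty() ? Optional.empty() : Optional.of( mime );
  }
}
```

With Apache HTTP Client v4, the argument could be supplied as contentType == null ? null : contentType.getValue(), where contentType is the Header obtained above.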
Upvotes: 20