Arvind

Reputation: 6474

How to find HTTP Media Type (MIME type) from response?

When issuing a GET request using Apache HTTP Client v4, how do I obtain the response media type (formerly MIME type)?

Using Apache HTTP Client v3, the MIME type was obtained with:

 String mimeType = response.getMimeType();

How do I get the media type using Apache HTTP Client v4?

Upvotes: 12

Views: 35734

Answers (3)

Dave Jarvis

Reputation: 31161

Note that Apache's ContentType treats SVG as application/svg+xml, rather than the IANA-defined image/svg+xml, which appears to be an incorrect categorization.

Although this answer doesn't directly answer the question, it offers an alternative to Apache's HTTP Client: the HTTP client built into Java 11 and later (java.net.http.HttpClient). The example code:

  • makes a HEAD request rather than a GET request, which is lighter because the server returns no response body;
  • sets a 5-second connection timeout, which may need to be longer depending on your scenario;
  • adds strong typing by parsing content types into a MediaType enum instead of a string;
  • avoids the overhead of a regular expression when parsing a simple string; and
  • defines many more media types, especially images, than Apache's ContentType does.

Without further ado, here are a few Java source files that may prove helpful.

MediaType

The base enumeration encodes IANA media types. If you add more official types, please update this answer so that all may benefit. Note that the R Markdown, R XML, and YAML entries are not officially registered, so you may wish to remove them.

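import java.io.File;

import static org.apache.commons.io.FilenameUtils.getExtension;
// Also add static imports for MediaType.TypeName.* and
// MediaTypeExtensions.getMediaType, qualified with your own package name.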

public enum MediaType {
  APP_JAVA_OBJECT(
    APPLICATION, "x-java-serialized-object"
  ),

  // Fonts must name their type explicitly; the one-argument constructor
  // defaults to "image", which would yield "image/otf".
  FONT_OTF( FONT, "otf" ),
  FONT_TTF( FONT, "ttf" ),

  IMAGE_APNG( "apng" ),
  IMAGE_ACES( "aces" ),
  IMAGE_AVCI( "avci" ),
  IMAGE_AVCS( "avcs" ),
  IMAGE_BMP( "bmp" ),
  IMAGE_CGM( "cgm" ),
  IMAGE_DICOM_RLE( "dicom_rle" ),
  IMAGE_EMF( "emf" ),
  IMAGE_EXAMPLE( "example" ),
  IMAGE_FITS( "fits" ),
  IMAGE_G3FAX( "g3fax" ),
  IMAGE_GIF( "gif" ),
  IMAGE_HEIC( "heic" ),
  IMAGE_HEIF( "heif" ),
  IMAGE_HEJ2K( "hej2k" ),
  IMAGE_HSJ2( "hsj2" ),
  IMAGE_X_ICON( "x-icon" ),
  IMAGE_JLS( "jls" ),
  IMAGE_JP2( "jp2" ),
  IMAGE_JPEG( "jpeg" ),
  IMAGE_JPH( "jph" ),
  IMAGE_JPHC( "jphc" ),
  IMAGE_JPM( "jpm" ),
  IMAGE_JPX( "jpx" ),
  IMAGE_JXR( "jxr" ),
  IMAGE_JXRA( "jxrA" ),
  IMAGE_JXRS( "jxrS" ),
  IMAGE_JXS( "jxs" ),
  IMAGE_JXSC( "jxsc" ),
  IMAGE_JXSI( "jxsi" ),
  IMAGE_JXSS( "jxss" ),
  IMAGE_KTX( "ktx" ),
  IMAGE_KTX2( "ktx2" ),
  IMAGE_NAPLPS( "naplps" ),
  IMAGE_PNG( "png" ),
  IMAGE_SVG_XML( "svg+xml" ),
  IMAGE_T38( "t38" ),
  IMAGE_TIFF( "tiff" ),
  IMAGE_WEBP( "webp" ),
  IMAGE_WMF( "wmf" ),

  TEXT_HTML( TEXT, "html" ),
  TEXT_MARKDOWN( TEXT, "markdown" ),
  TEXT_PLAIN( TEXT, "plain" ),
  TEXT_R_MARKDOWN( TEXT, "R+markdown" ),
  TEXT_R_XML( TEXT, "R+xml" ),
  TEXT_YAML( TEXT, "yaml" ),

  UNDEFINED( TypeName.UNDEFINED, "undefined" );

  /**
   * The IANA-defined types.
   */
  public enum TypeName {
    APPLICATION,
    FONT,
    IMAGE,
    TEXT,
    UNDEFINED
  }

  /**
   * The fully qualified IANA-defined media type.
   */
  private final String mMediaType;

  /**
   * The IANA-defined type name.
   */
  private final TypeName mTypeName;

  /**
   * The IANA-defined subtype name.
   */
  private final String mSubtype;

  /**
   * Constructs an instance using the default type name of "image".
   *
   * @param subtype The image subtype name.
   */
  MediaType( final String subtype ) {
    this( IMAGE, subtype );
  }

  /**
   * Constructs an instance using an IANA-defined type and subtype pair.
   *
   * @param typeName The media type's type name.
   * @param subtype  The media type's subtype name.
   */
  MediaType( final TypeName typeName, final String subtype ) {
    mTypeName = typeName;
    mSubtype = subtype;
    mMediaType = typeName.toString().toLowerCase() + '/' + subtype;
  }

  /**
   * Returns the {@link MediaType} associated with the given file.
   *
   * @param file Has a file name that may contain an extension associated with
   *             a known {@link MediaType}.
   * @return {@link MediaType#UNDEFINED} if the extension has not been
   * assigned, otherwise the {@link MediaType} associated with this
   * {@link File}'s file name extension.
   */
  public static MediaType valueFrom( final File file ) {
    return valueFrom( file.getName() );
  }

  /**
   * Returns the {@link MediaType} associated with the given file name.
   *
   * @param filename The file name that may contain an extension associated
   *                 with a known {@link MediaType}.
   * @return {@link MediaType#UNDEFINED} if the extension has not been
   * assigned, otherwise the {@link MediaType} associated with this
   * URL's file name extension.
   */
  public static MediaType valueFrom( final String filename ) {
    return getMediaType( getExtension( filename ) );
  }

  /**
   * Returns the {@link MediaType} for the given type and subtype names.
   *
   * @param type    The IANA-defined type name.
   * @param subtype The IANA-defined subtype name.
   * @return {@link MediaType#UNDEFINED} if there is no {@link MediaType} that
   * matches the given type and subtype names.
   */
  public static MediaType valueFrom(
    final String type, final String subtype ) {
    for( final var mediaType : MediaType.values() ) {
      if( mediaType.equals( type, subtype ) ) {
        return mediaType;
      }
    }

    return UNDEFINED;
  }

  /**
   * Answers whether the given type and subtype names equal this enumerated
   * value. This performs a case-insensitive comparison.
   *
   * @param type    The type name to compare against this {@link MediaType}.
   * @param subtype The subtype name to compare against this {@link MediaType}.
   * @return {@code true} when the type and subtype name match.
   */
  public boolean equals( final String type, final String subtype ) {
    return mTypeName.name().equalsIgnoreCase( type ) &&
      mSubtype.equalsIgnoreCase( subtype );
  }

  /**
   * Answers whether the given {@link TypeName} matches this type name.
   *
   * @param typeName The {@link TypeName} to compare against the internal value.
   * @return {@code true} if the given value is the same IANA-defined type name.
   */
  public boolean isType( final TypeName typeName ) {
    return mTypeName == typeName;
  }

  /**
   * Returns the IANA-defined type and sub-type.
   *
   * @return The unique media type identifier.
   */
  @Override
  public String toString() {
    return mMediaType;
  }

  /**
   * Used by {@link MediaTypeExtensions} to initialize associations where the
   * subtype name and the file name extension have a 1:1 mapping.
   *
   * @return The IANA subtype value.
   */
  String getSubtype() {
    return mSubtype;
  }
}

MediaTypeExtensions

Different file name extensions map to various media types. The mapping of extensions to MediaType does not necessarily mean that the content matches the expected media type. Applications must take care to read the file headers to determine the actual media type.

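import java.util.Set;

import static java.util.Set.of;
// Also add a static import for MediaType.*, qualified with your own package name.

enum MediaTypeExtensions {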
  MEDIA_FONT_OTF( FONT_OTF ),
  MEDIA_FONT_TTF( FONT_TTF ),

  MEDIA_IMAGE_APNG( IMAGE_APNG ),
  MEDIA_IMAGE_BMP( IMAGE_BMP ),
  MEDIA_IMAGE_GIF( IMAGE_GIF ),
  MEDIA_IMAGE_ICO( IMAGE_X_ICON, of( "ico", "cur" ) ),
  MEDIA_IMAGE_JPEG( IMAGE_JPEG, of( "jpg", "jpeg", "jfif", "pjpeg", "pjp" ) ),
  MEDIA_IMAGE_PNG( IMAGE_PNG ),
  MEDIA_IMAGE_SVG( IMAGE_SVG_XML, of( "svg" ) ),
  MEDIA_IMAGE_TIFF( IMAGE_TIFF, of( "tif", "tiff" ) ),
  MEDIA_IMAGE_WEBP( IMAGE_WEBP ),

  MEDIA_TEXT_MARKDOWN( TEXT_MARKDOWN, of(
    "md", "markdown", "mdown", "mdtxt", "mdtext", "mdwn", "mkd", "mkdown",
    "mkdn" ) ),
  MEDIA_TEXT_PLAIN( TEXT_PLAIN, of( "asc", "ascii", "txt", "text", "utxt" ) ),
  MEDIA_TEXT_R_MARKDOWN( TEXT_R_MARKDOWN, of( "Rmd" ) ),
  MEDIA_TEXT_R_XML( TEXT_R_XML, of( "Rxml" ) ),
  MEDIA_TEXT_YAML( TEXT_YAML, of( "yaml", "yml" ) );

  private final MediaType mMediaType;
  private final Set<String> mExtensions;

  MediaTypeExtensions( final MediaType mediaType ) {
    this( mediaType, of( mediaType.getSubtype() ) );
  }

  MediaTypeExtensions(
    final MediaType mediaType, final Set<String> extensions ) {
    assert mediaType != null;
    assert extensions != null;
    assert !extensions.isEmpty();

    mMediaType = mediaType;
    mExtensions = extensions;
  }

  static MediaType getMediaType( final String extension ) {
    final var sanitized = sanitize( extension );

    for( final var mediaType : MediaTypeExtensions.values() ) {
      if( mediaType.isType( sanitized ) ) {
        return mediaType.getMediaType();
      }
    }

    return UNDEFINED;
  }

  private boolean isType( final String sanitized ) {
    for( final var extension : mExtensions ) {
      if( extension.equalsIgnoreCase( sanitized ) ) {
        return true;
      }
    }

    return false;
  }

  private static String sanitize( final String extension ) {
    return extension == null ? "" : extension.toLowerCase();
  }

  private MediaType getMediaType() {
    return mMediaType;
  }
}
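As noted above, a file name extension does not prove what a file contains. The following is a minimal sketch of checking the leading bytes ("magic numbers") for three common image formats. The class name MagicNumberSketch and its sniff method are hypothetical and not part of the original answer; a real application would cover many more signatures.

```java
/**
 * Hypothetical helper that guesses a media type from the first few bytes
 * of a file. Only PNG, GIF, and JPEG signatures are recognised.
 */
final class MagicNumberSketch {

  /**
   * @param data The leading bytes of the content (eight bytes suffice).
   * @return A media type string, or "" when no signature matches.
   */
  static String sniff( final byte[] data ) {
    // PNG files begin with a fixed eight-byte signature.
    if( startsWith( data, 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A ) ) {
      return "image/png";
    }

    // Both GIF87a and GIF89a begin with the ASCII text "GIF8".
    if( startsWith( data, 'G', 'I', 'F', '8' ) ) {
      return "image/gif";
    }

    // JPEG streams begin with the start-of-image marker and another marker.
    if( startsWith( data, 0xFF, 0xD8, 0xFF ) ) {
      return "image/jpeg";
    }

    return "";
  }

  private static boolean startsWith( final byte[] data, final int... signature ) {
    if( data == null || data.length < signature.length ) {
      return false;
    }

    for( int i = 0; i < signature.length; i++ ) {
      // Mask to compare the byte as an unsigned value.
      if( (data[ i ] & 0xFF) != signature[ i ] ) {
        return false;
      }
    }

    return true;
  }

  public static void main( final String[] args ) {
    final byte[] gif = "GIF89a".getBytes( java.nio.charset.StandardCharsets.US_ASCII );
    System.out.println( sniff( gif ) );
  }
}
```

A result from sniff could be compared with the extension-based MediaType to detect mislabelled files.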

HttpMediaType

Finally, we can write a tiny parser that converts the Content-Type header into a MediaType value. HTTP header names are not case-sensitive (RFC 2616, section 4.2), so the server may return "Content-Type" or "content-type". The example iterates over the header map and compares each name with equalsIgnoreCase. The HttpHeaders Javadoc states that firstValue and allValues also ignore case when looking up a name, so a direct lookup should work as well.

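import java.net.MalformedURLException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;

import static java.net.http.HttpClient.Redirect.NORMAL;
import static java.net.http.HttpRequest.BodyPublishers.noBody;
import static java.net.http.HttpResponse.BodyHandlers.discarding;
import static java.time.Duration.ofSeconds;
// Also add a static import for MediaType.UNDEFINED from your own package.

public class HttpMediaType {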

  private final static HttpClient HTTP_CLIENT = HttpClient
    .newBuilder()
    .connectTimeout( ofSeconds( 5 ) )
    .followRedirects( NORMAL )
    .build();

  /**
   * Performs an HTTP HEAD request to determine the media type based on the
   * Content-Type header returned from the server.
   *
   * @param uri Determine the media type for this resource.
   * @return The data type for the resource or {@link MediaType#UNDEFINED} if
   * unmapped.
   * @throws MalformedURLException The {@link URI} could not be converted to
   *                               a {@link URL}.
   */
  public static MediaType valueFrom( final URI uri )
    throws MalformedURLException {
    final var mediaType = new MediaType[]{UNDEFINED};

    try {
      final var request = HttpRequest
        .newBuilder( uri )
        .method( "HEAD", noBody() )
        .build();
      final var response = HTTP_CLIENT.send( request, discarding() );
      final var headers = response.headers();
      final var map = headers.map();

      map.forEach( ( key, values ) -> {
        if( "Content-Type".equalsIgnoreCase( key ) ) {
          var header = values.get( 0 );
          // Trim off any parameters, such as the character encoding.
          var i = header.indexOf( ';' );
          header = header.substring( 0, i == -1 ? header.length() : i ).trim();

          // Split the type and subtype; a value without a slash has no
          // subtype (substring( i + 1 ) would otherwise throw).
          i = header.indexOf( '/' );
          final var type = i == -1 ? header : header.substring( 0, i );
          final var subtype = i == -1 ? "" : header.substring( i + 1 );

          mediaType[ 0 ] = MediaType.valueFrom( type, subtype );
        }
      } );
    } catch( final Exception ex ) {
      // TODO: Inform the user?
    }

    return mediaType[ 0 ];
  }
}
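As a cross-check on the header-name handling, the sketch below builds an HttpHeaders instance with a lowercase "content-type" name and looks it up as "Content-Type". It relies on the documented behaviour that HttpHeaders lookups ignore case; the class name HeaderLookupSketch and the method mediaTypeOf are hypothetical. It can be run on your own JDK to confirm the behaviour.

```java
import java.net.http.HttpHeaders;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical helper that reads the media type through
 * HttpHeaders.firstValue rather than by iterating over the header map.
 */
final class HeaderLookupSketch {

  /**
   * @param headers The response headers.
   * @return The lowercase "type/subtype" without parameters, or "" when
   * the Content-Type header is absent.
   */
  static String mediaTypeOf( final HttpHeaders headers ) {
    return headers
      .firstValue( "Content-Type" )
      // Keep only the text before the first parameter separator.
      .map( value -> value.split( ";", 2 )[ 0 ].trim() )
      .map( value -> value.toLowerCase( Locale.ROOT ) )
      .orElse( "" );
  }

  public static void main( final String[] args ) {
    // The header name is deliberately lowercase here.
    final var headers = HttpHeaders.of(
      Map.of( "content-type", List.of( "text/html; charset=UTF-8" ) ),
      ( name, value ) -> true );

    System.out.println( mediaTypeOf( headers ) );
  }
}
```

The resulting string can then be split on '/' and passed to MediaType.valueFrom( type, subtype ).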

Upvotes: 0

peppered

Reputation: 698

To get the content type from the response, you can use the ContentType class.

HttpEntity entity = response.getEntity();
// ContentType.get returns null when the entity carries no Content-Type.
ContentType contentType = entity != null ? ContentType.get(entity) : null;

If contentType is not null, you can extract the MIME type:

String mimeType = contentType.getMimeType();

or the charset, which is null when the header does not specify one:

Charset charset = contentType.getCharset();

Upvotes: 34

Vlad

Reputation: 1763

The "Content-Type" HTTP header should give you the MIME type information:

Header contentType = response.getFirstHeader("Content-Type");

or:

Header contentType = response.getEntity().getContentType();

You can then extract the MIME type itself, because the Content-Type value may also include the encoding.

String mimeType = contentType.getValue().split(";")[0].trim();

Of course, don't forget to null-check the header before getting its value, in case the server does not send a Content-Type header.
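The splitting and null-checking described above can be wrapped in a small helper that works on the raw header value. This is a sketch using only the JDK; the class name ContentTypeValue and its methods are hypothetical, and the naive split on ';' does not handle semicolons inside quoted parameter values.

```java
import java.nio.charset.Charset;
import java.util.Optional;

/**
 * Hypothetical helper for a raw Content-Type value such as
 * "text/html; charset=UTF-8". A null value means the header was absent.
 */
final class ContentTypeValue {

  /** Returns the MIME type without parameters, if there is one. */
  static Optional<String> mimeType( final String value ) {
    if( value == null ) {
      return Optional.empty();
    }

    final var mime = value.split( ";", 2 )[ 0 ].trim();
    return mime.isEmpty() ? Optional.empty() : Optional.of( mime );
  }

  /** Returns the charset parameter, if present and known to this JVM. */
  static Optional<Charset> charset( final String value ) {
    if( value == null ) {
      return Optional.empty();
    }

    final var parts = value.split( ";" );

    // Index 0 is the MIME type; parameters start at index 1.
    for( int i = 1; i < parts.length; i++ ) {
      final var param = parts[ i ].trim();

      // Parameter names are compared without regard to case.
      if( param.regionMatches( true, 0, "charset=", 0, 8 ) ) {
        final var name = param.substring( 8 ).replace( "\"", "" ).trim();

        try {
          return Optional.of( Charset.forName( name ) );
        } catch( final IllegalArgumentException ex ) {
          // Illegal or unsupported charset name.
          return Optional.empty();
        }
      }
    }

    return Optional.empty();
  }

  public static void main( final String[] args ) {
    final var value = "text/html; charset=UTF-8";
    System.out.println( mimeType( value ).orElse( "unknown" ) );
    System.out.println( charset( value ).map( Charset::name ).orElse( "none" ) );
  }
}
```

With Apache HTTP Client v4, the argument would be header.getValue() when the header is not null, and null otherwise.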

Upvotes: 20
