zope.mimetype

This package provides a way to work with MIME content types. There are several interfaces defined here, many of which are used primarily to look things up based on different bits of information.

See complete documentation at https://zopemimetype.readthedocs.io/en/latest/

Introduction and Basics

The Zope MIME Infrastructure

The basic idea behind this is that content objects should provide an interface based on the actual content type they implement. For example, objects that represent text/xml or application/xml documents should be marked with the IContentTypeXml interface. This can allow additional views to be registered based on the content type, or subscribers may be registered to perform other actions based on the content type.

One aspect of the content type that’s important for all documents is that the content type interface determines whether the object data is interpreted as an encoded text document. Encoded text documents, in particular, can be decoded to obtain a single Unicode string. The content type interfaces for encoded text must derive from IContentTypeEncoded. (All content type interfaces derive from IContentType and directly provide IContentTypeInterface.)

The default configuration provides direct support for a variety of common document types found in office environments.

Supported lookups

Several different queries are supported by this package:

  • Given a MIME type expressed as a string, the associated interface, if any, can be retrieved using:

    # `mimeType` is the MIME type as a string
    interface = queryUtility(IContentTypeInterface, mimeType)
    
  • Given a charset name, the associated ICodec instance can be retrieved using:

    # `charsetName` is the charset name as a string
    codec = queryUtility(ICharsetCodec, charsetName)
    
  • Given a codec, the preferred charset name can be retrieved using:

    # `codec` is an `ICodec` instance:
    charsetName = getUtility(ICodecPreferredCharset, codec.name).name
    
  • Given any combination of a suggested file name, file data, and content type header, a guess at a reasonable MIME type can be made using:

    # `filename` is a suggested file name, or None
    # `data` is uploaded data, or None
    # `content_type` is a Content-Type header value, or None
    #
    mimeType = getUtility(IMimeTypeGetter)(
        name=filename, data=data, content_type=content_type)
    
  • Given any combination of a suggested file name, file data, and content type header, a guess at a reasonable charset name can be made using:

    # `filename` is a suggested file name, or None
    # `data` is uploaded data, or None
    # `content_type` is a Content-Type header value, or None
    #
    charsetName = getUtility(ICharsetGetter)(
        name=filename, data=data, content_type=content_type)
    

Retrieving Content Type Information

MIME Types

We’ll start by initializing the interfaces and registrations for the content type interfaces. This is normally done via ZCML.

>>> from zope.mimetype import mtypes
>>> mtypes.setup()

A utility is used to retrieve MIME types.

>>> from zope import component
>>> from zope.mimetype import typegetter
>>> from zope.mimetype.interfaces import IMimeTypeGetter
>>> component.provideUtility(typegetter.smartMimeTypeGuesser,
...                          provides=IMimeTypeGetter)
>>> mime_getter = component.getUtility(IMimeTypeGetter)

The utility maps a particular file name, file contents, and content type to a MIME type:

>>> mime_getter(name='file.txt', data='A text file.',
...             content_type='text/plain')
'text/plain'

In the default implementation, if not enough information is given to discern a MIME type, None is returned.

>>> mime_getter() is None
True

Character Sets

A utility is also used to retrieve character sets (charsets).

>>> from zope.mimetype.interfaces import ICharsetGetter
>>> component.provideUtility(typegetter.charsetGetter,
...                          provides=ICharsetGetter)
>>> charset_getter = component.getUtility(ICharsetGetter)

The utility maps a particular file name, file contents, and content type to a charset:

>>> charset_getter(name='file.txt', data='This is a text file.',
...                content_type='text/plain;charset=ascii')
'ascii'

In the default implementation, if not enough information is given to discern a charset, None is returned.

>>> charset_getter() is None
True

Finding Interfaces

Given a MIME type we need to be able to find the appropriate interface.

>>> from zope.mimetype.interfaces import IContentTypeInterface
>>> component.getUtility(IContentTypeInterface, name=u'text/plain')
<InterfaceClass zope.mimetype.mtypes.IContentTypeTextPlain>

It is also possible to enumerate all content type interfaces.

>>> utilities = list(component.getUtilitiesFor(IContentTypeInterface))

If you want to find an interface from a MIME string, you can use the utilities.

>>> component.getUtility(IContentTypeInterface, name='text/plain')
<InterfaceClass zope.mimetype.mtypes.IContentTypeTextPlain>

Codec handling

We can create codecs programmatically. Codecs are registered as utilities for ICodec with the name of their Python codec.

>>> from zope import component
>>> from zope.mimetype.interfaces import ICodec
>>> from zope.mimetype.codec import addCodec
>>> sorted(component.getUtilitiesFor(ICodec))
[]
>>> addCodec('iso8859-1', 'Western (ISO-8859-1)')
>>> codec = component.getUtility(ICodec, name='iso8859-1')
>>> codec
<zope.mimetype.codec.Codec ...>
>>> codec.name
'iso8859-1'
>>> addCodec('utf-8', 'Unicode (UTF-8)')
>>> codec2 = component.getUtility(ICodec, name='utf-8')

We can programmatically add charsets to a given codec. This registers each charset as a named utility for ICharset. It also registers the codec as a utility for ICharsetCodec with the name of the charset.

>>> from zope.mimetype.codec import addCharset
>>> from zope.mimetype.interfaces import ICharset, ICharsetCodec
>>> sorted(component.getUtilitiesFor(ICharset))
[]
>>> sorted(component.getUtilitiesFor(ICharsetCodec))
[]
>>> addCharset(codec.name, 'latin1')
>>> charset = component.getUtility(ICharset, name='latin1')
>>> charset
<zope.mimetype.codec.Charset ...>
>>> charset.name
'latin1'
>>> component.getUtility(ICharsetCodec, name='latin1') is codec
True

When adding a charset we can state that we want that charset to be the preferred charset for its codec.

>>> addCharset(codec.name, 'iso8859-1', preferred=True)
>>> addCharset(codec2.name, 'utf-8', preferred=True)

A codec can have at most one preferred charset.

>>> addCharset(codec.name, 'test', preferred=True)
Traceback (most recent call last):
...
ValueError: Codec already has a preferred charset.

Preferred charsets are registered as utilities for ICodecPreferredCharset under the name of the Python codec.

>>> from zope.mimetype.interfaces import ICodecPreferredCharset
>>> preferred = component.getUtility(ICodecPreferredCharset, name='iso8859-1')
>>> preferred
<zope.mimetype.codec.Charset ...>
>>> preferred.name
'iso8859-1'
>>> sorted(component.getUtilitiesFor(ICodecPreferredCharset))
[(u'iso8859-1', <zope.mimetype.codec.Charset ...>),
 (u'utf-8', <zope.mimetype.codec.Charset ...>)]

We can look up a codec by the name of its charset:

>>> component.getUtility(ICharsetCodec, name='latin1') is codec
True
>>> component.getUtility(ICharsetCodec, name='utf-8') is codec2
True

Or we can look up all codecs:

>>> sorted(component.getUtilitiesFor(ICharsetCodec))
[(u'iso8859-1', <zope.mimetype.codec.Codec ...>),
 (u'latin1', <zope.mimetype.codec.Codec ...>),
 (u'test', <zope.mimetype.codec.Codec ...>),
 (u'utf-8', <zope.mimetype.codec.Codec ...>)]
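
The split between codecs and charsets resembles the way Python's own codec registry treats several charset spellings as aliases of one codec. The following is a small illustration using only the standard library; it is not part of zope.mimetype, and `python_codec_name` is a hypothetical helper introduced for this sketch.

```python
import codecs


def python_codec_name(charset):
    """Return the canonical Python codec name for a charset spelling."""
    return codecs.lookup(charset).name


# Several charset spellings resolve to the same underlying Python codec.
for alias in ("latin1", "latin-1", "iso-8859-1"):
    print(alias, "->", python_codec_name(alias))
```

All three spellings resolve to the codec named 'iso8859-1', which is the codec name used in the examples above.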

Constraint Functions for Interfaces

The zope.mimetype.interfaces module defines interfaces that use some helper functions to define constraints on the accepted data. These helpers are used to determine whether values conform to what’s allowed for parts of a MIME type specification and other parts of a Content-Type header as specified in RFC 2045.

Single Token

The first is the simplest: the tokenConstraint() function returns True if the ASCII string it is passed conforms to the token production in section 5.1 of the RFC. Let’s import the function:

>>> from zope.mimetype.interfaces import tokenConstraint

Typical tokens are the major and minor parts of the MIME type and the parameter names for the Content-Type header. The function should return True for these values:

>>> tokenConstraint("text")
True
>>> tokenConstraint("plain")
True
>>> tokenConstraint("charset")
True

The function should also return True for unusual but otherwise normal tokens that may be used in some situations:

>>> tokenConstraint("not-your-fathers-token")
True

It must also allow extension tokens and vendor-specific tokens:

>>> tokenConstraint("x-magic")
True

>>> tokenConstraint("vnd.zope.special-data")
True

Since we expect input handlers to normalize values to lower case, upper case text is not allowed:

>>> tokenConstraint("Text")
False

Non-ASCII text is also not allowed:

>>> tokenConstraint("\x80")
False
>>> tokenConstraint("\xC8")
False
>>> tokenConstraint("\xFF")
False

Note that lots of characters are allowed in tokens, and there are no constraints that the token “look like” something a person would want to read:

>>> tokenConstraint(".-.-.-.")
True

Other characters are disallowed, however, including all forms of whitespace:

>>> tokenConstraint("foo bar")
False
>>> tokenConstraint("foo\tbar")
False
>>> tokenConstraint("foo\nbar")
False
>>> tokenConstraint("foo\rbar")
False
>>> tokenConstraint("foo\x7Fbar")
False

Whitespace before or after the token is not accepted either:

>>> tokenConstraint(" text")
False
>>> tokenConstraint("plain ")
False

Other disallowed characters are defined in the tspecials production from the RFC (also in section 5.1):

>>> tokenConstraint("(")
False
>>> tokenConstraint(")")
False
>>> tokenConstraint("<")
False
>>> tokenConstraint(">")
False
>>> tokenConstraint("@")
False
>>> tokenConstraint(",")
False
>>> tokenConstraint(";")
False
>>> tokenConstraint(":")
False
>>> tokenConstraint("\\")
False
>>> tokenConstraint('"')
False
>>> tokenConstraint("/")
False
>>> tokenConstraint("[")
False
>>> tokenConstraint("]")
False
>>> tokenConstraint("?")
False
>>> tokenConstraint("=")
False

A token must contain at least one character, so tokenConstraint() returns False for an empty string:

>>> tokenConstraint("")
False
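
The rules demonstrated in this section can be approximated in a few lines of plain Python. This is a minimal sketch under the assumptions stated in the comments, not the implementation used by zope.mimetype; the names `TSPECIALS`, `ALLOWED` and `is_token` are hypothetical.

```python
# tspecials from RFC 2045, section 5.1.
TSPECIALS = frozenset('()<>@,;:\\"/[]?=')

# Printable US-ASCII (0x21-0x7E) minus tspecials and upper-case letters.
# Upper case is excluded because input is assumed to be normalized already.
ALLOWED = frozenset(
    chr(code) for code in range(0x21, 0x7F)
    if chr(code) not in TSPECIALS and not chr(code).isupper()
)


def is_token(value):
    """Return True if value is a non-empty, normalized RFC 2045 token."""
    return bool(value) and all(char in ALLOWED for char in value)
```

Building the allowed set from the character range keeps the exclusions (whitespace, control characters, tspecials) explicit rather than hiding them in a pattern.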

MIME Type

A MIME type is specified using two tokens separated by a slash; whitespace between the tokens and the slash must be normalized away in the input handler.

The mimeTypeConstraint() function is available to test a normalized MIME type value; let’s import that function now:

>>> from zope.mimetype.interfaces import mimeTypeConstraint

Let’s test some common MIME types to make sure the function isn’t obviously insane:

>>> mimeTypeConstraint("text/plain")
True
>>> mimeTypeConstraint("application/xml")
True
>>> mimeTypeConstraint("image/svg+xml")
True

If parts of the MIME type are missing, it isn’t accepted:

>>> mimeTypeConstraint("text")
False
>>> mimeTypeConstraint("text/")
False
>>> mimeTypeConstraint("/plain")
False

As for individual tokens, whitespace is not allowed:

>>> mimeTypeConstraint("foo bar/plain")
False
>>> mimeTypeConstraint("text/foo bar")
False

Whitespace is not accepted around the slash either:

>>> mimeTypeConstraint("text /plain")
False
>>> mimeTypeConstraint("text/ plain")
False

Surrounding whitespace is also not accepted:

>>> mimeTypeConstraint(" text/plain")
False
>>> mimeTypeConstraint("text/plain ")
False
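
A comparable check can be sketched with a regular expression built from the token rule. This is an illustration only, assuming lower-case, already normalized input; `is_mime_type` and the pattern names are hypothetical and do not come from zope.mimetype.

```python
import re

# One normalized RFC 2045 token: printable ASCII without whitespace,
# control characters, tspecials, or upper-case letters.
_TOKEN = r"[a-z0-9!#$%&'*+\-.^_`{|}~]+"

# Exactly two tokens joined by a single slash, with nothing around them.
_MIME_TYPE_RE = re.compile(_TOKEN + "/" + _TOKEN)


def is_mime_type(value):
    """Return True if value looks like a normalized 'major/minor' type."""
    return _MIME_TYPE_RE.fullmatch(value) is not None
```

Using `fullmatch` rather than `match` is what rejects leading or trailing whitespace, including a trailing newline.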

Minimal IContentInfo Implementation

The zope.mimetype.contentinfo module provides a minimal IContentInfo implementation that adds no information to what’s provided by a content object. This represents the most conservative content-type policy that might be useful.

Let’s take a look at how this operates by creating a couple of concrete content-type interfaces:

>>> from zope.mimetype import interfaces

>>> class ITextPlain(interfaces.IContentTypeEncoded):
...     """text/plain"""

>>> class IApplicationOctetStream(interfaces.IContentType):
...     """application/octet-stream"""

Now, we’ll create a minimal content object that provides the necessary information:

>>> import zope.interface

>>> @zope.interface.implementer(interfaces.IContentTypeAware)
... class Content(object):
...     def __init__(self, mimeType, charset=None):
...         self.mimeType = mimeType
...         self.parameters = {}
...         if charset:
...             self.parameters["charset"] = charset

We can now create examples of both encoded and non-encoded content:

>>> encoded = Content("text/plain", "utf-8")
>>> zope.interface.alsoProvides(encoded, ITextPlain)

>>> unencoded = Content("application/octet-stream")
>>> zope.interface.alsoProvides(unencoded, IApplicationOctetStream)

The minimal IContentInfo implementation only exposes the information available to it from the base content object. Let’s take a look at the unencoded content first:

>>> from zope.mimetype import contentinfo
>>> ci = contentinfo.ContentInfo(unencoded)
>>> ci.effectiveMimeType
'application/octet-stream'
>>> ci.effectiveParameters
{}
>>> ci.contentType
'application/octet-stream'

For unencoded content, there is never a codec:

>>> print(ci.getCodec())
None

It is also disallowed to try decoding such content:

>>> ci.decode("foo")
Traceback (most recent call last):
...
ValueError: no matching codec found

Attempting to decode data using an unencoded object causes an exception to be raised:

>>> print(ci.decode("data"))
Traceback (most recent call last):
...
ValueError: no matching codec found

If we try this with encoded data, we get somewhat different behavior:

>>> ci = contentinfo.ContentInfo(encoded)
>>> ci.effectiveMimeType
'text/plain'
>>> ci.effectiveParameters
{'charset': 'utf-8'}
>>> ci.contentType
'text/plain;charset=utf-8'

The IContentInfo.getCodec() and IContentInfo.decode() methods can be used to handle encoded data using the encoding indicated by the charset parameter. Let’s store some UTF-8 data in a variable:

>>> utf8_data = b"\xAB\xBB".decode("iso-8859-1").encode("utf-8")
>>> utf8_data
'\xc2\xab\xc2\xbb'

We want to be able to decode the data using the IContentInfo object. Let’s try getting the corresponding ICodec object using IContentInfo.getCodec():

>>> codec = ci.getCodec()
Traceback (most recent call last):
...
ValueError: unsupported charset: 'utf-8'

So, we can’t proceed without some further preparation. What we need is to register an ICharset for UTF-8. The ICharset will need a reference (by name) to an ICodec for UTF-8. So let’s create those objects and register them:

>>> import codecs
>>> from zope.mimetype.i18n import _

>>> @zope.interface.implementer(interfaces.ICodec)
... class Utf8Codec(object):
...
...     name = "utf-8"
...     title = _("UTF-8")
...
...     def __init__(self):
...         ( self.encode,
...           self.decode,
...           self.reader,
...           self.writer
...           ) = codecs.lookup(self.name)

>>> utf8_codec = Utf8Codec()

>>> @zope.interface.implementer(interfaces.ICharset)
... class Utf8Charset(object):
...
...     name = utf8_codec.name
...     encoding = name

>>> utf8_charset = Utf8Charset()

>>> import zope.component

>>> zope.component.provideUtility(
...     utf8_codec, interfaces.ICodec, utf8_codec.name)
>>> zope.component.provideUtility(
...     utf8_charset, interfaces.ICharset, utf8_charset.name)

Now that that’s been initialized, let’s try getting the codec again:

>>> codec = ci.getCodec()
>>> codec.name
'utf-8'

>>> codec.decode(utf8_data)
(u'\xab\xbb', 4)

We can now check that the decode() method of the IContentInfo will decode the entire data, returning the Unicode representation of the text:

>>> ci.decode(utf8_data)
u'\xab\xbb'
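
The two steps shown here (find a codec from the charset parameter, then decode all of the data) can be sketched with the standard library alone. This is an assumption-laden illustration rather than the ContentInfo implementation: it looks codecs up with `codecs.lookup` instead of registered utilities, and `decode_with_charset` is a hypothetical name.

```python
import codecs


def decode_with_charset(data, parameters):
    """Decode bytes using the charset named in a parameters mapping."""
    charset = parameters.get("charset")
    if charset is None:
        raise ValueError("charset not known")
    try:
        codec = codecs.lookup(charset)
    except LookupError:
        raise ValueError("unsupported charset: %r" % charset)
    # Codec decode functions return (text, number_of_bytes_consumed).
    text, consumed = codec.decode(data)
    if consumed != len(data):
        raise ValueError("data not completely consumed")
    return text


print(decode_with_charset(b"\xc2\xab\xc2\xbb", {"charset": "utf-8"}))
```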

Another possibility, of course, is that you have content that you know is encoded text of some sort, but you don’t actually know what encoding it’s in:

>>> encoded2 = Content("text/plain")
>>> zope.interface.alsoProvides(encoded2, ITextPlain)

>>> ci = contentinfo.ContentInfo(encoded2)
>>> ci.effectiveMimeType
'text/plain'
>>> ci.effectiveParameters
{}
>>> ci.contentType
'text/plain'

>>> ci.getCodec()
Traceback (most recent call last):
...
ValueError: charset not known

It’s also possible that the initial content type information for an object is incorrect for some reason. If the browser provides a content type of “text/plain; charset=utf-8”, the content will be seen as encoded. A user correcting this content type using UI elements can cause the content to be considered un-encoded. At this point, there should no longer be a charset parameter to the content type, and the content info object should reflect this, though the previous encoding information will be retained in case the content type should be changed to an encoded type in the future.

Let’s see how this behavior will be exhibited in this API. We’ll start by creating some encoded content:

>>> content = Content("text/plain", "utf-8")
>>> zope.interface.alsoProvides(content, ITextPlain)

We can see that the encoding information is included in the effective MIME type information provided by the content-info object:

>>> ci = contentinfo.ContentInfo(content)
>>> ci.effectiveMimeType
'text/plain'
>>> ci.effectiveParameters
{'charset': 'utf-8'}

We now change the content type information for the object:

>>> ifaces = zope.interface.directlyProvidedBy(content)
>>> ifaces -= ITextPlain
>>> ifaces += IApplicationOctetStream
>>> zope.interface.directlyProvides(content, *ifaces)
>>> content.mimeType = 'application/octet-stream'

At this point, a new content-info object provides different information:

>>> ci = contentinfo.ContentInfo(content)
>>> ci.effectiveMimeType
'application/octet-stream'
>>> ci.effectiveParameters
{}

The underlying content type parameters still contain the original encoding information, however:

>>> content.parameters
{'charset': 'utf-8'}

Events and content-type changes

The IContentTypeChangedEvent is fired whenever an object’s IContentTypeInterface is changed. This includes the cases when a content type interface is applied to an object that doesn’t have one, and when the content type interface is removed from an object.

Let’s start the demonstration by defining a subscriber for the event that simply prints out the information from the event object:

>>> def handler(event):
...     print("changed content type interface:")
...     print("  from:", event.oldContentType)
...     print("    to:", event.newContentType)

We’ll also define a simple content object:

>>> import zope.interface

>>> class IContent(zope.interface.Interface):
...     pass

>>> @zope.interface.implementer(IContent)
... class Content(object):
...     def __str__(self):
...         return "<MyContent>"

>>> obj = Content()

We’ll also need a couple of content type interfaces:

>>> from zope.mimetype import interfaces

>>> class ITextPlain(interfaces.IContentTypeEncoded):
...     """text/plain"""
>>> ITextPlain.setTaggedValue("mimeTypes", ["text/plain"])
>>> ITextPlain.setTaggedValue("extensions", [".txt"])
>>> zope.interface.directlyProvides(
...     ITextPlain, interfaces.IContentTypeInterface)

>>> class IOctetStream(interfaces.IContentType):
...     """application/octet-stream"""
>>> IOctetStream.setTaggedValue("mimeTypes", ["application/octet-stream"])
>>> IOctetStream.setTaggedValue("extensions", [".bin"])
>>> zope.interface.directlyProvides(
...     IOctetStream, interfaces.IContentTypeInterface)

Let’s register our subscriber:

>>> import zope.component
>>> import zope.component.interfaces
>>> zope.component.provideHandler(
...     handler,
...     (zope.component.interfaces.IObjectEvent,))

Changing the content type interface on an object is handled by the zope.mimetype.event.changeContentType() function. Let’s import that module and demonstrate that the expected event is fired appropriately:

>>> from zope.mimetype import event

Since the object currently has no content type interface, “removing” the interface does not affect the object and the event is not fired:

>>> event.changeContentType(obj, None)

Setting a content type interface on an object that doesn’t have one will cause the event to be fired, with the oldContentType attribute on the event set to None:

>>> event.changeContentType(obj, ITextPlain)
changed content type interface:
  from: None
    to: <InterfaceClass __builtin__.ITextPlain>

Calling the changeContentType() function again with the same “new” content type interface causes no change, so the event is not fired again:

>>> event.changeContentType(obj, ITextPlain)

Providing a new interface does cause the event to be fired again:

>>> event.changeContentType(obj, IOctetStream)
changed content type interface:
  from: <InterfaceClass __builtin__.ITextPlain>
    to: <InterfaceClass __builtin__.IOctetStream>

Similarly, removing the content type interface triggers the event as well:

>>> event.changeContentType(obj, None)
changed content type interface:
  from: <InterfaceClass __builtin__.IOctetStream>
    to: None
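
The rule illustrated in this section, that subscribers are only notified when the content type actually changes, can be sketched without any Zope machinery. This is a simplified illustration: it stores the type as a plain attribute rather than as a provided interface, and `change_content_type`, `Document` and `record` are hypothetical names.

```python
def change_content_type(obj, new_type, handlers):
    """Set obj.content_type and notify handlers only on a real change."""
    old_type = getattr(obj, "content_type", None)
    if old_type == new_type:
        return False  # nothing changed, so no notification is sent
    obj.content_type = new_type
    for handler in handlers:
        handler(obj, old_type, new_type)
    return True


class Document(object):
    """Stand-in content object for the sketch."""


events = []


def record(obj, old_type, new_type):
    events.append((old_type, new_type))


doc = Document()
change_content_type(doc, None, [record])          # nothing to remove
change_content_type(doc, "text/plain", [record])  # None -> text/plain
change_content_type(doc, "text/plain", [record])  # unchanged
change_content_type(doc, None, [record])          # text/plain -> None
print(events)
```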

MIME type and character set extraction

The zope.mimetype.typegetter module provides a selection of MIME type extractors (implementations of zope.mimetype.interfaces.IMimeTypeGetter) and charset extractors (implementations of zope.mimetype.interfaces.ICharsetGetter). These may be used to determine what the MIME type and character set for uploaded data should be.

These two interfaces represent the site policy regarding interpreting upload data in the face of missing or inaccurate input.

Let’s go ahead and import the module:

>>> from zope.mimetype import typegetter

MIME types

There are a number of interesting MIME-type extractors:

mimeTypeGetter()
    A minimal extractor that never attempts to guess.
mimeTypeGuesser()
    An extractor that tries to guess the content type based on the name and data if the input contains no content type information.
smartMimeTypeGuesser()
    An extractor that checks the content for a variety of constructs to try and refine the results of the mimeTypeGuesser(). This is able to do things like check for XHTML that’s labelled as HTML in upload data.

mimeTypeGetter()

We’ll start with the simplest, which does no content-based guessing at all, but uses the information provided by the browser directly. If the browser did not provide any content-type information, or if it cannot be parsed, the extractor returns None, as the examples in this section show; a caller that needs a concrete value can then fall back to a “safe” MIME type such as application/octet-stream. (The rationale for selecting this type is that since there’s really nothing productive that can be done with such data other than download it, it’s impossible to misinterpret the data.)

When there’s no information at all about the content, the extractor returns None:

>>> print(typegetter.mimeTypeGetter())
None

Providing only the upload filename or data, or both, still produces None, since no guessing is being done:

>>> print(typegetter.mimeTypeGetter(name="file.html"))
None

>>> print(typegetter.mimeTypeGetter(data=b"<html>...</html>"))
None

>>> print(typegetter.mimeTypeGetter(
...     name="file.html", data=b"<html>...</html>"))
None

If a content type header is available for the input, that is used since that represents explicit input from outside the application server. The major and minor parts of the content type are extracted and returned as a single string:

>>> typegetter.mimeTypeGetter(content_type="text/plain")
'text/plain'

>>> typegetter.mimeTypeGetter(content_type="text/plain; charset=utf-8")
'text/plain'

If the content-type information is provided but malformed (not in conformance with RFC 2822), it is ignored, since the intent cannot be reliably guessed:

>>> print(typegetter.mimeTypeGetter(content_type="foo bar"))
None

A malformed content type is still ignored when the other values are provided, and those values are ignored as well:

>>> print(typegetter.mimeTypeGetter(
...     name="file.html", data=b"<html>...</html>", content_type="foo bar"))
None

mimeTypeGuesser()

A more elaborate extractor that tries to work around completely missing information can be found as the mimeTypeGuesser() function. This function will only guess if there is no usable content type information in the input. This extractor can be thought of as having the following pseudo-code:

def mimeTypeGuesser(name=None, data=None, content_type=None):
    type = mimeTypeGetter(name=name, data=data, content_type=content_type)
    if type is None:
        type = ...  # guess the content type from the name and data
    return type

Let’s see how this affects the results we saw earlier. When there’s no input to use, we still get None:

>>> print(typegetter.mimeTypeGuesser())
None

Providing only the upload filename or data, or both, now produces a non-None guess for common content types:

>>> typegetter.mimeTypeGuesser(name="file.html")
'text/html'

>>> typegetter.mimeTypeGuesser(data=b"<html>...</html>")
'text/html'

>>> typegetter.mimeTypeGuesser(name="file.html", data=b"<html>...</html>")
'text/html'

Note that if the filename and data provided separately produce different MIME types, the result of providing both will be one of those types, but which one is unspecified:

>>> mt_1 = typegetter.mimeTypeGuesser(name="file.html")
>>> mt_1
'text/html'

>>> mt_2 = typegetter.mimeTypeGuesser(data=b"<?xml version='1.0'?>...")
>>> mt_2
'text/xml'

>>> mt = typegetter.mimeTypeGuesser(
...     data=b"<?xml version='1.0'?>...", name="file.html")
>>> mt in (mt_1, mt_2)
True

If a content type header is available for the input, that is used in the same way as for the mimeTypeGetter() function:

>>> typegetter.mimeTypeGuesser(content_type="text/plain")
'text/plain'

>>> typegetter.mimeTypeGuesser(content_type="text/plain; charset=utf-8")
'text/plain'

If the content-type information is provided but malformed, it is ignored:

>>> print(typegetter.mimeTypeGuesser(content_type="foo bar"))
None

When combined with values for the filename or content data, those are still used to provide reasonable guesses for the content type:

>>> typegetter.mimeTypeGuesser(name="file.html", content_type="foo bar")
'text/html'

>>> typegetter.mimeTypeGuesser(
...     data=b"<html>...</html>", content_type="foo bar")
'text/html'

Information from a parsable content-type is still used even if a guess from the data or filename would provide a different or more-refined result:

>>> typegetter.mimeTypeGuesser(
...     data=b"GIF89a...", content_type="application/octet-stream")
'application/octet-stream'
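
The fallback order described in this section (usable header first, then the file name, then the data) can be sketched with the standard library. This is not the zope.mimetype implementation: the header parsing is a simplified regular expression, the data sniffer recognizes only two constructs, and all function names are hypothetical.

```python
import mimetypes
import re

# One RFC 2045 token; case is accepted here and lowercased in the result.
_TOKEN = r"[A-Za-z0-9!#$%&'*+\-.^_`{|}~]+"
_CONTENT_TYPE_RE = re.compile(
    r"\s*(%s)\s*/\s*(%s)\s*(?:;.*)?" % (_TOKEN, _TOKEN), re.DOTALL)


def header_mime_type(content_type):
    """Return 'major/minor' from a header value, or None if unusable."""
    if not content_type:
        return None
    match = _CONTENT_TYPE_RE.fullmatch(content_type)
    if match is None:
        return None
    return "%s/%s" % (match.group(1).lower(), match.group(2).lower())


def sniff_mime_type(data):
    """Tiny, hypothetical content sniffer used only for this sketch."""
    head = data.lstrip()[:64].lower()
    if head.startswith(b"<?xml"):
        return "text/xml"
    if head.startswith(b"<html") or head.startswith(b"<!doctype html"):
        return "text/html"
    return None


def guess_mime_type(name=None, data=None, content_type=None):
    """Prefer a parsable header; otherwise guess from name, then data."""
    mime_type = header_mime_type(content_type)
    if mime_type is None and name:
        mime_type, _encoding = mimetypes.guess_type(name)
    if mime_type is None and data:
        mime_type = sniff_mime_type(data)
    return mime_type
```

Unlike the real extractor, this sketch fixes the precedence between file name and data; the documentation above leaves that choice unspecified.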

smartMimeTypeGuesser()

The smartMimeTypeGuesser() function applies more knowledge to the process of determining the MIME-type to use. Essentially, it takes the result of the mimeTypeGuesser() function and attempts to refine the content-type based on various heuristics.

We still see the basic behavior that no input produces None:

>>> print(typegetter.smartMimeTypeGuesser())
None

An unparsable content-type is still ignored:

>>> print(typegetter.smartMimeTypeGuesser(content_type="foo bar"))
None

The interpretation of uploaded data will be different in at least some interesting cases. For instance, the mimeTypeGuesser() function provides these results for some XHTML input data:

>>> typegetter.mimeTypeGuesser(
...     data=b"<?xml version='1.0' encoding='utf-8'?><html>...</html>",
...     name="file.html")
'text/html'

The smart extractor is able to refine this into more usable data:

>>> typegetter.smartMimeTypeGuesser(
...     data=b"<?xml version='1.0' encoding='utf-8'?>...",
...     name="file.html")
'application/xhtml+xml'

In this case, the smart extractor has refined the information determined from the filename using information from the uploaded data. The specific approach taken by the extractor is not part of the interface, however.
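
One refinement of this kind can be sketched as a post-processing step. Because the actual heuristics are not part of the interface, the rule below is only an assumed example, and `refine_mime_type` is a hypothetical name.

```python
def refine_mime_type(mime_type, data=None):
    """Upgrade text/html to application/xhtml+xml for XML-declared data."""
    if mime_type == "text/html" and data:
        # An XML declaration at the start suggests XHTML rather than HTML.
        if data.lstrip().startswith(b"<?xml"):
            return "application/xhtml+xml"
    return mime_type


print(refine_mime_type(
    "text/html", b"<?xml version='1.0' encoding='utf-8'?>..."))
```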

charsetGetter()

If you’re interested in the character set of textual data, you can use the charsetGetter function (which can also be registered as the ICharsetGetter utility).

The simplest case is when the character set is already specified in the content type.

>>> typegetter.charsetGetter(content_type='text/plain; charset=mambo-42')
'mambo-42'

Note that the charset name is lowercased, because all the default ICharset and ICharsetCodec utilities are registered for lowercase names.

>>> typegetter.charsetGetter(content_type='text/plain; charset=UTF-8')
'utf-8'

If it isn’t, charsetGetter can try to guess by looking at the actual data:

>>> typegetter.charsetGetter(content_type='text/plain', data=b'just text')
'ascii'
>>> typegetter.charsetGetter(content_type='text/plain', data=b'\xe2\x98\xba')
'utf-8'
>>> import codecs
>>> typegetter.charsetGetter(data=codecs.BOM_UTF16_BE + b'\x12\x34')
'utf-16be'
>>> typegetter.charsetGetter(data=codecs.BOM_UTF16_LE + b'\x12\x34')
'utf-16le'

If the character set cannot be determined, charsetGetter returns None.

>>> typegetter.charsetGetter(content_type='text/plain', data=b'\xff')
>>> typegetter.charsetGetter()
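
The behavior shown in this section can be approximated with the standard library: read the charset parameter from the header if present, otherwise look for a byte-order mark, otherwise try a short list of encodings. This is a sketch under those assumptions, not the charsetGetter implementation; `header_charset` and `guess_charset` are hypothetical names, and the candidate list is deliberately short.

```python
import codecs
from email.message import Message


def header_charset(content_type):
    """Return the lowercased charset parameter of a header value, if any."""
    if not content_type:
        return None
    message = Message()
    message["Content-Type"] = content_type
    return message.get_content_charset()


def guess_charset(data=None, content_type=None):
    """Prefer the header's charset; otherwise inspect the data."""
    charset = header_charset(content_type)
    if charset:
        return charset
    if not data:
        return None
    # A byte-order mark identifies UTF-16 and its byte order.
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16be"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16le"
    # Otherwise accept the first candidate that decodes without error.
    for candidate in ("ascii", "utf-8"):
        try:
            data.decode(candidate)
        except UnicodeDecodeError:
            continue
        return candidate
    return None
```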

Source for MIME type interfaces

Some sample interfaces have been created in the zope.mimetype.tests module for use in this test. Let’s import them:

>>> from zope.mimetype.tests import (
...     ISampleContentTypeOne, ISampleContentTypeTwo)

The source should only include IContentTypeInterface interfaces that have been registered. Let’s register one of these two interfaces so we can test this:

>>> import zope.component
>>> from zope.mimetype.interfaces import IContentTypeInterface

>>> zope.component.provideUtility(
...     ISampleContentTypeOne, IContentTypeInterface, name="type/one")

>>> zope.component.provideUtility(
...     ISampleContentTypeOne, IContentTypeInterface, name="type/two")

We should see that the registered interface is included in the source, while the unregistered one is not:

>>> from zope.mimetype import source

>>> s = source.ContentTypeSource()

>>> ISampleContentTypeOne in s
True
>>> ISampleContentTypeTwo in s
False

Interfaces that do not provide IContentTypeInterface are not included in the source:

>>> import zope.interface
>>> class ISomethingElse(zope.interface.Interface):
...    """This isn't a content type interface."""

>>> ISomethingElse in s
False

The source is iterable, so we can get a list of the values:

>>> values = list(s)

>>> len(values)
1
>>> values[0] is ISampleContentTypeOne
True

We can get terms for the allowed values:

>>> terms = source.ContentTypeTerms(s, None)
>>> t = terms.getTerm(ISampleContentTypeOne)
>>> terms.getValue(t.token) is ISampleContentTypeOne
True

Interfaces that are not in the source cause an error when a term is requested:

>>> terms.getTerm(ISomethingElse)
Traceback (most recent call last):
...
LookupError: value is not an element in the source

The term provides a token based on the module name of the interface:

>>> t.token
'zope.mimetype.tests.ISampleContentTypeOne'

The term also provides the title based on the “title” tagged value from the interface:

>>> t.title
u'Type One'

Each interface provides a list of MIME types with which the interface is associated. The term object provides access to this list:

>>> t.mimeTypes
['type/one', 'type/foo']

A list of common extensions for files of this type is also available, though it may be empty:

>>> t.extensions
[]

The term’s value, of course, is the interface passed in:

>>> t.value is ISampleContentTypeOne
True

This extended term API is defined by the IContentTypeTerm interface:

>>> from zope.mimetype.interfaces import IContentTypeTerm
>>> IContentTypeTerm.providedBy(t)
True

The value can also be retrieved using the getValue() method:

>>> iface = terms.getValue('zope.mimetype.tests.ISampleContentTypeOne')
>>> iface is ISampleContentTypeOne
True

Attempting to retrieve an interface that isn’t in the source using the terms object generates a LookupError:

>>> terms.getValue('zope.mimetype.tests.ISampleContentTypeTwo')
Traceback (most recent call last):
...
LookupError: token does not represent an element in the source

Attempting to look up a junk token also generates an error:

>>> terms.getValue('just.some.dotted.name.that.does.not.exist')
Traceback (most recent call last):
...
LookupError: could not import module for token
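The token scheme above (a dotted module path followed by the interface name) can be illustrated with a stdlib-only sketch. The helpers `make_token` and `resolve_token` are hypothetical names introduced here and are not part of zope.mimetype; the sketch also skips the check that the resolved object actually belongs to the source.

```python
import importlib
from fractions import Fraction


def make_token(obj):
    """Build a dotted token from an object's module and name."""
    return "%s.%s" % (obj.__module__, obj.__name__)


def resolve_token(token):
    """Import the module named by the token and return the named attribute."""
    module_name, _, attr = token.rpartition(".")
    if not module_name:
        raise LookupError("token is not a dotted name")
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        raise LookupError("could not import module for token")
    try:
        return getattr(module, attr)
    except AttributeError:
        raise LookupError("token does not name an attribute of its module")


# Any importable class works as a stand-in for an interface here.
token = make_token(Fraction)
print(token)
print(resolve_token(token) is Fraction)
```

A real terms implementation would additionally verify that the resolved object is an element of the source before returning it.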

Widgets

TranslatableSourceSelectWidget

TranslatableSourceSelectWidget is a SourceSelectWidget that translates and sorts the choices.

We will borrow the boring setup code from the SourceSelectWidget test (source.txt in zope.formlib).

>>> import zope.interface
>>> import zope.component
>>> import zope.schema
>>> import zope.schema.interfaces
>>> @zope.interface.implementer(zope.schema.interfaces.IIterableSource)
... class SourceList(list):
...     pass
>>> import base64, binascii
>>> import zope.publisher.interfaces.browser
>>> from zope.browser.interfaces import ITerms
>>> from zope.schema.vocabulary import SimpleTerm
>>> @zope.interface.implementer(ITerms)
... class ListTerms:
...
...     def __init__(self, source, request):
...         pass # We don't actually need the source or the request :)
...
...     def getTerm(self, value):
...         title = value.decode() if isinstance(value, bytes) else value
...         token = base64.b64encode(title.encode()).strip().decode()
...         return SimpleTerm(value, token=token, title=title)
...
...     def getValue(self, token):
...         try:
...             return base64.b64decode(token).decode()
...         except binascii.Error:
...             raise LookupError(token)
>>> zope.component.provideAdapter(
...     ListTerms,
...     (SourceList, zope.publisher.interfaces.browser.IBrowserRequest))
>>> dog = zope.schema.Choice(
...    __name__ = 'dog',
...    title=u"Dogs",
...    source=SourceList(['spot', 'bowser', 'prince', 'duchess', 'lassie']),
...    )
>>> dog = dog.bind(object())

Now that we have a field and a working source, we can construct and render a widget.

>>> from zope.mimetype.widget import TranslatableSourceSelectWidget
>>> from zope.publisher.browser import TestRequest
>>> request = TestRequest()
>>> widget = TranslatableSourceSelectWidget(
...     dog, dog.source, request)
>>> print(widget())
<div>
<div class="value">
<select id="field.dog" name="field.dog" size="5" >
<option value="Ym93c2Vy">bowser</option>
<option value="ZHVjaGVzcw==">duchess</option>
<option value="bGFzc2ll">lassie</option>
<option value="cHJpbmNl">prince</option>
<option value="c3BvdA==">spot</option>
</select>
</div>
<input name="field.dog-empty-marker" type="hidden" value="1" />
</div>

Note that the options are ordered alphabetically.
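The tokens and ordering in the rendered output can be reproduced with the standard library alone. This is only a sketch of the idea (base64 tokens, options sorted by title); `token_for` is a hypothetical helper, and the real widget also translates titles before sorting.

```python
import base64

names = ["spot", "bowser", "prince", "duchess", "lassie"]


def token_for(title):
    """Return the base64 token used as the option value for a title."""
    return base64.b64encode(title.encode()).decode()


# Sort by title, then pair each title with its token.
options = [(title, token_for(title)) for title in sorted(names)]

for title, token in options:
    print('<option value="%s">%s</option>' % (token, title))
```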

If the field is not required, we will also see a special choice labeled “(nothing selected)” at the top of the list:

>>> dog.required = False
>>> print(widget())
<div>
<div class="value">
<select id="field.dog" name="field.dog" size="5" >
<option selected="selected" value="">(nothing selected)</option>
<option value="Ym93c2Vy">bowser</option>
<option value="ZHVjaGVzcw==">duchess</option>
<option value="bGFzc2ll">lassie</option>
<option value="cHJpbmNl">prince</option>
<option value="c3BvdA==">spot</option>
</select>
</div>
<input name="field.dog-empty-marker" type="hidden" value="1" />
</div>

Utilities

The utils module contains various helpers for working with data governed by MIME content type information, as found in the HTTP Content-Type header: MIME types and character sets.

The decode function takes a byte string and an IANA character set name and returns a unicode object decoded from the string, using the codec associated with the character set name. Errors generally arise from the unicode conversion rather than from the mapping of character set to codec. They are either LookupErrors (the character set did not cleanly convert to a codec that Python knows about) or UnicodeDecodeErrors (the string included bytes that are not valid in the codec associated with the character set).

>>> original = b'This is an o with a slash through it: \xb8.'
>>> charset = 'Latin-7' # Baltic Rim or iso-8859-13
>>> from zope.mimetype import utils
>>> utils.decode(original, charset)
u'This is an o with a slash through it: \xf8.'
>>> utils.decode(original, 'foo bar baz')
Traceback (most recent call last):
...
LookupError: unknown encoding: foo bar baz
>>> utils.decode(original, 'iso-ir-6') # alias for ASCII
Traceback (most recent call last):
...
UnicodeDecodeError: 'ascii' codec can't decode...
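The two failure modes can be seen with Python's own codec registry. The function below is a hypothetical stand-in, not the package's implementation: it assumes the charset name is one Python already recognizes, whereas utils.decode first maps IANA names (such as 'Latin-7') through the package's charset registrations.

```python
import codecs


def decode_with_python_codec(data, charset_name):
    """Decode bytes using the Python codec registered under charset_name."""
    # codecs.lookup raises LookupError for names Python does not know.
    codec = codecs.lookup(charset_name)
    # The codec's decode returns (text, bytes consumed);
    # invalid bytes raise UnicodeDecodeError.
    text, _consumed = codec.decode(data)
    return text


original = b"This is an o with a slash through it: \xb8."
print(decode_with_python_codec(original, "iso-8859-13"))
```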

Changes

2.4.0 (unreleased)

2.3.1 (2018-01-09)

  • Only try to register the browser stuff in the ZCA when zope.formlib is available as it breaks otherwise.

2.3.0 (2017-09-28)

  • Drop support for Python 3.3.
  • Move the dependencies on zope.browser, zope.publisher and zope.formlib (only needed to use the source and widget modules) into a new browser extra. See PR 8.

2.2.0 (2017-04-24)

  • Fix issue 6: typegetter.smartMimeTypeGuesser would raise TypeError on Python 3 when the data was bytes and the content_type was text/html.
  • Add support for Python 3.6.

2.1.0 (2016-08-09)

  • Add support for Python 3.5.
  • Drop support for Python 2.6.
  • Fix configuring the package via its included ZCML on Python 3.

2.0.0 (2014-12-24)

  • Add support for PyPy and PyPy3.
  • Add support for Python 3.4.
  • Restore the ability to write from zope.mimetype import types.
  • Make configure.zcml respect the renaming of the types module so that it can be loaded.

2.0.0a1 (2013-02-27)

  • Add support for Python 3.3.
  • Replace deprecated zope.component.adapts usage with equivalent zope.component.adapter decorator.
  • Replace deprecated zope.interface.implements usage with equivalent zope.interface.implementer decorator.
  • Rename zope.mimetype.types to zope.mimetype.mtypes.
  • Drop support for Python 2.4 and 2.5.

1.3.1 (2010-11-10)

  • No longer depend on zope.app.form in configure.zcml by using zope.formlib instead, where the needed interfaces are living now.

1.3.0 (2010-06-26)

  • Add testing dependency on zope.component[test].
  • Use zope.formlib instead of zope.app.form.browser for select widget.
  • Conform to repository policy.

1.2.0 (2009-12-26)

  • Convert functional tests to unit tests and get rid of all extra test dependencies as a result.
  • Use the ITerms interface from zope.browser.
  • Declare missing dependencies; resolve direct dependency on zope.app.publisher.
  • Import content-type parser from zope.contenttype, adding a dependency on that package.

1.1.2 (2009-05-22)

  • No longer depend on zope.app.component.

1.1.1 (2009-04-03)

  • Fix wrong package version (version 1.1.0 was released as 0.4.0 at PyPI but as 1.1dev at download.zope.org/distribution).
  • Fix author email and home page address.

1.1.0 (2007-11-01)

  • Package data update.
  • First public release.

1.0.0 (2007-??-??)

  • Initial release.

API Details

API Reference

zope.mimetype.interfaces

Interfaces for the mimetype package.

interface ICharset

Information about a charset

encoding

Encoding

The id of the encoding used for this charset.

name

Name

The charset name. This is what is used for the ‘charset’ parameter in content-type headers.

interface ICharsetCodec

Marker interface for locating the codec for a given charset.

interface ICharsetGetter

A utility that looks up a character set (charset).

__call__(name=None, data=None, content_type=None)

Look up a charset.

If a charset cannot be determined based on the input, this returns None.
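A charset getter with this signature might be sketched as follows. This is an illustration only: the function name and its policy (prefer the charset parameter of the content type, then fall back to a byte-order mark) are assumptions, not a description of what the package's default getter does.

```python
import codecs

# Byte-order marks checked by this sketch, paired with a charset name.
# UTF-32 marks are deliberately ignored to keep the example short.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16le"),
    (codecs.BOM_UTF16_BE, "utf-16be"),
)


def sketch_charset_getter(name=None, data=None, content_type=None):
    """Return a charset name, or None if nothing can be determined."""
    if content_type:
        # Look for a charset parameter after the MIME type.
        for part in content_type.split(";")[1:]:
            key, sep, value = part.partition("=")
            if sep and key.strip().lower() == "charset":
                return value.strip().strip('"').lower()
    if data:
        for bom, charset in _BOMS:
            if data.startswith(bom):
                return charset
    return None
```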

interface ICodec

Information about a codec.

name

Name

The name of the Python codec.

title

Title

The human-readable name of this codec.

writer(stream, errors='strict')

Construct a StreamWriter object for this codec.

decode(input, errors='strict')

Decodes the input and returns a tuple (output, length consumed).

reader(stream, errors='strict')

Construct a StreamReader object for this codec.

encode(input, errors='strict')

Encodes the input and returns a tuple (output, length consumed).
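The (output, length consumed) tuples and the stream reader and writer have the same shape as Python's codecs module. The sketch below uses codecs.CodecInfo directly, on the assumption that an ICodec provider behaves analogously; it does not use the ICodec API itself.

```python
import codecs
import io

info = codecs.lookup("utf-8")

# encode returns (bytes, number of characters consumed).
encoded, chars_consumed = info.encode("h\xe9llo")

# decode returns (text, number of bytes consumed).
decoded, bytes_consumed = info.decode(encoded)

# A stream writer encodes text as it is written to a byte stream.
buffer = io.BytesIO()
writer = info.streamwriter(buffer, errors="strict")
writer.write("h\xe9llo")

# A stream reader decodes bytes as they are read from a byte stream.
reader = info.streamreader(io.BytesIO(encoded), errors="strict")
round_trip = reader.read()
```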

interface ICodecPreferredCharset

Marker interface for locating the preferred charset for a Codec.

interface ICodecSource

Extends: zope.schema.interfaces.IIterableSource

Source for codecs.

interface ICodecTerm

Extends: zope.schema.interfaces.ITitledTokenizedTerm

Extended term that describes a codec.

preferredCharset

Preferred Charset

Charset that should be used to represent the codec.

interface IContentInfo

Interface describing effective MIME type information.

When using MIME data from an object, an application should adapt the object to this interface to determine how it should be interpreted. This may be different from the information

getCodec()

Return an ICodec that should be used to decode/encode data.

This should return None if the object’s IContentType interface does not derive from IContentTypeEncoded.

If the content type is encoded and no encoding information is available in the effectiveParameters, this method may return None, or may provide a codec based on application policy.

If effectiveParameters indicates a specific charset, and no codec is registered to support that charset, ValueError will be raised.

contentType

Content type

The value of the Content-Type header, including both the MIME type and any parameters.

effectiveMimeType

Effective MIME type

MIME type that should be reported when downloading the document this IContentInfo object is for.

decode(s)

Return the decoding of s based on the effective encoding.

The effective encoding is determined by the return from the getCodec() method.

ValueError is raised if no codec can be found for the effective charset.

effectiveParameters

Effective parameters

Content-Type parameters that should be reported when downloading the document this IContentInfo object is for.
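The relationship between contentType, the effective parameters, and decode() can be sketched with the standard library. The helper names are hypothetical, the header is parsed with email.message rather than the zope.contenttype parser the package uses, and returning None when no charset is given is just one possible policy.

```python
import codecs
from email.message import Message


def split_content_type(header_value):
    """Split a Content-Type value into (mime_type, parameters)."""
    message = Message()
    message["Content-Type"] = header_value
    mime_type = message.get_content_type()
    # get_params() lists the MIME type first; the rest are real parameters.
    parameters = dict(message.get_params()[1:])
    return mime_type, parameters


def decode_body(data, header_value):
    """Decode data according to the charset parameter, if there is one."""
    _mime_type, parameters = split_content_type(header_value)
    charset = parameters.get("charset")
    if charset is None:
        return None  # no encoding information available
    try:
        codec = codecs.lookup(charset)
    except LookupError:
        # Mirrors the documented behaviour: no codec for the charset.
        raise ValueError("no codec for charset %r" % charset)
    text, _consumed = codec.decode(data)
    return text
```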

interface IContentType

Marker interface for objects that represent content with a MIME type.

interface IContentTypeAware

Interface for MIME content type information.

Objects that can provide content type information about the data they contain, such as file objects, should be adaptable to this interface.

mimeType

Mime Type

The mime type explicitly specified for the object that this MIME information describes, if any. May be None, or an ASCII MIME type string of the form major/minor.

parameters

Mime Type Parameters

The MIME type parameters (such as charset).

interface IContentTypeChangedEvent

Extends: zope.interface.interfaces.IObjectEvent

The content type for an object has changed.

All changes of the IContentTypeInterface for an object are reported by this event, including the setting of an initial content type and the removal of the content type interface.

This event should only be used if the content type actually changes.

oldContentType

Content type interface before the change, if any, or None.

newContentType

Content type interface after the change, if any, or None.

interface IContentTypeEncoded

Extends: zope.mimetype.interfaces.IContentType

Marker interface for content types that care about encoding.

This does not imply that encoding information is known for a specific object.

Content types that derive from IContentTypeEncoded support a content type parameter named ‘charset’, and that parameter is used to control encoding and decoding of the text.

For example, interfaces for text/* content types all derive from this base interface.

interface IContentTypeInterface

Interface that describes a logical mime type.

Interfaces that provide this interface are content-type interfaces.

Most MIME types are described by the IANA MIME-type registry (http://www.iana.org/assignments/media-types/).

interface IContentTypeSource

Extends: zope.schema.interfaces.ISource, zope.schema.interfaces.IIterableSource

Source for content types.

interface IContentTypeTerm

Extends: zope.schema.interfaces.ITitledTokenizedTerm

Extended term that describes a content type interface.

mimeTypes

MIME types

List of MIME types represented by this interface; the first should be considered the preferred MIME type.

extensions

Extensions

Filename extensions commonly associated with this type of file.

interface IMimeTypeGetter

A utility that looks up a MIME type string.

__call__(name=None, data=None, content_type=None)

Look up a MIME type.

If a MIME type cannot be determined based on the input, this returns None.

Parameters: data (bytes) – If given, the bytes data to get a MIME type for. This may be examined for clues about the type.

mimeTypeConstraint(value)

Return True iff value is a syntactically legal MIME type.

tokenConstraint(value)

Return True iff value is a syntactically legal RFC 2045 token.
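As an illustration of what these constraints check, the sketch below tests a value against the RFC 2045 token grammar (printable US-ASCII excluding space and the tspecials) and treats a MIME type as two tokens joined by a slash. The function names are hypothetical, and the package's own constraints may differ in detail.

```python
import re

# RFC 2045 token: one or more printable US-ASCII characters other than
# space and the tspecials ( ) < > @ , ; : \ " / [ ] ? =
_TOKEN_RE = re.compile(r"[!#$%&'*+\-.0-9A-Z^_`a-z{|}~]+")


def is_token(value):
    """Return True if value is a syntactically legal RFC 2045 token."""
    return _TOKEN_RE.fullmatch(value) is not None


def is_mime_type(value):
    """Return True if value has the form token/token."""
    major, sep, minor = value.partition("/")
    return bool(sep) and is_token(major) and is_token(minor)
```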

zope.mimetype.codec

zope.mimetype.contentinfo

Default IContentInfo implementation.

class ContentInfo(context)

Bases: object

Basic IContentInfo that provides information from an IContentTypeAware.

zope.mimetype.event

Implementation of and support for the IContentTypeChangedEvent.

changeContentType(object, newContentType)

Set the content type interface for the object.

If this represents a change, an IContentTypeChangedEvent will be fired.
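The "notify only on an actual change" behaviour can be sketched in plain Python. Everything here is hypothetical: the real function marks the object with a content type interface and uses the Zope event machinery, while this sketch stores the content type in an attribute and calls a notify callback.

```python
class ContentTypeChanged:
    """Minimal stand-in for an IContentTypeChangedEvent."""

    def __init__(self, obj, old_content_type, new_content_type):
        self.object = obj
        self.oldContentType = old_content_type
        self.newContentType = new_content_type


def change_content_type(obj, new_content_type, notify):
    """Set the content type and notify only if it actually changed."""
    old_content_type = getattr(obj, "content_type", None)
    if old_content_type == new_content_type:
        return None  # nothing changed, so no event is fired
    obj.content_type = new_content_type
    event = ContentTypeChanged(obj, old_content_type, new_content_type)
    notify(event)
    return event


class Document:
    pass


events = []
doc = Document()
change_content_type(doc, "text/plain", events.append)  # initial setting
change_content_type(doc, "text/plain", events.append)  # no change
change_content_type(doc, None, events.append)          # removal
```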

zope.mimetype.i18n

I18N support for the zope.mimetype package.

This defines a MessageFactory for the I18N domain for the zope.mimetype package. This is normally used with this import:

from zope.mimetype.i18n import MessageFactory as _

The factory is then used normally. Two examples:

text = _('some internationalized text')
text = _('helpful-descriptive-message-id', 'default text')

zope.mimetype.mtypes

Mime-Types management

zope.mimetype.source

Sources for IContentTypeInterface providers and codecs.

class CodecSource

Bases: zope.mimetype.source.UtilitySource

Source of ICodec providers.

class CodecTerms(source, request)

Bases: zope.mimetype.source.Terms

Utility to provide terms for codecs.

class ContentTypeSource

Bases: zope.mimetype.source.UtilitySource

Source of IContentTypeInterface providers.

class ContentTypeTerms(source, request)

Bases: zope.mimetype.source.Terms

Utility to provide terms for content type interfaces.

class Terms(source, request)

Bases: object

Utility to provide terms for content type interfaces.

class UtilitySource

Bases: object

Source of utilities providing a specific interface.

zope.mimetype.typegetter

charsetGetter(name=None, data=None, content_type=None)

Default implementation of zope.mimetype.interfaces.ICharsetGetter.

mimeTypeGetter(name=None, data=None, content_type=None)

A minimal extractor that never attempts to guess.

mimeTypeGuesser(name=None, data=None, content_type=None)

An extractor that tries to guess the content type based on the name and data if the input contains no content type information.

smartMimeTypeGuesser(name=None, data=None, content_type=None)

An extractor that checks the content for a variety of constructs to try and refine the results of the mimeTypeGuesser(). This is able to do things like check for XHTML that’s labelled as HTML in upload data.
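The general shape of such an extractor can be sketched with the standard library's mimetypes module. This is a hypothetical illustration of the layering described above (explicit content type first, then the name, then a look at the data); it is not the package's implementation, and the single XML check stands in for the richer content inspection of the smart guesser.

```python
import mimetypes


def sketch_mime_type_guesser(name=None, data=None, content_type=None):
    """Return a MIME type string, or None if nothing can be determined."""
    if content_type:
        # Use the explicit type, dropping any parameters.
        mime_type = content_type.split(";", 1)[0].strip().lower()
        if mime_type:
            return mime_type
    if name:
        # Guess from the file name extension.
        guessed, _encoding = mimetypes.guess_type(name)
        if guessed:
            return guessed
    if data is not None and data.lstrip().startswith(b"<?xml"):
        # Very rough content sniffing: an XML declaration.
        return "text/xml"
    return None
```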

zope.mimetype.utils

Utility helpers

decode(s, charset_name)

Given a string and an IANA character set name, decode the string to unicode.

zope.mimetype.widget

Widget that provides translation and sorting for an IIterableSource.

This widget translates the term titles and presents those in sorted order.

Properly, this should call on a language-specific collation routine, but we don’t currently have those. Also, it would need to deal with a partially-translated list of titles when translations are only available for some of the titles.

The implementation ignores these issues for now.

class TranslatableSourceDropdownWidget(field, source, request)

Bases: zope.mimetype.widget.TranslatableSourceSelectWidget

class TranslatableSourceSelectWidget(field, source, request)

Bases: zope.formlib.source.SourceSelectWidget

renderItemsWithValues(values)

Render the list of possible values, with those found in values being marked as selected.

textForValue(term)

Extract a string from the term.

The term must be a vocabulary tokenized term.

This can be overridden to support more complex term objects. The token is returned here since it’s the only thing known to be a string, or str()able.

zope.mimetype.zcml

interface ICharsetDirective

Defines a charset in a codec.

Example:

<charset name="iso8859-1" preferred="True" />
<charset name="latin1" />

preferred

Preferred

Whether this is the preferred charset for the encoding.

name

Name

The name of the charset.

interface ICodecDirective

Defines a codec.

Example:

<zope:codec name="iso8859-1" title="Western (ISO-8859-1)">
   ...
</zope:codec>

title

Title

The human-readable name for this codec.

name

Name

The name of the Python codec.

interface IMimeTypesDirective

Request loading of a MIME type definition table.

Example:

<zope:mimeDefinitions file='types.csv'/>

module

Module

Module which contains the interfaces referenced from the CSV file.

file

File

Path of the CSV file to load registrations from.

Indices and tables