Minimal IContentInfo Implementation

The zope.mimetype.contentinfo module provides a minimal IContentInfo implementation that adds no information to what’s provided by a content object. This represents the most conservative content-type policy that might be useful.

Let’s take a look at how this operates by creating a couple of concrete content-type interfaces:

>>> from zope.mimetype import interfaces

>>> class ITextPlain(interfaces.IContentTypeEncoded):
...     """text/plain"""

>>> class IApplicationOctetStream(interfaces.IContentType):
...     """application/octet-stream"""

Now, we’ll create a minimal content object that provide the necessary information:

>>> import zope.interface

>>> @zope.interface.implementer(interfaces.IContentTypeAware)
... class Content(object):
...     def __init__(self, mimeType, charset=None):
...         self.mimeType = mimeType
...         self.parameters = {}
...         if charset:
...             self.parameters["charset"] = charset

We can now create examples of both encoded and non-encoded content:

>>> encoded = Content("text/plain", "utf-8")
>>> zope.interface.alsoProvides(encoded, ITextPlain)

>>> unencoded = Content("application/octet-stream")
>>> zope.interface.alsoProvides(unencoded, IApplicationOctetStream)

The minimal IContentInfo implementation only exposes the information available to it from the base content object. Let’s take a look at the unencoded content first:

>>> from zope.mimetype import contentinfo
>>> ci = contentinfo.ContentInfo(unencoded)
>>> ci.effectiveMimeType
'application/octet-stream'
>>> ci.effectiveParameters
{}
>>> ci.contentType
'application/octet-stream'

For unencoded content, there is never a codec:

>>> print(ci.getCodec())
None

It is also disallowed to try decoding such content:

>>> ci.decode("foo")
Traceback (most recent call last):
...
ValueError: no matching codec found

Attemping to decode data using an uncoded object causes an exception to be raised:

>>> print(ci.decode("data"))
Traceback (most recent call last):
...
ValueError: no matching codec found

If we try this with encoded data, we get somewhat different behavior:

>>> ci = contentinfo.ContentInfo(encoded)
>>> ci.effectiveMimeType
'text/plain'
>>> ci.effectiveParameters
{'charset': 'utf-8'}
>>> ci.contentType
'text/plain;charset=utf-8'

The IContentInfo.getCodec() and IContentInfo.decode() methods can be used to handle encoded data using the encoding indicated by the charset parameter. Let’s store some UTF-8 data in a variable:

>>> utf8_data = b"\xAB\xBB".decode("iso-8859-1").encode("utf-8")
>>> utf8_data
b'\xc2\xab\xc2\xbb'

We want to be able to decode the data using the IContentInfo object. Let’s try getting the corresponding ICodec object using IContentInfo.getCodec():

>>> codec = ci.getCodec()
Traceback (most recent call last):
...
ValueError: unsupported charset: 'utf-8'

So, we can’t proceed without some further preparation. What we need is to register an ICharset for UTF-8. The ICharset will need a reference (by name) to a ICodec for UTF-8. So let’s create those objects and register them:

>>> import codecs
>>> from zope.mimetype.i18n import _

>>> @zope.interface.implementer(interfaces.ICodec)
... class Utf8Codec(object):
...
...     name = "utf-8"
...     title = _("UTF-8")
...
...     def __init__(self):
...         ( self.encode,
...           self.decode,
...           self.reader,
...           self.writer
...           ) = codecs.lookup(self.name)

>>> utf8_codec = Utf8Codec()

>>> @zope.interface.implementer(interfaces.ICharset)
... class Utf8Charset(object):
...
...     name = utf8_codec.name
...     encoding = name

>>> utf8_charset = Utf8Charset()

>>> import zope.component

>>> zope.component.provideUtility(
...     utf8_codec, interfaces.ICodec, utf8_codec.name)
>>> zope.component.provideUtility(
...     utf8_charset, interfaces.ICharset, utf8_charset.name)

Now that that’s been initialized, let’s try getting the codec again:

>>> codec = ci.getCodec()
>>> codec.name
'utf-8'

>>> codec.decode(utf8_data)
('\xab\xbb', 4)

We can now check that the decode() method of the IContentInfo will decode the entire data, returning the Unicode representation of the text:

>>> ci.decode(utf8_data)
'\xab\xbb'

Another possibilty, of course, is that you have content that you know is encoded text of some sort, but you don’t actually know what encoding it’s in:

>>> encoded2 = Content("text/plain")
>>> zope.interface.alsoProvides(encoded2, ITextPlain)

>>> ci = contentinfo.ContentInfo(encoded2)
>>> ci.effectiveMimeType
'text/plain'
>>> ci.effectiveParameters
{}
>>> ci.contentType
'text/plain'

>>> ci.getCodec()
Traceback (most recent call last):
...
ValueError: charset not known

It’s also possible that the initial content type information for an object is incorrect for some reason. If the browser provides a content type of “text/plain; charset=utf-8”, the content will be seen as encoded. A user correcting this content type using UI elements can cause the content to be considered un-encoded. At this point, there should no longer be a charset parameter to the content type, and the content info object should reflect this, though the previous encoding information will be retained in case the content type should be changed to an encoded type in the future.

Let’s see how this behavior will be exhibited in this API. We’ll start by creating some encoded content:

>>> content = Content("text/plain", "utf-8")
>>> zope.interface.alsoProvides(content, ITextPlain)

We can see that the encoding information is included in the effective MIME type information provided by the content-info object:

>>> ci = contentinfo.ContentInfo(content)
>>> ci.effectiveMimeType
'text/plain'
>>> ci.effectiveParameters
{'charset': 'utf-8'}

We now change the content type information for the object:

>>> ifaces = zope.interface.directlyProvidedBy(content)
>>> ifaces -= ITextPlain
>>> ifaces += IApplicationOctetStream
>>> zope.interface.directlyProvides(content, *ifaces)
>>> content.mimeType = 'application/octet-stream'

At this point, a content type object would provide different information:

>>> ci = contentinfo.ContentInfo(content)
>>> ci.effectiveMimeType
'application/octet-stream'
>>> ci.effectiveParameters
{}

The underlying content type parameters still contain the original encoding information, however:

>>> content.parameters
{'charset': 'utf-8'}