Syntax of the Content-Type Header Field

During an incident this morning I got curious about how the Content-Type and Cache-Control fields in the HTTP header are allowed to be formatted. We may have found a header with a space between the colon and the type/subtype, whereas the grammar in RFC 2045 shows no space between the ‘:’ and the type. However, the RFC does not explicitly say that a space is forbidden, and it uses a space in some of its own examples:

“Note that the value of a quoted string parameter does not include the quotes. That is, the quotation marks in a quoted-string are not a part of the value of the parameter, but are merely used to delimit that parameter value. In addition, comments are allowed in accordance with RFC 822 rules for structured header fields. Thus the following two forms:

Content-type: text/plain; charset=us-ascii (Plain text)

Content-type: text/plain; charset="us-ascii"

are completely equivalent.”

5.1. Syntax of the Content-Type Header Field

In the Augmented BNF notation of RFC 822, a Content-Type header field value is defined as follows:

content := "Content-Type" ":" type "/" subtype
           *(";" parameter)
           ; Matching of media type and subtype
           ; is ALWAYS case-insensitive.
type := discrete-type / composite-type
discrete-type := "text" / "image" / "audio" / "video" /
                 "application" / extension-token
composite-type := "message" / "multipart" / extension-token
extension-token := ietf-token / x-token
ietf-token := <An extension token defined by a
               standards-track RFC and registered
               with IANA.>
x-token := <The two characters "X-" or "x-" followed, with
            no intervening white space, by any token>
subtype := extension-token / iana-token
iana-token := <A publicly-defined extension token. Tokens
               of this form must be registered with IANA
               as specified in RFC 2048.>
parameter := attribute "=" value
attribute := token
             ; Matching of attributes
             ; is ALWAYS case-insensitive.
value := token / quoted-string
token := 1*<any (US-ASCII) CHAR except SPACE, CTLs,
            or tspecials>

Note that the definition of “tspecials” is the same as the RFC 822 definition of “specials” with the addition of the three characters “/”, “?”, and “=”, and the removal of “.”.
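
(Not part of the RFC.) To make the token rule concrete, here is a minimal Python sketch of a check for whether a string is a valid token. The names TSPECIALS and is_token are my own; the character set is the tspecials list from RFC 2045, and "CTLs" is taken to mean ASCII 0 to 31 plus 127.

```python
# The tspecials characters from RFC 2045 section 5.1:
# ( ) < > @ , ; : \ " / [ ] ? =
TSPECIALS = set('()<>@,;:\\"/[]?=')


def is_token(s: str) -> bool:
    """Return True if s is a non-empty run of token characters.

    A token character is any US-ASCII character except SPACE,
    control characters, and the tspecials listed above.
    """
    if not s:
        return False  # the grammar says 1*<...>, so at least one character
    for ch in s:
        code = ord(ch)
        # Reject control characters (0-31), SPACE (32), DEL (127), non-ASCII.
        if code <= 32 or code >= 127:
            return False
        if ch in TSPECIALS:
            return False
    return True
```

With this check, "us-ascii" and "x.y" are tokens (the "." is not a tspecial), while "text/plain" and "a b" are not, which is why a parameter value containing such characters has to be written as a quoted-string.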

Note also that a subtype specification is MANDATORY — it may not be omitted from a Content-Type header field. As such, there are no default subtypes.

The type, subtype, and parameter names are not case sensitive. For example, TEXT, Text, and TeXt are all equivalent top-level media types. Parameter values are normally case sensitive, but sometimes are interpreted in a case-insensitive fashion, depending on the intended use. (For example, multipart boundaries are case-sensitive, but the “access-type” parameter for message/External-body is not case-sensitive.)
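
(Not part of the RFC.) A small Python sketch of what these case rules could look like in a comparison routine. The function names and the CASE_INSENSITIVE_VALUE_PARAMS set are hypothetical; the set only contains the one example the text mentions and would need extending for real use.

```python
# Parameters whose VALUES are compared case-insensitively.
# Only "access-type" is taken from the text above; anything else is up to you.
CASE_INSENSITIVE_VALUE_PARAMS = {"access-type"}


def normalize_media_type(type_: str, subtype: str) -> str:
    """Type and subtype are always case-insensitive, so lowercase both."""
    return f"{type_.lower()}/{subtype.lower()}"


def params_equal(name_a: str, value_a: str, name_b: str, value_b: str) -> bool:
    """Compare two parameters: names ignore case, values usually do not."""
    name_a, name_b = name_a.lower(), name_b.lower()
    if name_a != name_b:
        return False
    if name_a in CASE_INSENSITIVE_VALUE_PARAMS:
        return value_a.lower() == value_b.lower()
    # Default: values such as multipart boundaries are case-sensitive.
    return value_a == value_b
```

So normalize_media_type("TEXT", "Plain") gives "text/plain", a boundary of "AbC" does not match "abc", and an access-type of "FTP" matches "ftp".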

Note that the value of a quoted string parameter does not include the quotes. That is, the quotation marks in a quoted-string are not a part of the value of the parameter, but are merely used to delimit that parameter value. In addition, comments are allowed in accordance with RFC 822 rules for structured header fields. Thus the following two forms

Content-type: text/plain; charset=us-ascii (Plain text)

Content-type: text/plain; charset="us-ascii"

are completely equivalent.
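
(Not part of the RFC.) To see why the two forms come out the same, here is a rough Python sketch of a parser that drops parenthesised comments and removes the delimiting quotes. All function names are my own, and it is deliberately lenient rather than a validating implementation. It also tolerates whitespace after the colon, which is an assumption on my part based on the RFC's own examples being written with a space.

```python
import re


def strip_comments(s: str) -> str:
    """Remove (comments), leaving quoted strings untouched. Comments may nest."""
    out, depth, in_quotes, i = [], 0, False, 0
    while i < len(s):
        ch = s[i]
        if ch == "\\" and (in_quotes or depth) and i + 1 < len(s):
            # Backslash escape: keep it inside quotes, drop it inside comments.
            if in_quotes:
                out.append(s[i:i + 2])
            i += 2
            continue
        if in_quotes:
            out.append(ch)
            if ch == '"':
                in_quotes = False
        elif ch == "(":
            depth += 1
        elif ch == ")" and depth:
            depth -= 1
        elif depth == 0:
            if ch == '"':
                in_quotes = True
            out.append(ch)
        i += 1
    return "".join(out)


def split_outside_quotes(s: str, sep: str = ";") -> list:
    """Split on sep, ignoring separators inside quoted strings."""
    parts, current, in_quotes, i = [], [], False, 0
    while i < len(s):
        ch = s[i]
        if in_quotes and ch == "\\" and i + 1 < len(s):
            current.append(s[i:i + 2])
            i += 2
            continue
        if ch == '"':
            in_quotes = not in_quotes
        if ch == sep and not in_quotes:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    parts.append("".join(current))
    return parts


def unquote(value: str) -> str:
    """The quotes only delimit the value; they are not part of it."""
    value = value.strip()
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        return re.sub(r"\\(.)", r"\1", value[1:-1])
    return value


def parse_content_type(line: str):
    """Return (type, subtype, params) for a Content-Type header line."""
    name, _, rest = line.partition(":")
    if name.strip().lower() != "content-type":
        raise ValueError("not a Content-Type header field")
    pieces = split_outside_quotes(strip_comments(rest))
    media_type, _, subtype = pieces[0].partition("/")
    params = {}
    for piece in pieces[1:]:
        if not piece.strip():
            continue
        attribute, _, value = piece.partition("=")
        params[attribute.strip().lower()] = unquote(value)
    return media_type.strip().lower(), subtype.strip().lower(), params
```

Feeding both example lines above to parse_content_type gives the same result in each case: type "text", subtype "plain", and a single parameter charset with the value us-ascii. A line written without the space, such as "Content-Type:text/plain", parses to the same type and subtype.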

Beyond this syntax, the only syntactic constraint on the definition of subtype names is the desire that their uses must not conflict. That is, it would be undesirable to have two different communities using “Content-Type: application/foobar” to mean two different things. The process of defining new media subtypes, then, is not intended to be a mechanism for imposing restrictions, but simply a mechanism for publicizing their definition and usage. There are, therefore, two acceptable mechanisms for defining new media subtypes:

(1) Private values (starting with "X-") may be defined bilaterally between two cooperating agents without outside registration or standardization. Such values cannot be registered or standardized.
(2) New standard values should be registered with IANA as described in RFC 2048.

The second document in this set, RFC 2046, defines the initial set of media types for MIME.
