XHTML Explained
Argh! Just when we were all getting comfortable with the HTML 4.01 stuff they go and change the standards again. Now we have to learn all-new tags and stress even more about browser compatibility... Except not really. This isn't a big shake-up like HTML 4.01 was. See what it's all about below.
This page was last updated on 2012-08-21
Why the Change?
Right, before we get into all this, you should have a good grasp of the ancient and more recent past of HTML. You can get the full low-down in The History of HTML, but I'll summarise:
HTML began as a simple way to transfer data between any computers across the Internet, designed for scientists and researchers with no publishing experience. Over time the web became mainstream entertainment, and the browser companies brought in new tags that didn't go along with this original aim: presentation became hugely important, and structure and compatibility started to take a back seat. This meant that some pages were not accessible to people with the 'wrong' browser or computer setup.
Thankfully, many of the extraneous presentational tags have fallen out of use in recent times, mainly thanks to CSS. Ideal HTML would be purely structural, with everything concerning how a page is displayed controlled by a stylesheet. The W3C (HTML's overseers, whom you should know something about by now) have spearheaded this desire with XHTML.
Further to all that, the Internet has recently begun to be accessed through devices other than the classic computer and web browser arrangement. Things like PDAs, phones and, er, fridges with Internet access are going to become common in the near future. There's an estimate going around that before long 75% of Internet viewing will be carried out on one of these new platforms. The custom-made browsers used in these systems need to be small to be cost-effective. Every kind of markup error a browser has to cope with means more code added to the program. XHTML is a very, very strict way of coding, which means the system makers don't have to accommodate bad markup.
What is XHTML?
Before I describe XHTML, it is probably best to understand where it has come from. All web markup languages are based on SGML, a horrendously complicated language that was not designed for humans to write. SGML is what is called a metalanguage; that is, a language that is used to define other languages. To make its power available to web developers, SGML was used to create XML (eXtensible Markup Language), a simplified version that is also a metalanguage.
XML is a powerful format: you create your own tags and attributes to suit the type of document you're writing. By settling on a fixed group of tags and attributes and following the rules of XML, you create a new markup language.
This is what has been done to create XHTML (eXtensible HyperText Markup Language), which is why you'll see XHTML being called a subset or application of XML. The pre-existing HTML 4.01 tags and attributes were used as the vocabulary of this new markup language, with XML providing the rules of how they are put together.
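To make the idea concrete, here is a rough sketch of the same XML rules applied to two vocabularies. The recipe tags in the first fragment are made up purely for illustration and belong to no real standard; the second fragment uses ordinary HTML 4.01 tags, which is essentially what XHTML is.

```html
<!-- An invented vocabulary (these element names are hypothetical) -->
<recipe serves="4">
  <title>Pancakes</title>
  <ingredient amount="200g">flour</ingredient>
</recipe>

<!-- The HTML 4.01 vocabulary, written to the same XML rules -->
<div class="recipe">
  <h2>Pancakes</h2>
  <p>200g flour</p>
</div>
```

Both fragments follow identical syntax rules: lowercase names, quoted attribute values, and every tag closed in the right order. Only the set of allowed names differs.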
So, using XHTML, you are really writing XML code, but restricting yourself to a predetermined set of elements. This gives you all the benefits of XML (see below), while avoiding the complications of true XML; bridging the gap for developers who might not fancy taking on something as tricky as full-on XML. As you're coding under the guise of XHTML, all of the tags available to you should be familiar. Writing XHTML requires that you follow the rules of conformant XML, such as correct syntax and structure. As XHTML looks so much like classic HTML, it faces no compatibility problems as long as some simple coding guidelines are followed.
If all of this sounds a bit heavy, don't worry. Transitioning to XHTML is quite a simple process, with only a few rules to remember.
Benefits of XHTML
The benefits of adopting XHTML now or migrating your existing site to the new standards are many. First of all, they ensure excellent forward-compatibility for your creations. XHTML is the new set of standards that the web will be built on in the years to come, so future-proofing your work early will save you much trouble later on. Future browser versions might stop supporting deprecated elements from old HTML drafts, and so many old basic-HTML sites may start displaying incorrectly and unpredictably.
Once you have used XHTML for a short time, it is no more difficult than HTML ever was, and in some ways it is easier, since it is built on a simpler set of rules. Writing code is a more streamlined experience, with far less need for browser hacks and display tricks. Editing your existing code is also nicer, as it is much cleaner and more self-explanatory. Browsers can also interpret and display a clean XHTML page quicker than one with errors that they have to work around.
A well-written XHTML page is more accessible than an old style HTML page, and is guaranteed to work in any standards-compliant browser (which the latest round have finally become) due to the insistence on rules and sticking to accepted W3C specifications. As mentioned above, XHTML allows greater access to configurations other than a computer and browser. This interoperability is another aspect of XHTML's greater accessibility.
XHTML Coding
The first thing you need to know about changing over to XHTML as the new standard is that there really isn't much new to learn. No new tags or attributes have been added to your repertoire, as they were with HTML 4 (although a few have been deprecated); this is just a move towards good, valid and efficient coding. XHTML documents stress logical structure and simplicity, and use CSS for nearly all presentational concerns. It just means you have to change the way you write code. Even if you always wrote great code before, there are a few new practices you need to add in.
What's even better, though, is that a page written entirely in XHTML will still work fine in the current generation of browsers, so you shouldn't have any problems migrating your site across.
XML Declaration
An XML declaration at the very top of your document defines both the version of XML you're using and the character encoding. It is recommended but not required, and a few old browsers will choke on a page that begins this way. For this reason, I advise against using the otherwise correct line:
<?xml version="1.0" encoding="UTF-8"?>
and instead using a meta tag in the head of your document. If you're using Unicode, use
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
And if you're using the more common ISO-8859-1 encoding, use
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
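As a sketch of where that line sits, the meta tag goes inside the head, ideally before the title so that the encoding is known before any text is read. The title text here is just a placeholder.

```html
<head>
  <!-- Declare the encoding first, before any text content -->
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
  <title>Placeholder page title</title>
</head>
```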
XHTML DTDs
Whether you use the XML declaration or not, every XHTML document must be defined as such by a line of code at the start of the page, and by some attributes in the main <html> tag, which tell the browser what language the text is in. The opening line is the DOCTYPE, or document type declaration, which names the DTD (Document Type Definition) your page is written to. This tells your browser and validators the nature of your page.
A DTD is a file that defines the names and attributes of all of the possible tags that you can use in your markup. Validators check your page against it; browsers generally don't download it, as they already have their own knowledge of the tags built in. The official XHTML Strict DTD is available for you to attempt to read. Declare it by putting this at the very top of your code:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
That DTD is the one you use if you're committed to writing entirely correct XHTML code. Strict XHTML dispenses with a whole lot of presentational tags and attributes, and is indeed very strict.
If you choose to use it, you're going to have to become close friends with the W3C validator. You won't be permitted to use the font tag at all, nor will attributes like width and height be allowed on your table cells. You won't be able to use the border attribute on images, and will have to use the alt attribute on all images if you want to validate. You get the idea: almost all presentational attributes are restricted in favour of wider CSS utilisation, so unless you know your stuff in this regard, it'd be best to use XHTML Transitional below.
If you're going to hover between HTML and XHTML, use the next DTD, which is a bit looser; if you're putting together a frameset page, use the last one.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
Most people will opt for XHTML Transitional, as changing straight to Strict can be a daunting prospect. If you feel you're able to work within Strict's constraints, by all means go for it.
A correct DOCTYPE allows the browser to go into standards mode, which will render your page correctly, and similarly across browsers. Without a full DOCTYPE your browser enters ‘compatibility’, or ‘quirks’, mode, behaving like a version 4 browser with all of its associated quirks and inconsistencies. Also, these declarations are case-sensitive, so don't change them in any way.
Finally, you need to define the XML namespace your document uses. Don't stress about this: it's simply a definition of which set of tags you're going to be using, and concerns the modular properties of XHTML. It's set by adding an attribute to the <html> tag. While we're at it, we specify the language of our pages too. Modify your tags to this:
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
</html>
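Putting the pieces from this section together, a bare-bones XHTML 1.0 Strict page might look something like the sketch below. It assumes you've gone with the meta tag rather than the XML declaration, and the title and paragraph text are placeholders.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    <title>Placeholder page title</title>
  </head>
  <body>
    <p>Placeholder paragraph.</p>
  </body>
</html>
```

Swap in the Transitional or Frameset DOCTYPE if that suits your page better; the rest of the skeleton stays the same.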
XHTML Coding Practices
And now the moment you've been waiting for — the different styles of coding used by an XHTML author compared to the old HTML methods. You shouldn't have many problems adopting these new techniques, so long as you work carefully. It should be noted, even if it is an obvious point, that you really must hand code to be able to write valid XHTML. No current visual editor comes close to the compliance required.
sourcetip: Even though your code is changing, your filenames won't have to — you end your files with .html as always.
1. Tags and attributes have to be lowercase
Whereas before it came down to preference whether you used <B> or <b>, now all of your tags and attributes have to be in lowercase. This is because XML is case-sensitive, i.e. a tag in capitals is a different tag to one in small letters.
2. All tags must be closed
Now all of those once-optional </p> and </li> tags are essential for your XHTML documents to validate. Even empty elements like img, hr and br must now be closed. You can use a standard forward-slashed end tag, or just add a forward slash to the end of the tag.
<br />
or <br></br>
It's recommended that you use the former method here, and leave a space before the slash so older browsers aren't confused. They'll just ignore the trailing slash as an unrecognised attribute.
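Here's a quick before-and-after sketch of this rule. The list items and paragraph text are placeholders.

```html
<!-- HTML 4.01 let you leave these open -->
<ul>
  <li>First item
  <li>Second item
</ul>
<p>A paragraph<br>with a line break
<hr>

<!-- XHTML: every element is closed -->
<ul>
  <li>First item</li>
  <li>Second item</li>
</ul>
<p>A paragraph<br />with a line break</p>
<hr />
```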
3. Documents must be well-formed
'Well-formedness' is a dream that you were meant to try and make real from the start, but many coders write badly-syntaxed code. You have to open and close tags correctly in XHTML, and nest them properly.
Bad: <p>My coding is <b>bad</p></b>
Good: <p>But my coding is <b>good</b></p>
Remember the simple rule you should have been taught at the very start: The first tag you open is the last tag you close.
4. Attribute values must be quoted
Back in HTML you could leave out the quotes on a number value, like HEIGHT=3, but now all values have to have quotation marks around them, so that would become height="3".
5. Attribute Minimisation
Some HTML tags had one-word attributes, like HR's NOSHADE. You can't use these in their short form anymore, and must give the attribute its own name as its value, like so:
<hr noshade="noshade" />
Any browser compatible with HTML 4.01 shouldn't have a problem with markup like this.
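noshade isn't the only attribute this affects. The same pattern applies to the one-word attributes you'll meet in forms, as in this sketch (the field names and values are placeholders):

```html
<!-- Each minimised attribute takes its own name as its value -->
<input type="checkbox" name="subscribe" checked="checked" />
<option value="blue" selected="selected">Blue</option>
<input type="text" name="code" disabled="disabled" />
```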
6. Internal Links
Internal links in HTML were made using a combination of the <a> tag and the name attribute. In XHTML, to go along with XML, you use the id attribute as the link target instead of the name attribute. For a while you should probably include both so that your links still work on older browsers, but id will be the method used in future. The name attribute has been deprecated for this purpose.
<a href="#section">link</a>
<a id="section" name="section"></a>
Since nearly all tags can take the id attribute, you can now make links to almost any element on your page. This is most helpful if you link to a heading or a specific paragraph.
7. Alternative text in images
While it has always been good practice to add the alt="..." attribute to your images, now you must add some alternate text to every image on your page. If your image is purely decorative you can give it a null alt attribute, with nothing between the quotes:
<img src="header.gif" alt="" />
You could also try adding the title="..." attribute to as many elements as possible. It's a good accessibility aid, especially on links.
8. Ampersands in URLs
Ampersand characters are frequently used in page addresses to carry variables, as in PHP. When coding these addresses into your XHTML, you must escape them using the entity &amp;. They'll be displayed as ampersand characters (&) on screen, of course.
<a href="reviews.php?page=27&style=blue">link</a>
becomes
<a href="reviews.php?page=27&amp;style=blue">link</a>
9. Content must be wrapped in a block-level element
In XHTML Strict, when you add text to your page, you can’t add it directly into the body element. All text needs to be within a suitable containing block-level element, such as a p, a div or the li items of a ul.
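To see the rules working together, here's a rough before-and-after sketch of one small page body. The file names, link address and text are placeholders, and the result is XHTML Transitional rather than Strict, since it keeps the presentational noshade attribute.

```html
<!-- Before: loose HTML 4.01 habits -->
<BODY>
Welcome to my site.
<P>Read the <A HREF="reviews.php?page=27&style=blue">reviews</A>.
<HR NOSHADE>
<IMG SRC=footer.gif WIDTH=200 HEIGHT=40>
</BODY>

<!-- After: the same content as XHTML -->
<body>
<!-- Rule 9: loose text wrapped in a block-level element -->
<p>Welcome to my site.</p>
<!-- Rules 1, 2 and 8: lowercase, closed, ampersand escaped -->
<p>Read the <a href="reviews.php?page=27&amp;style=blue">reviews</a>.</p>
<!-- Rule 5: no minimised attributes -->
<hr noshade="noshade" />
<!-- Rules 4 and 7: quoted values and alternative text -->
<p><img src="footer.gif" width="200" height="40" alt="Footer banner" /></p>
</body>
```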
As you should always have done before, be sure to validate your document to certify that there are no errors. There is absolutely no point in writing XHTML if you don't make sure it is free of mistakes. The online W3C validator will check your code for mistakes and give you a full report back. Once you can understand its occasionally unhelpful error messages, it is an excellent utility. Make use of it.