Is my HTML valid?

I like my HTML clean. I like it valid, and surprisingly, a lot of sites don't really adhere to the specifications. You technically don't need to - browsers are very forgiving and will do their best to render whatever monstrosity you feed them. This is partly for backwards compatibility and partly for usability - it's neither the browser's fault nor the user's that a page doesn't follow the rules, and the browser wants to give the user the best browsing experience either way (or the user might switch browsers - looking at you, Internet Explorer).

The smallest valid HTML document

Let's start with the basics. If a document has no content whatsoever, what does it look like if we still want it to be valid? Well, we need to start with the doctype declaration <!DOCTYPE html>. Then, surprisingly, we don't need <html>, <head>, <body>, or their respective closing tags. The elements will still exist in the parsed document, but we don't have to write the tags. I would still suggest you do, for readability - but if you want to minify your HTML documents to extreme lengths, for example, you can omit them in many cases.

The document does need a title, however (the specs carve out a couple of exceptions, such as iframe srcdoc documents, but a regular page needs one), so we need <title>I'm a title!</title>. This title must be non-empty (for you curious people out there - whitespace doesn't count as content, so you'll need actual characters for this to be valid). This means the smallest valid HTML document is

<!DOCTYPE html>
<title>I'm a title!</title>

Well, that's smallest node-wise, not character-wise. If we want the fewest characters, we can omit the newline and make the title a single character, but that's not the point.
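
For comparison, here is a sketch of roughly what the parser builds from those two lines, written out with every optional tag in place (whitespace aside):

```html
<!DOCTYPE html>
<html>
	<head>
		<title>I'm a title!</title>
	</head>
	<body></body>
</html>
```

Both versions should produce the same elements in the parsed document; the only difference is how much of it you typed yourself.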

Omitting tags

As mentioned previously, the specifications don't require us to write the html, head or body tags. If you leave them out, the parser uses a specific set of rules to determine where the html element starts (usually just at the beginning, but comments may go before the opening html tag), where the head starts, where it ends, etcetera. The specifications on this can be found here, if you're interested.

Apart from these, there are some other tags you can omit if you really want to - for example, the closing tag of an li can be left out if the li is immediately followed by another li, or if there is no more content in its parent. This would allow for lists like

<ul>
	<li>Coffee
	<li>Milk
	<li>Tea
	<li>Water
</ul>

Yikes. That doesn't look very good to me personally, but you could do this, theoretically. I suggest you stick with writing HTML elements as they would be parsed, because that makes it a lot easier to read and maintain the code. There's really no reason to omit any of these tags apart from the previously mentioned minifying to the extreme.
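
For comparison, here is the same list with every closing tag written out, which is the structure the parser ends up with either way:

```html
<ul>
	<li>Coffee</li>
	<li>Milk</li>
	<li>Tea</li>
	<li>Water</li>
</ul>
```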

Attribute values

More stuff we can omit! Quotes around attribute values aren't (always) necessary. As long as the attribute value isn't empty and doesn't include whitespace, an equals sign, a single or double quote, a backtick, < or >, we can just leave the quotes out. For example, the following would be perfectly fine:

<a id=my-favorite-link href=https://example.com/>Click me</a>

Yeah... that's also ugly. It is better to be consistent across your document and just use quotes everywhere, rather than only where they're needed - plus, quotes improve readability. Which quotes you use is really up to you - both single quotes and double quotes are valid.
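
As a small sketch of what that looks like in practice (the class names and input value below are made up for illustration):

```html
<!-- The same link as above, quoted consistently -->
<a id="my-favorite-link" href="https://example.com/">Click me</a>

<!-- Quotes are required here: the value contains a space -->
<p class="intro highlighted">Hello</p>

<!-- Single quotes let the value contain double quotes -->
<input type="text" value='Say "hi"'>
```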

Capitalization

HTML, like CSS, is mostly case-insensitive. You can write your tag names and attribute names in all caps, lowercase, or any mix of the two. So, the following would be valid:

<UL>
	<li>This is fine</li>
	<LI>Acceptable but not preferred</LI>
	<Li>Why would you</Li>
	<lI>Oh god you've gone mad</lI>
</ul>

Custom elements/attributes

It all becomes a little more vague when we mix in custom elements or attributes. Let's do a little quiz: which of the following do you think are valid?

<div purpose="divide content"></div>
<button data-send="all">submit</button>
<p long-text="yes">[...]</p>
<cookiejar>Full!</cookiejar>
<to-space launch-time="10s"></to-space>
<i-am-void />
<hay-stack>needle</hay-stack>

Let's find out!

Custom elements

Custom element names have to start with a lowercase letter (a-z), contain a dash, and not contain any uppercase letters. That's it! So tag names like hay-stack, o-.-_-.-o or to-space are all fine. Check out the HTML specs if you're interested. Essentially, if you're a relatively normal person and only use regular characters a-z and dashes, anything is allowed as long as you follow the rules above. It's very flexible, and even allows emojis in the name! There are, however, a few element names containing a dash that you're not allowed to use, like missing-glyph and a few others (the full list can be found by following the link above, there are only 8 of them).
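
To make the rules concrete, here is a short sketch; the element names are invented for illustration only:

```html
<!-- Fine: both start with a lowercase letter and contain a dash -->
<my-widget></my-widget>
<x-1></x-1>

<!-- Not valid custom element names:
     <widget>      no dash
     <1-up>        does not start with a-z
     <font-face>   one of the reserved names
-->
```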

Custom attributes

Custom attributes are also very simple to understand. They must start with data-, followed by at least one character, and that's about it. So including a dash, like with custom element names, is not enough - you need the data- prefix. A lot of sites don't really conform to this (Angular sites using the ng- prefix for their attributes, for example) - and sure, browsers won't necessarily break if you use a custom attribute that doesn't start with data-, but it's not valid according to the specifications.
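
A minimal sketch of the difference, using made-up attribute names on a standard element:

```html
<!-- Valid: both custom attributes carry the data- prefix -->
<article data-author-id="42" data-draft="true">...</article>

<!-- Not valid: a dash alone is not enough on a standard element -->
<article author-id="42">...</article>
```

As a bonus, scripts can read data- attributes through the element's dataset property (here, dataset.authorId).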

Custom attributes on custom elements

What if we want to use a custom element? What about the attributes then? Do we still need the data- prefix, or is it a little more flexible? Well, in this case, you can use any attribute name you want - you don't even need a dash! The exceptions are names that already mean something: the global attributes that exist on all HTML elements (such as tabindex, id, style, etcetera) and a few others (such as form, name, etcetera) keep their usual meaning. The specs talk about it here.
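
For example, a sketch along these lines should be fine (needles and sorted are invented attribute names, not anything the specs define):

```html
<!-- Made-up attributes on a custom element: no data- prefix needed -->
<hay-stack needles="1" sorted>needle</hay-stack>

<!-- id and class are global attributes and keep their usual meaning here too -->
<hay-stack id="barn" class="dry">needle</hay-stack>
```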

Notes

Now for the answers to our little quiz; here are the valid ones:

<button data-send="all">submit</button>
<to-space launch-time="10s"></to-space>
<hay-stack>needle</hay-stack>

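The <div> and <p> lines are invalid because their custom attributes lack the data- prefix, and <cookiejar> is invalid because its name doesn't contain a dash. The <i-am-void> element is invalid because, well, custom elements can't be self-closing. Only a select set of elements are void elements, and custom elements are not part of that set.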
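
If you wanted to rescue the remaining quiz entries, one possible fix for each (renaming cookiejar to cookie-jar is just one option) would be:

```html
<div data-purpose="divide content"></div>
<p data-long-text="yes">[...]</p>
<cookie-jar>Full!</cookie-jar>
<i-am-void></i-am-void>
```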

That was it!

Broadly speaking, you now know how to keep your documents valid. If you're not sure, use the specifications to check whether an element is defined in HTML and which attributes are valid on it. You also know what to do when you want a custom attribute, a custom element, or even both at the same time.

Keep in mind, having documents conform to the specifications is a choice. Browsers are to documents what Cookie Monster is to cookies - they will eat anything you feed them. If you're a purist, like me, conform all you want! It'll help with document validators and the like, and it helps keep your sites future-proof, but ultimately, most things won't break. Even the big companies and frameworks don't always listen to the specs.