Before we even get to the importance of it, let's first look at what the language attribute is.
<html lang="fr-Brai-BE"> ... </html>
The language attribute defines the natural language of the web page. Although it is most often seen with one or two subtags, such as lang="en" for English or lang="en-US" for United States English, the code can get even more specific. The full definition of how to format the subtags can be found in RFC 5646: Tags for Identifying Languages, which is part of BCP 47. Let's look at the most commonly used subtags.
Note: If portions of the content are not in the same language as the page, a language attribute can be added to a content-specific tag such as <p lang="en-GB">.
Subtags
Language Subtag
The language subtag is the only required part of the tag. Made up of two or three letters, lowercase by convention, it represents the basic language from the BCP 47 code list. If the page is in Maori, for example, the language code would be mi.
<html lang="mi"> ... </html>
Fun fact: Many constructed languages have BCP 47 codes. Klingon's is tlh. A list of language codes for constructed languages can be found on Wikipedia.
Script Subtag
This optional subtag, when used, comes after the language subtag and is always a four-letter code, conventionally written with the first letter capitalized. It defines the writing system used, such as ja-Kana for Japanese written in the Katakana script. If a language is written in its typical way, such as French using the Latin alphabet, this subtag is not necessary. However, if French is written in Braille, the script should be specified.
<html lang="fr-Brai"> ... </html>
Region Subtag
Also optional, this subtag comes after the language subtag (and the script subtag, if present) and indicates the regional variation of the primary language.
<html lang="en-GB">...</html>
It can specify a country, territory, or region. Using the region subtag is helpful when there is region-specific spelling or other variation in the language due to dialect or usage.
Consider spelling in the example from Listing 4. Certain English words are spelled differently depending on the locale, such as "color" in American English versus "colour" in British English. Adding the region subtag indicates which spelling should be considered correct.
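To make the subtag order concrete, here is a minimal Python sketch that splits a tag into the three subtags covered above. The function name split_language_tag and the regular expression are our own illustration, not part of any standard library, and the pattern only handles the language, script, and region subtags. It is not a full BCP 47 parser (variants, extensions, and private-use subtags are ignored).

```python
import re

# Simplified pattern (an assumption for illustration, not the full BCP 47 grammar):
# language = 2-3 letters, optional script = 4 letters,
# optional region = 2 letters or 3 digits.
_TAG_RE = re.compile(
    r"(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
)


def split_language_tag(tag):
    """Split a simple language tag into language, script, and region subtags."""
    match = _TAG_RE.fullmatch(tag)
    if match is None:
        raise ValueError(f"not a simple language tag: {tag!r}")
    script = match.group("script")
    region = match.group("region")
    return {
        # Normalize to the conventional casing: fr, Brai, BE.
        "language": match.group("language").lower(),
        "script": script.title() if script else None,
        "region": region.upper() if region else None,
    }


print(split_language_tag("fr-Brai-BE"))
# {'language': 'fr', 'script': 'Brai', 'region': 'BE'}
print(split_language_tag("en-GB"))
# {'language': 'en', 'script': None, 'region': 'GB'}
```

Because the script subtag is always four letters and the region subtag is two letters or three digits, the sketch can tell them apart by length alone, which is why en-GB is read as language plus region rather than language plus script.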
Why it's important
Now that we understand what the language attribute is, we can look at why it is important. Many mechanisms on the web use the language attribute to improve accessibility and user experience.
Translation
Translation tools use the language attribute to detect the current language. This lets them offer to translate the text based on user settings. If you have ever used Lorem Ipsum in a web page to mock content until the final copy was ready, you may have noticed an offer to translate the page from Latin to your current language.
Because Lorem Ipsum has roots in Latin, we can try to translate it. The problem is that the result turns out to be gibberish.
The first paragraph:
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Morbi vel odio sapien. Pellentesque dignissim porta mattis. Etiam sed sem leo. Aliquam non ex ex. Vivamus dui nibh, vehicula at tincidunt eget, facilisis vitae tortor. Vivamus lacinia ut ex ut consequat. Suspendisse aliquam hendrerit sagittis.
translates to English as:
The pain itself is love, the main storage system. The disease or hatred of the wise. Pellentesque dignissim porta mattis Yes, but a lot of timing. Maybe not from ex. We live in the housing, the vehicles at the keyboard require, the easy life of the torturer. We should live with the moms. Maybe some strategic arrows.
Not exactly helpful. Although this is a fairly narrow corner case, it illustrates the importance of proper tagging to prevent erroneous information from being presented to the user.
The language code for non-linguistic content is zxx. Therefore, in our example above, we want our Lorem Ipsum content blocks to include a lang="zxx" attribute.
Spelling and grammar checker
Including proper language attributes helps spelling and grammar checkers guide users better. Including region codes can be especially helpful for spell-check accuracy.
Non-text readers
Readers such as speech synthesizers and Braille translators rely on language attributes to produce usable results. The video shows what happens when the language attribute is set to French when the content is actually in English. The reader does not select the appropriate language mode and therefore speaks the English text with a French accent. Although this can be humorous in the moment, it can create serious accessibility issues for disabled users who rely on assistive technologies to access our content.
Parsers and scripts
Adding language attributes to content can allow processing of the content based on the language. A notable example is using CSS selectors to change the style of the content based on the language.
In the examples set in Listings 5 and 6, we use the language attribute information to only select the paragraphs that are written in French to give them a different style than the paragraphs written in English. Figure 2 shows the output.
<p lang="en">This sentence is in English.</p>
<p lang="fr">Cette phrase est en français.</p>
<p lang="en">And now in English again.</p>
p[lang="fr"] {
  font-weight: bold;
  color: slategrey;
  font-style: italic;
}
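The same idea works outside of CSS. As a rough sketch of how a script might process content by language, the Python below uses the standard library's html.parser module to pull out only the paragraphs tagged with a given language. The class name LangTextCollector and the helper paragraphs_in are hypothetical names chosen for this example. To keep it short, the sketch only looks at the lang attribute on the <p> tag itself and does not handle a language inherited from a parent element.

```python
from html.parser import HTMLParser


class LangTextCollector(HTMLParser):
    """Collect the text of <p> elements whose own lang attribute matches."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted.lower()
        self.matches = []
        self._capturing = False

    def handle_starttag(self, tag, attrs):
        if tag != "p":
            return
        lang = (dict(attrs).get("lang") or "").lower()
        # Accept "fr" itself as well as regional variants such as "fr-BE".
        self._capturing = lang == self.wanted or lang.startswith(self.wanted + "-")
        if self._capturing:
            self.matches.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._capturing = False

    def handle_data(self, data):
        if self._capturing:
            self.matches[-1] += data


def paragraphs_in(html, wanted):
    """Return the text of every <p> in html tagged with the wanted language."""
    collector = LangTextCollector(wanted)
    collector.feed(html)
    collector.close()
    return [text.strip() for text in collector.matches]


page = """
<p lang="en">This sentence is in English.</p>
<p lang="fr">Cette phrase est en français.</p>
<p lang="en">And now in English again.</p>
"""
print(paragraphs_in(page, "fr"))
# ['Cette phrase est en français.']
```

Note that the sketch also matches regional variants such as fr-BE by comparing prefixes, whereas the exact-match selector p[lang="fr"] in Listing 6 would only match lang="fr".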
SEO
Search engines not only use the language information for its intended purpose of identifying the language of the content but also to improve search results. So including a language attribute can help us get our content in front of our intended audience.
Closing thoughts and resources
The language attribute, although seemingly simple on the surface and in most cases an easy addition to a website, will improve not only our applications' user experience but also our SEO. It truly does benefit everyone.
Resources
- BCP47 Language Subtag Lookup: A tool that lists subtag values. It also contains a tag validator to check that our subtag values and syntax are correct.
- RFC 5646: Tags for Identifying Languages, part of BCP 47
- BCP 47 country code list
Happy Coding!