The Importance of Using Language Attributes

Before we even get to the importance of it, let's first look at what the language attribute is.

Language Attribute
<html lang="fr-Brai-BE"> ... </html>

The language attribute defines the natural language of the web page. Although it is most often seen with one or two subtags, such as lang="en" for English or lang="en-US" for United States English, the code can get even more specific. The full definition of how to format the subtags can be found in RFC 5646: Tags for Identifying Languages, which is part of BCP 47. Let's look at the most commonly used subtags.

Note: If portions of the content are not in the same language as the page, a language attribute can be added to the element that contains them, such as <p lang="en-GB">.
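
As a minimal sketch of the note above, the page below declares one language on the root element and overrides it for an inline phrase and for a whole block. The page title and the sample sentences are made up for illustration.

```html
<!DOCTYPE html>
<html lang="en-US">
  <head>
    <meta charset="utf-8">
    <title>Mixed-language page</title>
  </head>
  <body>
    <!-- The paragraph inherits lang="en-US" from the root element;
         only the French word is tagged separately. -->
    <p>The French word for "cat" is <span lang="fr">chat</span>.</p>

    <!-- A whole block in a different variety of English. -->
    <blockquote lang="en-GB">
      <p>What is your favourite colour?</p>
    </blockquote>
  </body>
</html>
```

Because the language is inherited by descendant elements, only the parts that differ from the page language need their own attribute.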

Subtags

Language Subtag

The language subtag is the only required part of the tag. Made up of two or three letters, conventionally written in lowercase, it identifies the primary language and comes from the IANA Language Subtag Registry that BCP 47 relies on. If the page is in Maori, for example, the language code would be mi.

Language Subtag - Maori
<html lang="mi"> ... </html>

Fun fact: Many constructed languages have BCP 47 codes. Klingon's is tlh. A list of language codes for constructed languages can be found on Wikipedia.

Script Subtag

This optional subtag, when used, comes after the language subtag and is always a four-letter code, conventionally written with the first letter capitalized. It defines the writing system used, such as ja-Kana for Japanese written in the Katakana script. If a language is written in its typical way, such as French using the Latin alphabet, this subtag is not necessary. However, if French is written in Braille, the script should be specified.

Script Subtag - French using the Braille alphabet
<html lang="fr-Brai"> ... </html>

Region Subtag

Also optional, this subtag comes after the language subtag (and the script subtag, if present) and indicates the regional variation of the primary language.

Region Subtag - English from the United Kingdom
<html lang="en-GB">...</html>

It can specify a country, territory, or region. Using the region subtag is helpful when there is region-specific spelling or other variation in the language due to dialect or usage.

Consider spelling, using the example from Listing 4. Certain English words are spelled differently depending on the locale, such as "color" in American English versus "colour" in British English. Adding the region subtag indicates which spelling should be considered correct.
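
To make the spelling difference concrete, here is a small sketch with the same sentence tagged for two regions. The sentences are invented for illustration. How a given tool reacts to the region subtag depends on that tool.

```html
<!-- American English spelling -->
<p lang="en-US">My favorite color is gray.</p>

<!-- British English spelling -->
<p lang="en-GB">My favourite colour is grey.</p>
```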

Why it's important

Now that we understand what the language attribute is, we can look at why it is important. Many mechanisms on the web use the language attribute to improve accessibility and the user experience.

Translation

Translation tools use the language attribute to detect the current language. Based on the user's settings, they can then offer to translate the text. If you have ever used Lorem Ipsum in a web page to mock content until the final copy was ready, you may have noticed that the browser offers to translate the page from Latin to your current language.

Figure 1: Google Translate attempting to translate Lorem Ipsum

Because Lorem Ipsum has its roots in Latin, we can actually try to translate it. The problem is that the result turns out to be gibberish.

The first paragraph:

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Morbi vel odio sapien. Pellentesque dignissim porta mattis. Etiam sed sem leo. Aliquam non ex ex. Vivamus dui nibh, vehicula at tincidunt eget, facilisis vitae tortor. Vivamus lacinia ut ex ut consequat. Suspendisse aliquam hendrerit sagittis.

translates to English as:

The pain itself is love, the main storage system. The disease or hatred of the wise. Pellentesque dignissim porta mattis Yes, but a lot of timing. Maybe not from ex. We live in the housing, the vehicles at the keyboard require, the easy life of the torturer. We should live with the moms. Maybe some strategic arrows.

Not exactly helpful. Although a pretty pointed corner case, it illustrates the importance of proper tagging to prevent erroneous information from being presented to the user.

The language code for non-linguistic content is zxx. Therefore, in our example above, we want our Lorem Ipsum content blocks to include a lang="zxx" attribute.
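
A minimal sketch of that tagging follows. The surrounding article element and the heading text are hypothetical. Whether a particular translation tool honors zxx is an assumption that should be checked against the tool in question.

```html
<article lang="en">
  <h2>Article title goes here</h2>

  <!-- Placeholder copy, flagged as non-linguistic content until
       the final text is ready. -->
  <p lang="zxx">Lorem ipsum dolor sit amet, consectetur adipiscing elit.</p>
</article>
```

When the final copy replaces the placeholder, the lang="zxx" attribute should be removed so the paragraph inherits the article's language again.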

Spelling and grammar checker

Including proper language attributes helps spelling and grammar checkers better guide users. Region codes can be especially helpful for spell-check accuracy.
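
As a sketch, an editable field can carry both a language attribute and the standard spellcheck attribute. The field name and label are hypothetical. Whether the browser actually switches dictionaries based on lang depends on the browser and on which dictionaries the user has installed.

```html
<label for="feedback">Feedback</label>
<!-- British English is declared for the text typed into this field. -->
<textarea id="feedback" lang="en-GB" spellcheck="true"></textarea>
```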

Non-text readers

Readers such as speech synthesizers and Braille translators rely on language attributes to produce usable results. The video at the bottom of this article shows what happens when the language attribute is set to French when the content is actually in English. The reader does not select the appropriate language mode and therefore speaks the English text with a French accent. Although this can be humorous in the moment, it can create serious accessibility issues for disabled users who rely on assistive technologies to access our content.

Parsers and scripts

Adding language attributes to content can allow processing of the content based on the language. A notable example is using CSS selectors to change the style of the content based on the language.

In the examples set in Listings 5 and 6, we use the language attribute information to only select the paragraphs that are written in French to give them a different style than the paragraphs written in English. Figure 2 shows the output.

Language tag HTML
<p lang="en">This sentence is in English.</p>
<p lang="fr">Cette phrase est en français.</p>
<p lang="en">And now in English again.</p>

Language tag CSS
p[lang="fr"] {
  font-weight: bold;
  color: slategrey;
  font-style: italic;
}

Figure 2: Styles applied based on language attribute
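
The attribute selector in the CSS listing matches only an exact lang="fr" value. As a hedged sketch of two standard alternatives, the snippet below compares it with the [lang|="fr"] selector and the :lang() pseudo-class. The sample paragraphs and the choice of styles are made up for illustration.

```html
<style>
  /* Exact match only: lang="fr", but not lang="fr-CA". */
  p[lang="fr"] { font-style: italic; }

  /* Matches lang="fr" and any value that starts with "fr-". */
  p[lang|="fr"] { font-weight: bold; }

  /* Matches any paragraph whose language is French, including
     paragraphs that inherit the language from an ancestor. */
  p:lang(fr) { color: slategrey; }
</style>

<!-- All three rules apply. -->
<p lang="fr">Cette phrase est en français.</p>

<!-- The second and third rules apply. -->
<p lang="fr-CA">Cette phrase est en français canadien.</p>

<!-- Only the :lang(fr) rule applies: the paragraph itself has no
     lang attribute. -->
<div lang="fr">
  <p>Ce paragraphe hérite de la langue de son parent.</p>
</div>
```

The :lang() form is usually the more forgiving choice when the language is declared once on a container rather than on every paragraph.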

SEO

Search engines not only use the language information for its intended purpose of identifying the language of the content but also to improve search results. So including a language attribute can help us get our content in front of our intended audience.

Closing thoughts and resources

The language attribute, although seemingly quite simple on the surface and in most cases an easy addition to a website, will improve not only our applications' user experience but also our SEO. It truly does benefit everyone.

Resources

Happy Coding!

References

  • “Codes for constructed languages.” Wikipedia, https://en.wikipedia.org/wiki/Codes_for_constructed_languages. Accessed 19 December 2021.
  • Ishida, Richard. “Declaring language in HTML.” World Wide Web Consortium (W3C), https://www.w3.org/International/questions/qa-html-language-declarations. Accessed 19 December 2021.
  • Ishida, Richard. “Language on the Web.” World Wide Web Consortium (W3C), https://www.w3.org/International/getting-started/language. Accessed 19 December 2021.
  • Ishida, Richard. “Tagging text with no language.” World Wide Web Consortium (W3C), https://www.w3.org/International/questions/qa-no-language. Accessed 19 December 2021.
  • Ishida, Richard, and Deborah Cawkwell. “Why use the language attribute?” World Wide Web Consortium (W3C), https://www.w3.org/International/questions/qa-lang-why. Accessed 19 December 2021.
  • “lang - HTML: HyperText Markup Language | MDN.” MDN Web Docs, 23 October 2021, https://developer.mozilla.org/en-US/docs/Web/HTML/Global_attributes/lang. Accessed 19 December 2021.
  • “Language subtag lookup app.” r12a, https://r12a.github.io/app-subtags/. Accessed 19 December 2021.
  • “RFC 5646: Tags for Identifying Languages.” IETF Datatracker, https://datatracker.ietf.org/doc/html/rfc5646. Accessed 19 December 2021.