Character Encoding

Character encoding is a system that pairs each character in a character set with a specific numeric value (code pint) that can be used to represent the character in digital form. THis allows text to be stored, transmitted, and rendered consistently.

Common Character Encodings:

ASCII: A code unit in ASCII consists of 7 bits; It represents 128 characters, including English letters, digits, and control characters. LImited to basic English text.
UTF-8: A code unit in UTF-8, EBCDIC and GB 18030 consists of 8 bits; It is backward-compatible with ASCII and is the most widely used encoding on the web.

The most popular character encoding on the World Wide Web is UTF-8, which is used in 98.2% of surveyed web sites, as of May 2024. src

Specifying Character Encoding in HTML To ensure that your web pages are displayed correctly, you should specify the character encoding in your HTML documents.

<html>
    <head>
        <meta charset="UTF-8">
    </head>
</html>

Character Encoding in HTTP Headers The character encoding can also be specified in the HTTP headers sent by the server. This is done using the Content-Type header:

Content-Type: text/html; charset=UTF-8

Handling character Encoding in .NET In .NET, you can specify the character encoding when reading or writing text files, handling HTTP responses, working with streams. For example, when writing a response in an ASP.NET Core application:

public IActionResult GetExample()
{
    var content = "Hello world!";
    return Content(content, "text/plain", Encoding.UTF8)
}

Common Issues:

Mojibake: Garbled text that appears when text is decoded using the wrong character encoding.

Character Encoding

Screen Dimensions