Text to Binary Converter


Thibault Besson-Magdelain

Software engineer building privacy-first web tools. All processing happens in your browser.


7 min read | Last updated: February 10, 2026 | Algorithm: Unicode/ASCII character encoding

Key Takeaways

  • Each character in a text string is converted to its binary equivalent using a character encoding standard (ASCII, UTF-8, or UTF-16).
  • ASCII maps 128 characters to 7-bit values (stored as 8-bit bytes), while UTF-8 uses 1-4 bytes to support over 149,000 Unicode characters including emoji.
  • Text-to-binary conversion is the fundamental process by which computers store, transmit, and process all written communication.

How Text to Binary Conversion Works

Converting text to binary is the process of translating human-readable characters into the language that computers understand natively: sequences of ones and zeros. Every character you type on a keyboard -- whether it is a letter, number, punctuation mark, emoji, or special symbol -- has a corresponding numerical value defined by a character encoding standard. The most fundamental of these standards is ASCII, which assigns a unique number between 0 and 127 to each character in the basic Latin alphabet.

Once a character has been mapped to its numerical value, that number is expressed in binary (base-2) notation. For example, the uppercase letter "A" has an ASCII value of 65. To convert 65 to binary, you repeatedly divide by 2 and record the remainders: 65 / 2 = 32 R1, 32 / 2 = 16 R0, 16 / 2 = 8 R0, 8 / 2 = 4 R0, 4 / 2 = 2 R0, 2 / 2 = 1 R0, 1 / 2 = 0 R1. Reading the remainders from bottom to top gives 1000001, which is padded to 8 bits as 01000001.
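
The repeated-division procedure maps directly to a few lines of code. The following is a minimal Python sketch (the function name to_binary_by_division is illustrative, not from any library) that collects the remainders and pads the result to a full 8-bit byte:

# Python: decimal character code to 8-bit binary via repeated division by 2
def to_binary_by_division(value):
    bits = []
    while value > 0:
        bits.append(str(value % 2))  # record the remainder (0 or 1)
        value //= 2                  # integer-divide by 2 and repeat
    binary = ''.join(reversed(bits)) or '0'  # read remainders bottom to top
    return binary.zfill(8)           # pad to 8 bits

print(to_binary_by_division(65))
# Output: 01000001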

This process is fundamental to how every computer operates. When you type a document, send an email, or post a message online, your device converts every character into binary data before storing or transmitting it. The text-to-binary converter above automates this process, supporting three encoding standards and three output formats. As part of the broader family of binary converters, this tool bridges the gap between human-readable text and machine-level binary representation.

Encoding Standards Compared

The encoding you choose determines how characters map to binary values. ASCII is the simplest: 128 characters, each represented by exactly 8 bits. It covers English letters, digits, basic punctuation, and control characters. UTF-8 is the modern standard used by over 98% of websites. It is backward-compatible with ASCII (the first 128 characters are identical) but extends to support every Unicode character using 1 to 4 bytes per character. UTF-16 uses 2 bytes for most characters and 4 bytes (surrogate pairs) for characters outside the Basic Multilingual Plane, including most emoji.
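
A quick way to see these differences in practice is to encode the same text with each standard and compare the byte counts. The Python sketch below is illustrative; the utf-16-le codec is used here only so the byte-order mark does not inflate the count:

# Python: byte counts for the same text under different encodings
samples = ["Hello", "café", "世界", "😀"]

for text in samples:
    try:
        ascii_len = len(text.encode("ascii"))
    except UnicodeEncodeError:
        ascii_len = None  # contains characters outside the 128 ASCII codes
    utf8_len = len(text.encode("utf-8"))
    utf16_len = len(text.encode("utf-16-le"))  # little-endian, no byte-order mark
    print(f"{text!r}: ASCII={ascii_len}, UTF-8={utf8_len}, UTF-16={utf16_len} bytes")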

Understanding ASCII Encoding

The American Standard Code for Information Interchange (ASCII) was developed in the early 1960s and published as a formal standard in 1963 by the American Standards Association (the body that later became ANSI). It was designed to standardize character encoding across different computer manufacturers, solving the compatibility problems that arose when each vendor used its own proprietary character set.

ASCII defines 128 characters, each assigned a number from 0 to 127. The encoding divides naturally into four groups:

  • Control characters (0-31, 127): Non-printable characters used for text formatting and device control. These include carriage return (13), line feed (10), tab (9), escape (27), and null (0). While rarely visible to users, they play critical roles in text processing and terminal communication.
  • Space and punctuation (32-47, 58-64, 91-96, 123-126): The space character (32), mathematical operators, brackets, and common symbols like @, #, $, and &.
  • Digits (48-57): The numerals 0 through 9. Note that the character "0" has ASCII code 48, not 0 -- a distinction that matters when converting between character and numeric representations.
  • Letters (65-90, 97-122): Uppercase A through Z occupy codes 65-90, and lowercase a through z occupy codes 97-122. The difference between corresponding upper and lowercase letters is always exactly 32 (one bit flip), an elegant design choice that simplifies case conversion in software.

Since the original ASCII standard uses 7 bits, it can represent values from 0 to 127. In practice, ASCII characters are stored as 8-bit bytes with the most significant bit set to zero. This 8-bit convention, adopted universally by modern systems, is the format our converter uses. For a comprehensive reference, consult the Wikipedia ASCII article.
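
Because the gap between corresponding upper- and lowercase letters is exactly 32 (a single bit, as noted in the letter ranges above), case conversion can be done with one XOR. The short Python sketch below illustrates this, along with the fact that every ASCII code fits in 7 bits, leaving the top bit of the byte at zero:

# Python: ASCII case conversion is a single bit flip (the bit with value 32)
for ch in "Az":
    flipped = chr(ord(ch) ^ 0b00100000)
    print(f"{ch} = {ord(ch):08b}  <->  {flipped} = {ord(flipped):08b}")
# Output: A = 01000001  <->  a = 01100001
#         z = 01111010  <->  Z = 01011010

# Every ASCII code is below 128, so the most significant bit is always 0.
print(all(ord(c) < 128 for c in "Hello, World!"))
# Output: True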

Unicode and UTF-8

While ASCII serves English text adequately, it cannot represent the characters used by the vast majority of the world's writing systems. Chinese, Arabic, Hindi, Cyrillic, Japanese, Korean, Thai, and hundreds of other scripts are completely absent from ASCII. Unicode was created to solve this problem by assigning a unique code point to every character in every major writing system, as well as mathematical symbols, technical symbols, historical scripts, and emoji.

As of Unicode 15.1, the standard defines over 149,000 characters across 161 scripts. Each character is identified by a code point written as U+XXXX (e.g., U+0041 for "A", U+4E16 for the Chinese character meaning "world", U+1F600 for the grinning face emoji). The Unicode code space ranges from U+0000 to U+10FFFF, encompassing over 1.1 million possible code points.
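
Code points are easy to inspect from code. As a small illustration in Python, ord() returns a character's code point and chr() goes the other way; the familiar U+XXXX label is simply that number printed in hexadecimal:

# Python: Unicode code points in U+XXXX notation
for ch in ["A", "世", "😀"]:
    print(f"{ch} -> U+{ord(ch):04X}")
# Output: A -> U+0041
#         世 -> U+4E16
#         😀 -> U+1F600

print(chr(0x1F600))  # chr() maps a code point back to its character
# Output: 😀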

UTF-8 (Unicode Transformation Format - 8-bit) is the dominant encoding for Unicode text. Created by Ken Thompson and Rob Pike in 1992, it uses a variable-width scheme that is backward-compatible with ASCII:

  • 1 byte (U+0000 to U+007F): All ASCII characters use a single byte, identical to their ASCII encoding. This means any valid ASCII text is automatically valid UTF-8.
  • 2 bytes (U+0080 to U+07FF): Latin accented characters, Greek, Cyrillic, Arabic, and Hebrew scripts use two bytes.
  • 3 bytes (U+0800 to U+FFFF): CJK (Chinese, Japanese, Korean) characters, most other scripts, and common symbols use three bytes.
  • 4 bytes (U+10000 to U+10FFFF): Emoji, mathematical symbols, historic scripts, and rare CJK characters use four bytes. These are the code points that require surrogate pairs in UTF-16.

This variable-width design makes UTF-8 space-efficient for text that is primarily Latin-based while still supporting the full Unicode range. It is the encoding used by 98.2% of websites (per W3Techs) and is the default encoding for HTML5, JSON, and most modern programming languages.
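
The four width classes above can be verified by encoding one character from each range and inspecting the resulting bytes. Here is a brief Python sketch; the sample characters are arbitrary picks from each range:

# Python: UTF-8 byte width for one character from each range
for ch in ["A", "é", "世", "😀"]:  # 1, 2, 3, and 4 bytes respectively
    encoded = ch.encode("utf-8")
    print(f"{ch} (U+{ord(ch):04X}) -> {len(encoded)} byte(s): {encoded.hex(' ')}")
# Output: A (U+0041) -> 1 byte(s): 41
#         é (U+00E9) -> 2 byte(s): c3 a9
#         世 (U+4E16) -> 3 byte(s): e4 b8 96
#         😀 (U+1F600) -> 4 byte(s): f0 9f 98 80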

Step-by-Step Conversion Tutorial

Let us walk through converting the word "Hello" to binary using ASCII encoding, step by step.

Step 1: Identify Each Character

Break the text into individual characters: H, e, l, l, o. We have five characters to convert.

Step 2: Look Up the ASCII Value

Find each character's ASCII code: H = 72, e = 101, l = 108, l = 108, o = 111.

Step 3: Convert Each Value to Binary

Convert each decimal ASCII value to an 8-bit binary number using repeated division by 2:

  • H (72): 72 → 36 R0, 36 → 18 R0, 18 → 9 R0, 9 → 4 R1, 4 → 2 R0, 2 → 1 R0, 1 → 0 R1. Binary: 01001000
  • e (101): 101 → 50 R1, 50 → 25 R0, 25 → 12 R1, 12 → 6 R0, 6 → 3 R0, 3 → 1 R1, 1 → 0 R1. Binary: 01100101
  • l (108): Binary: 01101100
  • l (108): Binary: 01101100
  • o (111): Binary: 01101111

Step 4: Combine the Result

The complete binary representation of "Hello" is:

01001000 01100101 01101100 01101100 01101111

To decode this binary back to text, split it into 8-bit groups, convert each group to decimal, and look up the corresponding ASCII character. The process is perfectly reversible -- no information is lost.
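
The decoding step is just as mechanical. Here is a minimal Python sketch, assuming space-separated 8-bit groups like the output above:

# Python: binary back to text
def binary_to_text(binary_string):
    # int(group, 2) parses base-2; chr() maps the code back to a character
    return ''.join(chr(int(group, 2)) for group in binary_string.split())

print(binary_to_text("01001000 01100101 01101100 01101100 01101111"))
# Output: Hello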

Use Cases in Software Development

Text-to-binary conversion is not just an academic exercise. It has concrete, everyday applications in software engineering and computer science.

  • Network Protocols: HTTP, WebSocket, SMTP, and FTP protocols transmit text as binary streams. Understanding the binary representation of text helps developers debug network issues, analyze packet captures with tools like Wireshark, and implement custom protocols.
  • Cryptography: Encryption algorithms (AES, RSA, ChaCha20) operate on binary data. The first step in encrypting a text message is converting it to binary. Key derivation functions, initialization vectors, and cipher blocks are all binary constructs.
  • File Formats: Text files, JSON, XML, CSV, and configuration files store characters as binary-encoded bytes. Understanding the encoding (UTF-8, UTF-16LE, UTF-16BE) is critical when reading files across platforms and languages.
  • IoT and Embedded Systems: Microcontrollers with limited memory often process text as raw binary data. Firmware developers manipulate individual bits and bytes when implementing serial communication, display drivers, and sensor protocols.
  • Data Compression: Algorithms like Huffman coding and arithmetic coding work by creating optimized binary representations of text. Understanding how characters map to binary is essential for implementing and debugging compression algorithms.

Programming Examples

JavaScript

In JavaScript, charCodeAt() returns the UTF-16 code unit of a character, and toString(2) converts a number to a binary string. For full Unicode support, use codePointAt() to handle emoji and characters outside the Basic Multilingual Plane; note that the example below prints each code point's value in binary rather than its UTF-8 bytes. For more details, consult the MDN charCodeAt documentation.

// JavaScript: Text to binary
function textToBinary(text) {
  return [...text].map(char => {
    return char.codePointAt(0)
      .toString(2)
      .padStart(8, '0');
  }).join(' ');
}
console.log(textToBinary('Hello'));
// Output: 01001000 01100101 01101100 01101100 01101111

Python

In Python, the ord() function returns the Unicode code point of a character, and bin() converts an integer to its binary string representation (prefixed with 0b).

# Python: Text to binary
def text_to_binary(text):
    return ' '.join(format(ord(c), '08b') for c in text)

print(text_to_binary('Hello'))
# Output: 01001000 01100101 01101100 01101100 01101111

Frequently Asked Questions

How do I convert text to binary?

Each character in your text has a numeric code (e.g., ASCII value). Convert that number to base-2 (binary) and pad to 8 bits. For example, "A" = 65 = 01000001 in binary. Our tool does this automatically for every character, supporting ASCII, UTF-8, and UTF-16 encodings.

What is the difference between ASCII and UTF-8 encoding?

ASCII uses 7 bits to represent 128 characters (English letters, digits, common symbols). UTF-8 is backward-compatible with ASCII but uses 1 to 4 bytes per character, supporting over 149,000 characters including international scripts, mathematical symbols, and emoji. UTF-8 is the standard encoding for web content.

Can I convert binary back to text?

Yes. Click the "Swap Direction" button in our tool to switch to binary-to-text mode. Enter space-separated, comma-separated, or continuous binary digits and the tool will decode them back to readable text using your selected encoding.

How do emoji get converted to binary?

Most emoji are Unicode characters with code points above U+FFFF (a handful of older ones live in the Basic Multilingual Plane). In UTF-8 encoding, these require 4 bytes (32 bits of binary data). In UTF-16, they are represented using surrogate pairs -- two 16-bit code units. Our tool correctly handles both encodings, including complex emoji sequences.
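
As a concrete illustration, here is the grinning face emoji (U+1F600) encoded both ways in Python; utf-16-le is used only to keep the byte-order mark out of the output:

# Python: one emoji in UTF-8 (4 bytes) and UTF-16 (a surrogate pair)
emoji = "😀"  # U+1F600
print(emoji.encode("utf-8").hex(" "))
# Output: f0 9f 98 80  (4 bytes)
print(emoji.encode("utf-16-le").hex(" "))
# Output: 3d d8 00 de  (surrogate pair D83D DE00 in little-endian byte order)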

Why is binary important in computing?

Computers use transistors that have exactly two states (on and off), which correspond directly to binary digits (1 and 0). All data that a computer processes -- text, images, audio, video, and program instructions -- is ultimately stored and manipulated as binary. Understanding binary is essential for any software developer working with low-level systems, data encoding, or computer architecture.

What is the binary code for the letter A?

The uppercase letter "A" has ASCII code 65. In 8-bit binary, this is 01000001. The lowercase letter "a" has ASCII code 97, which is 01100001 in binary. The difference is exactly 32 (binary 00100000), which means toggling the 6th bit converts between upper and lowercase.

How many bits does each character use?

In ASCII encoding, every character uses exactly 8 bits (1 byte). In UTF-8, characters use a variable number of bytes: basic Latin characters use 1 byte (8 bits), accented characters and common scripts use 2 bytes (16 bits), CJK characters use 3 bytes (24 bits), and emoji use 4 bytes (32 bits). The exact bit count depends on the character's Unicode code point.