Tokens

Tokens are primitive productions in the grammar defined by regular (non-recursive) languages. Rust source input can be broken down into the following kinds of tokens:

  • Keywords
  • Identifiers
  • Literals
  • Lifetimes
  • Punctuation
  • Delimiters

Within this documentation's grammar, "simple" tokens are given in string table production form, and appear in monospace font.

Literals

Literals are tokens used in literal expressions.

Examples

Characters and strings

Literal          Example       # sets       Characters    Escapes
Character        'H'           0            All Unicode   Quote & ASCII & Unicode
String           "hello"       0            All Unicode   Quote & ASCII & Unicode
Raw string       r#"hello"#    0 or more*   All Unicode   N/A
Byte             b'H'          0            All ASCII     Quote & Byte
Byte string      b"hello"      0            All ASCII     Quote & Byte
Raw byte string  br#"hello"#   0 or more*   All ASCII     N/A

* The number of #s on each side of the same literal must be equivalent
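As a rough illustration of the table above, the following sketch writes the same content in each of the six forms and checks that the results agree. The function name `literal_kinds` is hypothetical and exists only for this example.

```rust
// Returns one value per literal form from the table, in table order.
// `literal_kinds` is an illustrative name, not part of any API.
fn literal_kinds() -> (
    char,
    &'static str,
    &'static str,
    u8,
    &'static [u8; 5],
    &'static [u8; 5],
) {
    (
        'H',         // character
        "hello",     // string
        r#"hello"#,  // raw string, one # on each side
        b'H',        // byte
        b"hello",    // byte string
        br#"hello"#, // raw byte string, one # on each side
    )
}

fn main() {
    let (c, s, raw, byte, bytes, raw_bytes) = literal_kinds();
    assert_eq!(s, raw); // raw and non-raw forms denote the same text
    assert_eq!(bytes, raw_bytes);
    assert_eq!(byte, c as u8); // b'H' is the ASCII code of 'H'
    assert_eq!(s.as_bytes(), &bytes[..]);
    println!("all six literal forms agree");
}
```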

ASCII escapes

Escape   Name
\x41     7-bit character code (exactly 2 digits, up to 0x7F)
\n       Newline
\r       Carriage return
\t       Tab
\\       Backslash
\0       Null

Byte escapes

Escape   Name
\x7F     8-bit character code (exactly 2 digits)
\n       Newline
\r       Carriage return
\t       Tab
\\       Backslash
\0       Null

Unicode escapes

Escape     Name
\u{7FFF}   24-bit Unicode character code (up to 6 digits)

Quote escapes

Escape   Name
\'       Single quote
\"       Double quote

Numbers

Number literals*   Example       Exponentiation   Suffixes
Decimal integer    98_222        N/A              Integer suffixes
Hex integer        0xff          N/A              Integer suffixes
Octal integer      0o77          N/A              Integer suffixes
Binary integer     0b1111_0000   N/A              Integer suffixes
Floating-point     123.0E+77     Optional         Floating-point suffixes

* All number literals allow _ as a visual separator: 1_234.0E+18f64
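A minimal sketch of the visual separator, using the examples from the table. The helper name `separated` is made up for this illustration; the point is only that underscores do not change the value a literal denotes.

```rust
// Literals written with `_` separators; `separated` is an illustrative name.
fn separated() -> (u32, u32, f64) {
    (98_222, 0b1111_0000, 1_234.0E+18f64)
}

fn main() {
    let (dec, bin, float) = separated();
    assert_eq!(dec, 98222); // same value without the separator
    assert_eq!(bin, 0xf0); // 0b1111_0000 is 240
    assert_eq!(float, 1234.0e18); // separator and exponent case are irrelevant
    println!("{dec} {bin} {float:e}");
}
```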

Suffixes

A suffix is a sequence of characters following the primary part of a literal (without intervening whitespace), of the same form as a non-raw identifier or keyword.

Any kind of literal (string, integer, etc.) with any suffix is valid as a token, and can be passed to a macro without producing an error. The macro itself will decide how to interpret such a token and whether to produce an error or not.


#![allow(unused)]
fn main() {
macro_rules! blackhole { ($tt:tt) => () }

blackhole!("string"suffix); // OK
}

However, suffixes on literal tokens parsed as Rust code are restricted. Any suffixes are rejected on non-numeric literal tokens, and numeric literal tokens are accepted only with suffixes from the list below.

Integer                                                          Floating-point
u8, i8, u16, i16, u32, i32, u64, i64, u128, i128, usize, isize   f32, f64
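To make the effect of the accepted suffixes concrete, this sketch measures the size of a few suffixed literals with `std::mem::size_of_val`. The function name `suffix_sizes` is hypothetical; the sizes follow from the primitive types that the suffixes name.

```rust
use std::mem::size_of_val;

// Size in bytes of several suffixed literals; `suffix_sizes` is an
// illustrative helper, not something defined by the reference.
fn suffix_sizes() -> [usize; 5] {
    [
        size_of_val(&1u8),    // u8   -> 1 byte
        size_of_val(&1_i16),  // i16  -> 2 bytes (separator before the suffix)
        size_of_val(&1u128),  // u128 -> 16 bytes
        size_of_val(&1.0f32), // f32  -> 4 bytes
        size_of_val(&1f64),   // f64  -> 8 bytes (no period needed with a suffix)
    ]
}

fn main() {
    assert_eq!(suffix_sizes(), [1, 2, 16, 4, 8]);
    println!("{:?}", suffix_sizes());
}
```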

Character and string literals

Character literals

Lexer
CHAR_LITERAL :
   ' ( ~[' \ \n \r \t] | QUOTE_ESCAPE | ASCII_ESCAPE | UNICODE_ESCAPE ) '

QUOTE_ESCAPE :
   \' | \"

ASCII_ESCAPE :
      \x OCT_DIGIT HEX_DIGIT
   | \n | \r | \t | \\ | \0

UNICODE_ESCAPE :
   \u{ ( HEX_DIGIT _* )1..6 }

A character literal is a single Unicode character enclosed within two U+0027 (single-quote) characters, with the exception of U+0027 itself, which must be escaped by a preceding U+005C character (\).

String literals

Lexer
STRING_LITERAL :
   " (
      ~[" \ IsolatedCR]
      | QUOTE_ESCAPE
      | ASCII_ESCAPE
      | UNICODE_ESCAPE
      | STRING_CONTINUE
   )* "

STRING_CONTINUE :
   \ followed by \n

A string literal is a sequence of any Unicode characters enclosed within two U+0022 (double-quote) characters, with the exception of U+0022 itself, which must be escaped by a preceding U+005C character (\).

Line-breaks are allowed in string literals. A line-break is either a newline (U+000A) or a pair of carriage return and newline (U+000D, U+000A). Both byte sequences are normally translated to U+000A, but as a special exception, when an unescaped U+005C character (\) occurs immediately before the line-break, then the U+005C character, the line-break, and all whitespace at the beginning of the next line are ignored. Thus a and b are equal:


#![allow(unused)]
fn main() {
let a = "foobar";
let b = "foo\
         bar";

assert_eq!(a,b);
}

Character escapes

Some additional escapes are available in either character or non-raw string literals. An escape starts with a U+005C (\) and continues with one of the following forms:

  • A 7-bit code point escape starts with U+0078 (x) and is followed by exactly two hex digits with value up to 0x7F. It denotes the ASCII character with value equal to the provided hex value. Higher values are not permitted because it is ambiguous whether they mean Unicode code points or byte values.
  • A 24-bit code point escape starts with U+0075 (u) and is followed by up to six hex digits surrounded by braces U+007B ({) and U+007D (}). It denotes the Unicode code point equal to the provided hex value.
  • A whitespace escape is one of the characters U+006E (n), U+0072 (r), or U+0074 (t), denoting the Unicode values U+000A (LF), U+000D (CR) or U+0009 (HT) respectively.
  • The null escape is the character U+0030 (0) and denotes the Unicode value U+0000 (NUL).
  • The backslash escape is the character U+005C (\) which must be escaped in order to denote itself.
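The escape forms listed above can be checked against the code points they denote. This is a small sketch; the function name `escape_code_points` is an assumption made for the example.

```rust
// Code points denoted by each escape form, in the order of the list above.
// `escape_code_points` is an illustrative name.
fn escape_code_points() -> Vec<u32> {
    vec![
        '\x41' as u32,   // 7-bit code point escape
        '\u{41}' as u32, // 24-bit code point escape
        '\n' as u32,     // whitespace escapes
        '\r' as u32,
        '\t' as u32,
        '\0' as u32,     // null escape
        '\\' as u32,     // backslash escape
    ]
}

fn main() {
    assert_eq!(
        escape_code_points(),
        [0x41, 0x41, 0x0A, 0x0D, 0x09, 0x00, 0x5C]
    );
    // The same escapes work inside non-raw string literals.
    assert_eq!("\x41\u{42}", "AB");
    println!("{:x?}", escape_code_points());
}
```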

Raw string literals

Lexer
RAW_STRING_LITERAL :
   r RAW_STRING_CONTENT

RAW_STRING_CONTENT :
      " ( ~ IsolatedCR )* (non-greedy) "
   | # RAW_STRING_CONTENT #

Raw string literals do not process any escapes. They start with the character U+0072 (r), followed by zero or more of the character U+0023 (#) and a U+0022 (double-quote) character. The raw string body can contain any sequence of Unicode characters and is terminated only by another U+0022 (double-quote) character, followed by the same number of U+0023 (#) characters that preceded the opening U+0022 (double-quote) character.

All Unicode characters contained in the raw string body represent themselves. The characters U+0022 (double-quote) (except when followed by at least as many U+0023 (#) characters as were used to start the raw string literal) and U+005C (\) do not have any special meaning.

Examples for string literals:


#![allow(unused)]
fn main() {
"foo"; r"foo";                     // foo
"\"foo\""; r#""foo""#;             // "foo"

"foo #\"# bar";
r##"foo #"# bar"##;                // foo #"# bar

"\x52"; "R"; r"R";                 // R
"\\x52"; r"\x52";                  // \x52
}

Byte and byte string literals

Byte literals

Lexer
BYTE_LITERAL :
   b' ( ASCII_FOR_CHAR | BYTE_ESCAPE ) '

ASCII_FOR_CHAR :
   any ASCII (i.e. 0x00 to 0x7F), except ', \, \n, \r or \t

BYTE_ESCAPE :
      \x HEX_DIGIT HEX_DIGIT
   | \n | \r | \t | \\ | \0 | \' | \"

A byte literal is a single ASCII character (in the U+0000 to U+007F range) or a single escape preceded by the characters U+0062 (b) and U+0027 (single-quote), and followed by the character U+0027. If the character U+0027 is present within the literal, it must be escaped by a preceding U+005C (\) character. It is equivalent to a u8 unsigned 8-bit integer number literal.
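Because a byte literal is equivalent to a `u8` number literal, it can be compared and combined with ordinary integers. The following sketch assumes nothing beyond the text above; `byte_values` is a made-up name.

```rust
// Byte literals and the u8 values they are equivalent to.
// `byte_values` is an illustrative name.
fn byte_values() -> [u8; 4] {
    [
        b'H',    // plain ASCII character
        b'\n',   // whitespace escape
        b'\'',   // the single quote must be escaped
        b'\xFF', // byte escape: any two hex digits are allowed here
    ]
}

fn main() {
    assert_eq!(byte_values(), [72, 10, 39, 255]);
    // A byte literal takes part in u8 arithmetic like any other u8.
    assert_eq!(b'a' + 1, b'b');
    println!("{:?}", byte_values());
}
```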

Byte string literals

Lexer
BYTE_STRING_LITERAL :
   b" ( ASCII_FOR_STRING | BYTE_ESCAPE | STRING_CONTINUE )* "

ASCII_FOR_STRING :
   any ASCII (i.e. 0x00 to 0x7F), except ", \ and IsolatedCR

A non-raw byte string literal is a sequence of ASCII characters and escapes, preceded by the characters U+0062 (b) and U+0022 (double-quote), and followed by the character U+0022. If the character U+0022 is present within the literal, it must be escaped by a preceding U+005C (\) character. Alternatively, a byte string literal can be a raw byte string literal, defined below. The type of a byte string literal of length n is &'static [u8; n].

Some additional escapes are available in either byte or non-raw byte string literals. An escape starts with a U+005C (\) and continues with one of the following forms:

  • A byte escape starts with U+0078 (x) and is followed by exactly two hex digits. It denotes the byte equal to the provided hex value.
  • A whitespace escape is one of the characters U+006E (n), U+0072 (r), or U+0074 (t), denoting the byte values 0x0A (ASCII LF), 0x0D (ASCII CR) or 0x09 (ASCII HT) respectively.
  • The null escape is the character U+0030 (0) and denotes the byte value 0x00 (ASCII NUL).
  • The backslash escape is the character U+005C (\) which must be escaped in order to denote its ASCII encoding 0x5C.
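As a sketch of the byte string rules above, the function below returns a byte string that uses each escape form; its return type spells out the `&'static [u8; n]` type. The name `escaped_bytes` is hypothetical.

```rust
// A byte string of length 5, so its type is &'static [u8; 5].
// `escaped_bytes` is an illustrative name.
fn escaped_bytes() -> &'static [u8; 5] {
    // 'a', byte escape 0x80, newline, backslash, null.
    // Note: \x80 is above 0x7F, which is fine in a byte string
    // but would be rejected in a non-byte string literal.
    b"a\x80\n\\\0"
}

fn main() {
    assert_eq!(escaped_bytes(), &[0x61, 0x80, 0x0A, 0x5C, 0x00]);
    assert_eq!(escaped_bytes().len(), 5);
    println!("{:x?}", escaped_bytes());
}
```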

Raw byte string literals

Lexer
RAW_BYTE_STRING_LITERAL :
   br RAW_BYTE_STRING_CONTENT

RAW_BYTE_STRING_CONTENT :
      " ASCII* (non-greedy) "
   | # RAW_BYTE_STRING_CONTENT #

ASCII :
   any ASCII (i.e. 0x00 to 0x7F)

Raw byte string literals do not process any escapes. They start with the character U+0062 (b), followed by U+0072 (r), followed by zero or more of the character U+0023 (#), and a U+0022 (double-quote) character. The raw string body can contain any sequence of ASCII characters and is terminated only by another U+0022 (double-quote) character, followed by the same number of U+0023 (#) characters that preceded the opening U+0022 (double-quote) character. A raw byte string literal cannot contain any non-ASCII byte.

All characters contained in the raw string body represent their ASCII encoding. The characters U+0022 (double-quote) (except when followed by at least as many U+0023 (#) characters as were used to start the raw string literal) and U+005C (\) do not have any special meaning.

Examples for byte string literals:


#![allow(unused)]
fn main() {
b"foo"; br"foo";                     // foo
b"\"foo\""; br#""foo""#;             // "foo"

b"foo #\"# bar";
br##"foo #"# bar"##;                 // foo #"# bar

b"\x52"; b"R"; br"R";                // R
b"\\x52"; br"\x52";                  // \x52
}

Number literals

A number literal is either an integer literal or a floating-point literal. The grammar for recognizing the two kinds of literals is mixed.

Integer literals

Lexer
INTEGER_LITERAL :
   ( DEC_LITERAL | BIN_LITERAL | OCT_LITERAL | HEX_LITERAL ) INTEGER_SUFFIX?

DEC_LITERAL :
   DEC_DIGIT (DEC_DIGIT|_)*

BIN_LITERAL :
   0b (BIN_DIGIT|_)* BIN_DIGIT (BIN_DIGIT|_)*

OCT_LITERAL :
   0o (OCT_DIGIT|_)* OCT_DIGIT (OCT_DIGIT|_)*

HEX_LITERAL :
   0x (HEX_DIGIT|_)* HEX_DIGIT (HEX_DIGIT|_)*

BIN_DIGIT : [0-1]

OCT_DIGIT : [0-7]

DEC_DIGIT : [0-9]

HEX_DIGIT : [0-9 a-f A-F]

INTEGER_SUFFIX :
      u8 | u16 | u32 | u64 | u128 | usize
   | i8 | i16 | i32 | i64 | i128 | isize

An integer literal has one of four forms:

  • A decimal literal starts with a decimal digit and continues with any mixture of decimal digits and underscores.
  • A hex literal starts with the character sequence U+0030 U+0078 (0x) and continues as any mixture (with at least one digit) of hex digits and underscores.
  • An octal literal starts with the character sequence U+0030 U+006F (0o) and continues as any mixture (with at least one digit) of octal digits and underscores.
  • A binary literal starts with the character sequence U+0030 U+0062 (0b) and continues as any mixture (with at least one digit) of binary digits and underscores.

Like any literal, an integer literal may be followed (immediately, without any spaces) by an integer suffix, which must be the name of one of the primitive integer types: u8, i8, u16, i16, u32, i32, u64, i64, u128, i128, usize, or isize. See literal expressions for the effect of these suffixes.

Examples of integer literals of various forms:


#![allow(unused)]
fn main() {
#![allow(overflowing_literals)]
123;
123i32;
123u32;
123_u32;

0xff;
0xff_u8;
0x01_f32; // integer 7986, not floating-point 1.0
0x01_e3;  // integer 483, not floating-point 1000.0

0o70;
0o70_i16;

0b1111_1111_1001_0000;
0b1111_1111_1001_0000i64;
0b________1;

0usize;

// These are too big for their type, but are still valid tokens

128_i8;
256_u8;

}

Note that -1i8, for example, is analyzed as two tokens: - followed by 1i8.
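One way to observe this tokenization is to count token trees with a macro. The sketch below uses a hypothetical `count_tts!` macro written for this example; it is not part of the standard library.

```rust
// Counts the token trees passed to it by peeling them off one at a time.
// `count_tts!` is an illustrative macro, not a standard one.
macro_rules! count_tts {
    () => { 0usize };
    ($head:tt $($tail:tt)*) => { 1usize + count_tts!($($tail)*) };
}

fn main() {
    assert_eq!(count_tts!(1i8), 1); // literal plus suffix is a single token
    assert_eq!(count_tts!(-1i8), 2); // `-` followed by `1i8`
    assert_eq!(count_tts!(0xff_u8), 1); // separators do not split the token
    println!("-1i8 is {} tokens", count_tts!(-1i8));
}
```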

Examples of invalid integer literals:


#![allow(unused)]
fn main() {
// uses numbers of the wrong base

0b0102;
0o0581;

// bin, hex, and octal literals must have at least one digit

0b_;
0b____;
}

Tuple index

Lexer
TUPLE_INDEX:
   INTEGER_LITERAL

A tuple index is used to refer to the fields of tuples, tuple structs, and tuple variants.

Tuple indices are compared with the literal token directly. Tuple indices start with 0 and each successive index increments the value by 1 as a decimal value. Thus, only decimal values will match, and the value must not have any extra 0 prefix characters.


#![allow(unused)]
fn main() {
let example = ("dog", "cat", "horse");
let dog = example.0;
let cat = example.1;
// The following examples are invalid.
let cat = example.01;  // ERROR no field named `01`
let horse = example.0b10;  // ERROR no field named `0b10`
}

Note: The tuple index may include an INTEGER_SUFFIX, but this is not intended to be valid, and may be removed in a future version. See https://github.com/rust-lang/rust/issues/60210 for more information.

Floating-point literals

Lexer
FLOAT_LITERAL :
      DEC_LITERAL . (not immediately followed by ., _ or an XID_Start character)
   | DEC_LITERAL FLOAT_EXPONENT
   | DEC_LITERAL . DEC_LITERAL FLOAT_EXPONENT?
   | DEC_LITERAL (. DEC_LITERAL)? FLOAT_EXPONENT? FLOAT_SUFFIX

FLOAT_EXPONENT :
   (e|E) (+|-)? (DEC_DIGIT|_)* DEC_DIGIT (DEC_DIGIT|_)*

FLOAT_SUFFIX :
   f32 | f64

A floating-point literal has one of three forms:

  • A decimal literal followed by a period character U+002E (.). This is optionally followed by another decimal literal, with an optional exponent.
  • A single decimal literal followed by an exponent.
  • A single decimal literal (in which case a suffix is required).

Like integer literals, a floating-point literal may be followed by a suffix, so long as the pre-suffix part does not end with U+002E (.). There are two valid floating-point suffixes: f32 and f64 (the names of the 32-bit and 64-bit primitive floating-point types). See literal expressions for the effect of these suffixes.

Examples of floating-point literals of various forms:


#![allow(unused)]
fn main() {
123.0f64;
0.1f64;
0.1f32;
12E+99_f64;
5f32;
let x: f64 = 2.;
}

This last example is different because it is not possible to use the suffix syntax with a floating point literal ending in a period. 2.f64 would attempt to call a method named f64 on 2.

Note that -1.0, for example, is analyzed as two tokens: - followed by 1.0.
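The forms of floating-point literal described in this section can be compared directly. This is a minimal sketch; `float_forms` is a name invented for the example.

```rust
// The same value written in each floating-point form.
// `float_forms` is an illustrative name.
fn float_forms() -> [f64; 4] {
    [
        2.,   // decimal literal followed by a period
        2.0,  // period followed by another decimal literal
        2e0,  // decimal literal followed by an exponent
        2f64, // decimal literal alone, so the suffix is required
    ]
}

fn main() {
    for value in float_forms() {
        assert_eq!(value, 2.0);
    }
    // An exponent scales by a power of ten; the sign is optional.
    assert_eq!(12E+2, 1200.0);
    println!("{:?}", float_forms());
}
```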

Number pseudoliterals

Lexer
NUMBER_PSEUDOLITERAL :
      DEC_LITERAL ( . DEC_LITERAL )? FLOAT_EXPONENT
         ( NUMBER_PSEUDOLITERAL_SUFFIX | INTEGER_SUFFIX )
   | DEC_LITERAL . DEC_LITERAL
         ( NUMBER_PSEUDOLITERAL_SUFFIX_NO_E | INTEGER_SUFFIX )
   | DEC_LITERAL NUMBER_PSEUDOLITERAL_SUFFIX_NO_E
   | ( BIN_LITERAL | OCT_LITERAL | HEX_LITERAL )
         ( NUMBER_PSEUDOLITERAL_SUFFIX_NO_E | FLOAT_SUFFIX )

NUMBER_PSEUDOLITERAL_SUFFIX :
   IDENTIFIER_OR_KEYWORD not matching INTEGER_SUFFIX or FLOAT_SUFFIX

NUMBER_PSEUDOLITERAL_SUFFIX_NO_E :
   NUMBER_PSEUDOLITERAL_SUFFIX not beginning with e or E

Tokenization of numeric literals allows arbitrary suffixes as described in the grammar above. These values generate valid tokens, but are not valid literal expressions, so are usually an error except as macro arguments.
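As a hedged illustration, a macro can accept such a token and turn it into text without ever treating it as a literal expression. The `token_text!` macro below is hypothetical, and it relies on `stringify!` reproducing a single literal token verbatim, which is the observed behavior rather than a documented guarantee.

```rust
// Accepts exactly one token tree and returns its source text.
// `token_text!` is an illustrative macro, not a standard one.
macro_rules! token_text {
    ($t:tt) => { stringify!($t) };
}

fn main() {
    // Each argument is a single token even though none of them
    // would be accepted as a literal expression.
    assert_eq!(token_text!(2.0f80), "2.0f80");
    assert_eq!(token_text!(1.3e10u64), "1.3e10u64");
    assert_eq!(token_text!(0invalidSuffix), "0invalidSuffix");
    println!("{}", token_text!(2.0f80));
}
```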

Examples of such tokens:


#![allow(unused)]
fn main() {
0invalidSuffix;
123AFB43;
0b010a;
0xAB_CD_EF_GH;
2.0f80;
2e5f80;
2e5e6;
2.0e5e6;
1.3e10u64;
0b1111_f32;
}

Reserved forms similar to number literals

Lexer
RESERVED_NUMBER :
      BIN_LITERAL [2-9]
   | OCT_LITERAL [8-9]
   | ( BIN_LITERAL | OCT_LITERAL | HEX_LITERAL ) .
         (not immediately followed by ., _ or an XID_Start character)
   | ( BIN_LITERAL | OCT_LITERAL ) e
   | 0b _* end of input or not BIN_DIGIT
   | 0o _* end of input or not OCT_DIGIT
   | 0x _* end of input or not HEX_DIGIT
   | DEC_LITERAL ( . DEC_LITERAL)? (e|E) (+|-)? end of input or not DEC_DIGIT

The following lexical forms similar to number literals are reserved forms. Due to the possible ambiguity these raise, they are rejected by the tokenizer instead of being interpreted as separate tokens.

  • An unsuffixed binary or octal literal followed, without intervening whitespace, by a decimal digit out of the range for its radix.

  • An unsuffixed binary, octal, or hexadecimal literal followed, without intervening whitespace, by a period character (with the same restrictions on what follows the period as for floating-point literals).

  • An unsuffixed binary or octal literal followed, without intervening whitespace, by the character e.

  • Input which begins with one of the radix prefixes but is not a valid binary, octal, or hexadecimal literal (because it contains no digits).

  • Input which has the form of a floating-point literal with no digits in the exponent.

Examples of reserved forms:


#![allow(unused)]
fn main() {
0b0102;  // this is not `0b010` followed by `2`
0o1279;  // this is not `0o127` followed by `9`
0x80.0;  // this is not `0x80` followed by `.` and `0`
0b101e;  // this is not a pseudoliteral, or `0b101` followed by `e`
0b;      // this is not a pseudoliteral, or `0` followed by `b`
0b_;     // this is not a pseudoliteral, or `0` followed by `b_`
2e;      // this is not a pseudoliteral, or `2` followed by `e`
2.0e;    // this is not a pseudoliteral, or `2.0` followed by `e`
2em;     // this is not a pseudoliteral, or `2` followed by `em`
2.0em;   // this is not a pseudoliteral, or `2.0` followed by `em`
}

Boolean literals

Lexer
BOOLEAN_LITERAL :
      true
   | false

The two values of the boolean type are written true and false.

Lifetimes and loop labels

Lexer
LIFETIME_TOKEN :
      ' IDENTIFIER_OR_KEYWORD
   | '_

LIFETIME_OR_LABEL :
      ' NON_KEYWORD_IDENTIFIER

Lifetime parameters and loop labels use LIFETIME_OR_LABEL tokens. Any LIFETIME_TOKEN will be accepted by the lexer, and for example, can be used in macros.
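A short sketch of both uses follows: `'a` as a lifetime parameter and `'outer` as a loop label. The function names `first_word` and `find_pair` are assumptions made for this example.

```rust
// `'a` is a LIFETIME_OR_LABEL token used as a lifetime parameter.
fn first_word<'a>(text: &'a str) -> &'a str {
    text.split_whitespace().next().unwrap_or("")
}

// `'outer` is a LIFETIME_OR_LABEL token used as a loop label.
fn find_pair(target: u32) -> Option<(u32, u32)> {
    let mut found = None;
    'outer: for a in 1..10 {
        for b in 1..10 {
            if a * b == target {
                found = Some((a, b));
                break 'outer; // leaves both loops at once
            }
        }
    }
    found
}

fn main() {
    assert_eq!(first_word("hello world"), "hello");
    assert_eq!(find_pair(12), Some((2, 6)));
    assert_eq!(find_pair(97), None);
    println!("{:?}", find_pair(12));
}
```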

Punctuation

Punctuation symbol tokens are listed here for completeness. Their individual usages and meanings are defined in the linked pages.

Symbol  Name        Usage
+       Plus        Addition, Trait Bounds, Macro Kleene Matcher
-       Minus       Subtraction, Negation
*       Star        Multiplication, Dereference, Raw Pointers, Macro Kleene Matcher, Use wildcards
/       Slash       Division
%       Percent     Remainder
^       Caret       Bitwise and Logical XOR
!       Not         Bitwise and Logical NOT, Macro Calls, Inner Attributes, Never Type, Negative impls
&       And         Bitwise and Logical AND, Borrow, References, Reference patterns
|       Or          Bitwise and Logical OR, Closures, Patterns in match, if let, and while let
&&      AndAnd      Lazy AND, Borrow, References, Reference patterns
||      OrOr        Lazy OR, Closures
<<      Shl         Shift Left, Nested Generics
>>      Shr         Shift Right, Nested Generics
+=      PlusEq      Addition assignment
-=      MinusEq     Subtraction assignment
*=      StarEq      Multiplication assignment
/=      SlashEq     Division assignment
%=      PercentEq   Remainder assignment
^=      CaretEq     Bitwise XOR assignment
&=      AndEq       Bitwise And assignment
|=      OrEq        Bitwise Or assignment
<<=     ShlEq       Shift Left assignment
>>=     ShrEq       Shift Right assignment, Nested Generics
=       Eq          Assignment, Attributes, Various type definitions
==      EqEq        Equal
!=      Ne          Not Equal
>       Gt          Greater than, Generics, Paths
<       Lt          Less than, Generics, Paths
>=      Ge          Greater than or equal to, Generics
<=      Le          Less than or equal to
@       At          Subpattern binding
_       Underscore  Wildcard patterns, Inferred types, Unnamed items in constants, extern crates, use declarations, and destructuring assignment
.       Dot         Field access, Tuple index
..      DotDot      Range, Struct expressions, Patterns, Range Patterns
...     DotDotDot   Variadic functions, Range patterns
..=     DotDotEq    Inclusive Range, Range patterns
,       Comma       Various separators
;       Semi        Terminator for various items and statements, Array types
:       Colon       Various separators
::      PathSep     Path separator
->      RArrow      Function return type, Closure return type, Function pointer type
=>      FatArrow    Match arms, Macros
#       Pound       Attributes
$       Dollar      Macros
?       Question    Question mark operator, Questionably sized, Macro Kleene Matcher
~       Tilde       The tilde operator has been unused since before Rust 1.0, but its token may still be used

Delimiters

Bracket punctuation is used in various parts of the grammar. An open bracket must always be paired with a close bracket. Brackets and the tokens within them are referred to as "token trees" in macros. The three types of brackets are:

Bracket  Type
{ }      Curly braces
[ ]      Square brackets
( )      Parentheses
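To illustrate how a bracketed group behaves as one token tree in a macro, the sketch below matches on the kind of bracket that wraps its input. The `delimiter_kind!` macro is hypothetical and written only for this example.

```rust
// Reports which bracket type wraps the single token tree it receives.
// `delimiter_kind!` is an illustrative macro, not a standard one.
macro_rules! delimiter_kind {
    ( ( $($inner:tt)* ) ) => { "parentheses" };
    ( [ $($inner:tt)* ] ) => { "square brackets" };
    ( { $($inner:tt)* } ) => { "curly braces" };
}

fn main() {
    // The outer parentheses belong to the macro call; the argument
    // is the bracketed group inside them.
    assert_eq!(delimiter_kind!((a b c)), "parentheses");
    assert_eq!(delimiter_kind!([1, 2]), "square brackets");
    // Nested brackets stay inside the outer token tree.
    assert_eq!(delimiter_kind!({ x: (y) }), "curly braces");
    println!("{}", delimiter_kind!([1, 2]));
}
```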

Reserved prefixes

Lexer 2021+
RESERVED_TOKEN_DOUBLE_QUOTE : ( IDENTIFIER_OR_KEYWORD Except b or r or br | _ ) "
RESERVED_TOKEN_SINGLE_QUOTE : ( IDENTIFIER_OR_KEYWORD Except b | _ ) '
RESERVED_TOKEN_POUND : ( IDENTIFIER_OR_KEYWORD Except r or br | _ ) #

Some lexical forms known as reserved prefixes are reserved for future use.

Source input which would otherwise be lexically interpreted as a non-raw identifier (or a keyword or _) which is immediately followed by a #, ', or " character (without intervening whitespace) is identified as a reserved prefix.

Note that raw identifiers, raw string literals, and raw byte string literals may contain a # character but are not interpreted as containing a reserved prefix.

Similarly the r, b, and br prefixes used in raw string literals, byte literals, byte string literals, and raw byte string literals are not interpreted as reserved prefixes.

Edition Differences: Starting with the 2021 edition, reserved prefixes are reported as an error by the lexer (in particular, they cannot be passed to macros).

Before the 2021 edition, reserved prefixes are accepted by the lexer and interpreted as multiple tokens (for example, one token for the identifier or keyword, followed by a # token).

Examples accepted in all editions:


#![allow(unused)]
fn main() {
macro_rules! lexes {($($_:tt)*) => {}}
lexes!{a #foo}
lexes!{continue 'foo}
lexes!{match "..." {}}
lexes!{r#let#foo}         // three tokens: r#let # foo
}

Examples accepted before the 2021 edition but rejected later:


#![allow(unused)]
fn main() {
macro_rules! lexes {($($_:tt)*) => {}}
lexes!{a#foo}
lexes!{continue'foo}
lexes!{match"..." {}}
}