The backslash character has many uses. First, if it is followed by a
      character that is not a number or a letter, it takes away any special
      meaning that the character can have. This use of backslash as an escape
      character applies both inside and outside character classes.
    For example, if you want to match a * character, you write \* in the
      pattern. This escaping action applies if the following character would
      otherwise be interpreted as a metacharacter, so it is always safe to
      precede a non-alphanumeric with backslash to specify that it stands for
      itself. In particular, if you want to match a backslash, write \\.
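    The following is a minimal sketch of escaping metacharacters, assuming
      the patterns are passed to re:run/3 as Erlang string literals. The
      module name re_literal_demo and function demo/0 are hypothetical. Every
      backslash intended for the regular expression is doubled, because the
      Erlang string literal consumes one level of escaping first.

```erlang
-module(re_literal_demo).
-export([demo/0]).

demo() ->
    Opts = [{capture, all, list}],
    %% "a\\*b" is the regex a\*b: a literal asterisk between a and b.
    Star = re:run("2a*b", "a\\*b", Opts),
    %% "\\\\" is the regex \\: one literal backslash.
    Backslash = re:run("C:\\tmp", "\\\\", Opts),
    %% Without the escape, * is a quantifier, so a*b also matches a lone "b".
    Unescaped = re:run("xb", "a*b", Opts),
    {Star, Backslash, Unescaped}.
```

    Under these assumptions, demo/0 is expected to return
      {{match,["a*b"]}, {match,["\\"]}, {match,["b"]}}.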
    In unicode mode, only ASCII numbers and letters have any special
      meaning after a backslash. All other characters (in particular, those
      whose code points are > 127) are treated as literals.
    If a pattern is compiled with option extended, whitespace in the
      pattern (other than in a character class) and characters between a #
      outside a character class and the next newline are ignored. An escaping
      backslash can be used to include a whitespace or # character as part of
      the pattern.
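    As an illustration of option extended, the sketch below (hypothetical
      module re_extended_demo) compiles a pattern that contains whitespace and
      a # comment, and then a pattern in which the space is kept by escaping
      it. It assumes the default newline convention (LF), which ends the
      comment.

```erlang
-module(re_extended_demo).
-export([demo/0]).

demo() ->
    %% Whitespace and the comment up to the newline are skipped,
    %% so this pattern is equivalent to abc.
    Pattern = "a b   # the letters a and b\n c",
    Ignored = re:run("abc", Pattern, [extended, {capture, all, list}]),
    %% "a\\ b" is the regex a\ b: the escaped space stays in the pattern.
    Escaped = re:run("a b", "a\\ b", [extended, {capture, all, list}]),
    %% The same pattern therefore does not match "ab".
    NoSpace = re:run("ab", "a\\ b", [extended, {capture, none}]),
    {Ignored, Escaped, NoSpace}.
```

    The expected result is {{match,["abc"]}, {match,["a b"]}, nomatch}.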
    To remove the special meaning from a sequence of characters, put them
      between \Q and \E. This is different from Perl in that $ and @ are
      handled as literals in \Q...\E sequences in PCRE, while $ and @ cause
      variable interpolation in Perl. Notice the following examples:
Pattern            PCRE matches   Perl matches
\Qabc$xyz\E        abc$xyz        abc followed by the contents of $xyz
\Qabc\$xyz\E       abc\$xyz       abc\$xyz
\Qabc\E\$\Qxyz\E   abc$xyz        abc$xyz
 
    The \Q...\E sequence is recognized both inside and outside character
      classes. An isolated \E that is not preceded by \Q is ignored. If \Q is
      not followed by \E later in the pattern, the literal interpretation
      continues to the end of the pattern (that is, \E is assumed at the end).
      If the isolated \Q is inside a character class, this causes an error, as
      the character class is not terminated.
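    A short sketch of \Q...\E quoting follows. The module name
      re_quote_demo is hypothetical, and the subjects are chosen only for
      illustration.

```erlang
-module(re_quote_demo).
-export([demo/0]).

demo() ->
    %% Regex \Qa.b$c\E: the dot and the dollar are literals inside \Q...\E.
    Literal = re:run("xa.b$c", "\\Qa.b$c\\E", [{capture, all, list}]),
    %% Because the dot is literal, another character in its place fails.
    NoMatch = re:run("xaXb$c", "\\Qa.b$c\\E", [{capture, none}]),
    %% \Q without a closing \E: the literal reading runs to the pattern end.
    Open = re:run("1+1", "\\Q1+1", [{capture, all, list}]),
    {Literal, NoMatch, Open}.
```

    The expected result is {{match,["a.b$c"]}, nomatch, {match,["1+1"]}}.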
    Non-Printing Characters
    
    A second use of backslash provides a way of encoding non-printing
      characters in patterns in a visible manner. There is no restriction on the
      appearance of non-printing characters, apart from the binary zero that
      terminates a pattern. When a pattern is prepared by text editing, it is
      often easier to use one of the following escape sequences than the binary
      character it represents:
    
      - \a
 
- Alarm, that is, the BEL character (hex 07)
 
      - \cx
 
- "Control-x", where x is any ASCII character
 
      - \e
 
- Escape (hex 1B)
 
      - \f
 
- Form feed (hex 0C)
 
      - \n
 
- Line feed (hex 0A)
 
      - \r
 
- Carriage return (hex 0D)
 
      - \t
 
- Tab (hex 09)
 
      - \0dd
 
- Character with octal code 0dd
 
      - \ddd
 
- Character with octal code ddd, or back reference
        
 
      - \o{ddd..}
 
- Character with octal code ddd..
 
      - \xhh
 
- Character with hex code hh
 
      - \x{hhh..}
 
- Character with hex code hhh..
 
    
    
Note
Note that \0dd is always an octal code, and that \8 and \9 are
    the literal characters "8" and "9".
 
 
    The precise effect of \cx on ASCII characters is as follows: if x is a
      lowercase letter, it is converted to upper case. Then bit 6 of the
      character (hex 40) is inverted. Thus \cA to \cZ become hex 01 to hex 1A
      (A is 41, Z is 5A), but \c{ becomes hex 3B ({ is 7B), and \c; becomes
      hex 7B (; is 3B). If the data item (byte or 16-bit value) following \c
      has a value > 127, a compile-time error occurs. This locks out
      non-ASCII characters in all modes.
    The \c facility was designed for use with ASCII characters, but with the
      extension to Unicode it is even less useful than it once was.
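    The arithmetic behind \cx can be sketched as a small function. The
      module re_ctrl_demo and the function ctrl/1 are hypothetical helpers
      written for this illustration; they are not part of the re module.

```erlang
-module(re_ctrl_demo).
-export([ctrl/1, demo/0]).

%% Code of the character that \cx stands for, given the ASCII character X.
%% Lowercase letters are first converted to upper case; then the bit with
%% value hex 40 is inverted.
ctrl(X) when X >= $a, X =< $z -> (X - 32) bxor 16#40;
ctrl(X) when X >= 0, X =< 127 -> X bxor 16#40.

demo() ->
    %% A (hex 41), z (hex 7A), { (hex 7B), and ; (hex 3B).
    Computed = [ctrl($A), ctrl($z), ctrl(16#7B), ctrl(16#3B)],
    %% Cross-check one value against the regex engine: \cA should match
    %% the single byte with value 1.
    Engine = re:run(<<1>>, "\\cA", [{capture, none}]),
    {Computed, Engine}.
```

    demo/0 is expected to return {[1,26,59,123], match}, that is, hex 01,
      1A, 3B, and 7B, in agreement with the values listed above.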
    After \0 up to two further octal digits are read. If there are fewer than
    two digits, just those that are present are used. Thus the sequence
    \0\x\015 specifies two binary zeros followed by a CR character (code value
    13). Make sure you supply two digits after the initial zero if the pattern
    character that follows is itself an octal digit.
    The escape \o must be followed by a sequence of octal digits, enclosed
    in braces. An error occurs if this is not the case. This escape is a recent
    addition to Perl; it provides a way of specifying character code points as
    octal numbers greater than 0777, and it also allows octal numbers and back
    references to be unambiguously specified.
    For greater clarity and unambiguity, it is best to avoid following \ by
    a digit greater than zero. Instead, use \o{} or \x{} to specify character
    numbers, and \g{} to specify back references. The following paragraphs
    describe the old, ambiguous syntax.
    The handling of a backslash followed by a digit other than 0 is
      complicated, and Perl has changed in recent releases, causing PCRE also
      to change. Outside a character class, PCRE reads the digit and any following
      digits as a decimal number. If the number is < 8, or if there have
      been at least that many previous capturing left parentheses in the
      expression, the entire sequence is taken as a back reference. A
      description of how this works is provided later, following the discussion
      of parenthesized subpatterns.
    Inside a character class, or if the decimal number following \ is >
      7 and there have not been that many capturing subpatterns, PCRE handles
      \8 and \9 as the literal characters "8" and "9", and otherwise re-reads
      up to three octal digits following the backslash and uses them to
      generate a data character. Any subsequent digits stand for themselves.
      For example:
    
      - \040
 
        - Another way of writing an ASCII space
 
      - \40
 
        - The same, provided there are < 40 previous capturing
          subpatterns
 
      - \7
 
        - Always a back reference
 
      - \11
 
        - Can be a back reference, or another way of writing a tab
 
      - \011
 
        - Always a tab
 
      - \0113
 
        - A tab followed by character "3"
 
      - \113
 
        - Can be a back reference, otherwise the character with octal code
          113 
 
      - \377
 
        - Can be a back reference, otherwise value 255 (decimal)
 
      - \81
 
        - Either a back reference, or the two characters "8" and "1"
 
    
    Notice that octal values >= 100 that are specified using this syntax
    must not be introduced by a leading zero, as no more than three octal digits
    are ever read.
    By default, after \x that is not followed by {, from zero to two
    hexadecimal digits are read (letters can be in upper or lower case). Any
    number of hexadecimal digits may appear between \x{ and }. If a character
    other than a hexadecimal digit appears between \x{ and }, or if there is no
    terminating }, an error occurs.
    
    Characters whose value is less than 256 can be defined by either of the
    two syntaxes for \x. There is no difference in the way they are handled. For
    example, \xdc is exactly the same as \x{dc}.
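    The sketch below compares several spellings of the same character code.
      The module name re_codepoint_demo is hypothetical; the binary
      <<16#03B1/utf8>> is the UTF-8 encoding of U+03B1 and is used here only
      as a sample character above 255.

```erlang
-module(re_codepoint_demo).
-export([demo/0]).

demo() ->
    None = [{capture, none}],
    %% Three spellings of the letter "A" (hex 41, octal 101).
    Ascii = [re:run("A", P, None) || P <- ["\\x41", "\\x{41}", "\\o{101}"]],
    %% \0113 is read as \011 (a tab) followed by the character "3".
    TabThree = re:run("\t3", "\\0113", None),
    %% A code point above 255 needs the braced form and option unicode.
    Alpha = re:run(<<16#03B1/utf8>>, "\\x{3b1}", [unicode | None]),
    {Ascii, TabThree, Alpha}.
```

    All five calls are expected to report match.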
    Constraints on character values
    Characters that are specified using octal or hexadecimal numbers are
    limited to certain values, as follows:
    
      - 8-bit non-UTF mode
 
      - < 0x100
 
      - 8-bit UTF-8 mode
 
      - < 0x10ffff and a valid codepoint
 
    
    Invalid Unicode codepoints are the range 0xd800 to 0xdfff (the
    so-called "surrogate" codepoints), and 0xffef.
    Escape sequences in character classes
    All the sequences that define a single character value can be used both
      inside and outside character classes. Also, inside a character class, \b
      is interpreted as the backspace character (hex 08).
    \N is not allowed in a character class. \B, \R, and \X are not special
      inside a character class. Like other unrecognized escape sequences, they
      are treated as the literal characters "B", "R", and "X". Outside a
      character class, these sequences have different meanings.
    Unsupported Escape Sequences
    In Perl, the sequences \l, \L, \u, and \U are recognized by its string
      handler and used to modify the case of following characters. PCRE does not
      support these escape sequences.
    Absolute and Relative Back References
    The sequence \g followed by an unsigned or a negative number, optionally
      enclosed in braces, is an absolute or relative back reference. A named
      back reference can be coded as \g{name}. Back references are discussed
      later, following the discussion of parenthesized subpatterns.
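    A minimal sketch of the three \g back reference forms follows. The
      module name re_backref_demo and the group name p are hypothetical.

```erlang
-module(re_backref_demo).
-export([demo/0]).

demo() ->
    None = [{capture, none}],
    [re:run("abab", "(ab)\\g{1}", None),      % absolute: group 1
     re:run("abab", "(ab)\\g{-1}", None),     % relative: most recent group
     re:run("abab", "(?<p>ab)\\g{p}", None),  % named group p
     re:run("abcd", "(ab)\\g{1}", None)].     % "ab" is not repeated
```

    The expected result is [match, match, match, nomatch].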
    Absolute and Relative Subroutine Calls
    For compatibility with Oniguruma, the non-Perl syntax \g followed by a
      name or a number enclosed either in angle brackets or single quotes, is
      alternative syntax for referencing a subpattern as a "subroutine".
      Details are discussed later. Notice that \g{...} (Perl syntax) and
      \g<...> (Oniguruma syntax) are not synonymous. The former
      is a back reference and the latter is a subroutine call.
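    The difference between the two syntaxes can be seen with a subject whose
      two halves differ. In the sketch below (hypothetical module
      re_subroutine_demo), the back reference requires the text matched by
      group 1 to be repeated, while the subroutine call only re-applies the
      subpattern of group 1.

```erlang
-module(re_subroutine_demo).
-export([demo/0]).

demo() ->
    None = [{capture, none}],
    %% Back reference: the second number must equal the first.
    BackrefDiff = re:run("12-34", "(\\d+)-\\g{1}", None),
    BackrefSame = re:run("12-12", "(\\d+)-\\g{1}", None),
    %% Subroutine call: \d+ is applied again, so any number is accepted.
    Subroutine = re:run("12-34", "(\\d+)-\\g<1>", None),
    {BackrefDiff, BackrefSame, Subroutine}.
```

    The expected result is {nomatch, match, match}.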
    Generic Character Types
    
    Another use of backslash is for specifying generic character types:
      
      - \d
 
- Any decimal digit
 
      - \D
 
- Any character that is not a decimal digit
 
      - \h
 
- Any horizontal whitespace character
 
      - \H
 
- Any character that is not a horizontal whitespace
        character
 
      - \s
 
- Any whitespace character
 
      - \S
 
- Any character that is not a whitespace character
        
 
      - \v
 
- Any vertical whitespace character
 
      - \V
 
- Any character that is not a vertical whitespace
        character
 
      - \w
 
- Any "word" character
 
      - \W
 
- Any "non-word" character
 
    
    There is also the single sequence \N, which matches a non-newline
      character. This is the same as the "." metacharacter when dotall
      is not set. Perl also uses \N to match characters by name, but PCRE does
      not support this.
    Each pair of lowercase and uppercase escape sequences partitions the
      complete set of characters into two disjoint sets. Any given character
      matches one, and only one, of each pair. The sequences can appear both
      inside and outside character classes. They each match one character of the
      appropriate type. If the current matching point is at the end of the
      subject string, all fail, as there is no character to match.
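    The partition property can be checked directly. The following sketch
      uses a hypothetical module re_types_demo with a hypothetical helper
      matches/2; for each sample character, exactly one of \d and \D is
      expected to match, and both fail on an empty subject.

```erlang
-module(re_types_demo).
-export([matches/2, demo/0]).

%% true if Pattern matches somewhere in Subject.
matches(Subject, Pattern) ->
    re:run(Subject, Pattern, [{capture, none}]) =:= match.

demo() ->
    %% One digit, one letter, one space.
    Pairs = [{S, matches(S, "\\d"), matches(S, "\\D")} || S <- ["7", "x", " "]],
    %% At the end of the subject there is no character left to match.
    Empty = {matches("", "\\d"), matches("", "\\D")},
    {Pairs, Empty}.
```

    The expected result is
      {[{"7",true,false}, {"x",false,true}, {" ",false,true}], {false,false}}.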
    For compatibility with Perl, \s did not use to match the VT character (code
      11), which made it different from the POSIX "space" class. However, Perl
      added VT at release 5.18, and PCRE followed suit at release 8.34. The default
      \s characters are now HT (9), LF (10), VT (11), FF (12), CR (13), and space
      (32), which are defined as white space in the "C" locale. This list may vary if
      locale-specific matching is taking place. For example, in some locales the
      "non-breaking space" character (\xA0) is recognized as white space, and in
      others the VT character is not.
    A "word" character is an underscore or any character that is a letter or
      a digit. By default, the definition of letters and digits is controlled by
      the PCRE low-valued character tables; in Erlang's case (and without option
      unicode), these are the tables for the ISO Latin-1 character set.
    By default, in unicode mode, characters with values > 255, that
      is, all characters outside the ISO Latin-1 character set, never match \d,
      \s, or \w, and always match \D, \S, and \W. These sequences retain their
      original meanings from before UTF support was available, mainly for
      efficiency reasons. However, if option ucp is set, the behavior is
      changed so that Unicode properties are used to determine character types,
      as follows:
    
      - \d
 
- Any character that \p{Nd} matches (decimal digit)
        
 
      - \s
 
- Any character that matches \p{Z} or \h or \v
        
 
      - \w
 
- Any character that matches \p{L} or \p{N}, plus
        underscore
 
    
    The uppercase escapes match the inverse sets of characters. Notice that
      \d matches only decimal digits, while \w matches any Unicode digit, any
      Unicode letter, and underscore. Notice also that ucp affects \b and
      \B, as they are defined in terms of \w and \W. Matching these sequences is
      noticeably slower when ucp is set.
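    The effect of option ucp can be sketched with two characters above 255.
      The module name re_ucp_demo is hypothetical; U+03B1 (Greek small letter
      alpha) and U+0661 (Arabic-Indic digit one) are sample characters chosen
      for this illustration.

```erlang
-module(re_ucp_demo).
-export([demo/0]).

demo() ->
    Alpha = <<16#03B1/utf8>>,   % a letter, general category Ll
    Digit = <<16#0661/utf8>>,   % a decimal digit, general category Nd
    None = [{capture, none}],
    {re:run(Alpha, "\\w", [unicode | None]),        % no ucp: never matches
     re:run(Alpha, "\\w", [unicode, ucp | None]),   % ucp: \p{L} applies
     re:run(Digit, "\\d", [unicode | None]),        % no ucp: never matches
     re:run(Digit, "\\d", [unicode, ucp | None])}.  % ucp: \p{Nd} applies
```

    The expected result is {nomatch, match, nomatch, match}.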
    The sequences \h, \H, \v, and \V are features that were added to Perl in
      release 5.10. In contrast to the other sequences, which match only ASCII
      characters by default, these always match certain high-valued code points,
      regardless of whether ucp is set.
    The following are the horizontal space characters:
    
      - U+0009
 
- Horizontal tab (HT)
 
      - U+0020
 
- Space
 
      - U+00A0
 
- Non-break space
 
      - U+1680
 
- Ogham space mark
 
      - U+180E
 
- Mongolian vowel separator
 
      - U+2000
 
- En quad
 
      - U+2001
 
- Em quad
 
      - U+2002
 
- En space
 
      - U+2003
 
- Em space
 
      - U+2004
 
- Three-per-em space
 
      - U+2005
 
- Four-per-em space
 
      - U+2006
 
- Six-per-em space
 
      - U+2007
 
- Figure space
 
      - U+2008
 
- Punctuation space
 
      - U+2009
 
- Thin space
 
      - U+200A
 
- Hair space
 
      - U+202F
 
- Narrow no-break space
 
      - U+205F
 
- Medium mathematical space
 
      - U+3000
 
- Ideographic space
 
    
    The following are the vertical space characters:
    
      - U+000A
 
- Line feed (LF)
 
      - U+000B
 
- Vertical tab (VT)
 
      - U+000C
 
- Form feed (FF)
 
      - U+000D
 
- Carriage return (CR)
 
      - U+0085
 
- Next line (NEL)
 
      - U+2028
 
- Line separator
 
      - U+2029
 
- Paragraph separator
 
    
    In 8-bit, non-UTF-8 mode, only the characters with code points < 256
      are relevant.
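    The sketch below (hypothetical module re_hv_demo) checks two entries
      from the lists above in unicode mode without ucp: U+2003 (em space)
      and U+2028 (line separator).

```erlang
-module(re_hv_demo).
-export([demo/0]).

demo() ->
    Opts = [unicode, {capture, none}],
    EmSpace = <<16#2003/utf8>>,
    LineSep = <<16#2028/utf8>>,
    {re:run(EmSpace, "\\h", Opts),   % horizontal space: match
     re:run(EmSpace, "\\s", Opts),   % \s without ucp ignores values > 255
     re:run(LineSep, "\\v", Opts),   % vertical space: match
     re:run(EmSpace, "\\v", Opts)}.  % em space is not vertical
```

    The expected result is {match, nomatch, match, nomatch}.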
    Newline Sequences
    
    Outside a character class, by default, the escape sequence \R matches any
      Unicode newline sequence. In non-UTF-8 mode, \R is equivalent to the
      following:
    (?>\r\n|\n|\x0b|\f|\r|\x85)
 
    This is an example of an "atomic group"; details are provided below.
    This particular group matches either the two-character sequence CR
      followed by LF, or one of the single characters LF (line feed, U+000A),
      VT (vertical tab, U+000B), FF (form feed, U+000C), CR (carriage return,
      U+000D), or NEL (next line, U+0085). The two-character sequence is
      treated as a single unit that cannot be split.
    In Unicode mode, two more characters whose code points are > 255 are
      added: LS (line separator, U+2028) and PS (paragraph separator, U+2029).
      Unicode character property support is not needed for these characters to
      be recognized.
    \R can be restricted to match only CR, LF, or CRLF (instead of the
      complete set of Unicode line endings) by setting option bsr_anycrlf
      either at compile time or when the pattern is matched. (BSR is an acronym
      for "backslash R".) This can be made the default when PCRE is built; if
      so, the other behavior can be requested through option
      bsr_unicode. These settings can also be specified by starting a
      pattern string with one of the following sequences:
    
      - (*BSR_ANYCRLF)
 
      - CR, LF, or CRLF only
 
      - (*BSR_UNICODE)
 
      - Any Unicode newline sequence
 
    
    These override the default and the options specified to the compiling
      function, but they can themselves be overridden by options specified to a
      matching function. Notice that these special settings, which are not
      Perl-compatible, are recognized only at the very start of a pattern, and
      that they must be in upper case. If more than one of them is present, the
      last one is used. They can be combined with a change of newline
      convention; for example, a pattern can start with:
    (*ANY)(*BSR_ANYCRLF)
    They can also be combined with the (*UTF8), (*UTF), or (*UCP) special
      sequences. Inside a character class, \R is treated as an unrecognized
      escape sequence, and so matches the letter "R" by default.
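    A minimal sketch of \R and its restriction follows. The module name
      re_bsr_demo is hypothetical; the subjects use CRLF and form feed as
      sample line endings, and the default (unrestricted) \R behavior is
      assumed.

```erlang
-module(re_bsr_demo).
-export([demo/0]).

demo() ->
    None = [{capture, none}],
    {re:run("a\r\nb", "a\\Rb", None),                % CRLF as one unit
     re:run("a\fb", "a\\Rb", None),                  % form feed
     re:run("a\fb", "a\\Rb", [bsr_anycrlf | None]),  % FF excluded by option
     re:run("a\fb", "(*BSR_ANYCRLF)a\\Rb", None),    % FF excluded by prefix
     %% \R takes CR and LF together and cannot give the LF back to \n.
     re:run("a\r\nb", "a\\R\\nb", None)}.
```

    The expected result is {match, match, nomatch, nomatch, nomatch}.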
    Unicode Character Properties
    Three more escape sequences that match characters with specific
      properties are available. When in 8-bit non-UTF-8 mode, these sequences
      are limited to testing characters whose code points are <
      256, but they do work in this mode. The following are the extra escape
      sequences:
    
      - \p{xx}
 
      - A character with property xx
 
      - \P{xx}
 
      - A character without property xx
 
      - \X
 
      - A Unicode extended grapheme cluster
 
    
    The property names represented by xx above are limited to the
      Unicode script names, the general category properties, "Any", which
      matches any character (including newline), and some special PCRE
      properties (described in the next section). Other Perl properties, such as
      "InMusicalSymbols", are currently not supported by PCRE. Notice that
      \P{Any} does not match any characters and always causes a match
      failure.
    Sets of Unicode characters are defined as belonging to certain scripts.
      A character from one of these sets can be matched using a script name, for
      example:
    \p{Greek} \P{Han}
    Those that are not part of an identified script are lumped together as
      "Common". The following is the current list of scripts:
    
      - Arabic
 
      - Armenian
 
      - Avestan
 
      - Balinese
 
      - Bamum
 
      - Bassa_Vah
 
      - Batak
 
      - Bengali
 
      - Bopomofo
 
      - Braille
 
      - Buginese
 
      - Buhid
 
      - Canadian_Aboriginal
 
      - Carian
 
      - Caucasian_Albanian
 
      - Chakma
 
      - Cham
 
      - Cherokee
 
      - Common
 
      - Coptic
 
      - Cuneiform
 
      - Cypriot
 
      - Cyrillic
 
      - Deseret
 
      - Devanagari
 
      - Duployan
 
      - Egyptian_Hieroglyphs
 
      - Elbasan
 
      - Ethiopic
 
      - Georgian
 
      - Glagolitic
 
      - Gothic
 
      - Grantha
 
      - Greek
 
      - Gujarati
 
      - Gurmukhi
 
      - Han
 
      - Hangul
 
      - Hanunoo
 
      - Hebrew
 
      - Hiragana
 
      - Imperial_Aramaic
 
      - Inherited
 
      - Inscriptional_Pahlavi
 
      - Inscriptional_Parthian
 
      - Javanese
 
      - Kaithi
 
      - Kannada
 
      - Katakana
 
      - Kayah_Li
 
      - Kharoshthi
 
      - Khmer
 
      - Khojki
 
      - Khudawadi
 
      - Lao
 
      - Latin
 
      - Lepcha
 
      - Limbu
 
      - Linear_A
 
      - Linear_B
 
      - Lisu
 
      - Lycian
 
      - Lydian
 
      - Mahajani
 
      - Malayalam
 
      - Mandaic
 
      - Manichaean
 
      - Meetei_Mayek
 
      - Mende_Kikakui
 
      - Meroitic_Cursive
 
      - Meroitic_Hieroglyphs
 
      - Miao
 
      - Modi
 
      - Mongolian
 
      - Mro
 
      - Myanmar
 
      - Nabataean
 
      - New_Tai_Lue
 
      - Nko
 
      - Ogham
 
      - Ol_Chiki
 
      - Old_Italic
 
      - Old_North_Arabian
 
      - Old_Permic
 
      - Old_Persian
 
      - Oriya
 
      - Old_South_Arabian
 
      - Old_Turkic
 
      - Osmanya
 
      - Pahawh_Hmong
 
      - Palmyrene
 
      - Pau_Cin_Hau
 
      - Phags_Pa
 
      - Phoenician
 
      - Psalter_Pahlavi
 
      - Rejang
 
      - Runic
 
      - Samaritan
 
      - Saurashtra
 
      - Sharada
 
      - Shavian
 
      - Siddham
 
      - Sinhala
 
      - Sora_Sompeng
 
      - Sundanese
 
      - Syloti_Nagri
 
      - Syriac
 
      - Tagalog
 
      - Tagbanwa
 
      - Tai_Le
 
      - Tai_Tham
 
      - Tai_Viet
 
      - Takri
 
      - Tamil
 
      - Telugu
 
      - Thaana
 
      - Thai
 
      - Tibetan
 
      - Tifinagh
 
      - Tirhuta
 
      - Ugaritic
 
      - Vai
 
      - Warang_Citi
 
      - Yi
 
    
    Each character has exactly one Unicode general category property,
      specified by a two-letter acronym. For compatibility with Perl, negation
      can be specified by including a circumflex between the opening brace and
      the property name. For example, \p{^Lu} is the same as \P{Lu}.
    If only one letter is specified with \p or \P, it includes all the
      general category properties that start with that letter. In this case, in
      the absence of negation, the curly brackets in the escape sequence are
      optional. The following two examples have the same effect:
    \p{L} \pL
    The following general category property codes are supported:
      
      - C
 
- Other
 
      - Cc
 
- Control
 
      - Cf
 
- Format
 
      - Cn
 
- Unassigned
 
      - Co
 
- Private use
 
      - Cs
 
- Surrogate
  
      - L
 
- Letter
 
      - Ll
 
- Lowercase letter
 
      - Lm
 
- Modifier letter
 
      - Lo
 
- Other letter
 
      - Lt
 
- Title case letter
 
      - Lu
 
- Uppercase letter
   
      - M
 
- Mark
 
      - Mc
 
- Spacing mark
 
      - Me
 
- Enclosing mark
 
      - Mn
 
- Non-spacing mark
   
      - N
 
- Number
 
      - Nd
 
- Decimal number
 
      - Nl
 
- Letter number
 
      - No
 
- Other number
   
      - P
 
- Punctuation
 
      - Pc
 
- Connector punctuation
 
      - Pd
 
- Dash punctuation
 
      - Pe
 
- Close punctuation
 
      - Pf
 
- Final punctuation
 
      - Pi
 
- Initial punctuation
 
      - Po
 
- Other punctuation
 
      - Ps
 
- Open punctuation
 
      - S
 
- Symbol
 
      - Sc
 
- Currency symbol
 
      - Sk
 
- Modifier symbol
 
      - Sm
 
- Mathematical symbol
 
      - So
 
- Other symbol
  
      - Z
 
- Separator
 
      - Zl
 
- Line separator
 
      - Zp
 
- Paragraph separator
 
      - Zs
 
- Space separator
 
    
    The special property L& is also supported. It matches a character
      that has the Lu, Ll, or Lt property, that is, a letter that is not
      classified as a modifier or "other".
    The Cs (Surrogate) property applies only to characters in the range
      U+D800 to U+DFFF. Such characters are invalid in Unicode strings and so
      cannot be tested by PCRE. Perl does not support the Cs property.
    The long synonyms for property names supported by Perl (such as
      \p{Letter}) are not supported by PCRE. It is not permitted to prefix any
      of these properties with "Is".
    No character in the Unicode table has the Cn (unassigned) property.
      This property is instead assumed for any code point that is not in the
      Unicode table.
    Specifying caseless matching does not affect these escape sequences. For
      example, \p{Lu} always matches only uppercase letters. This is different
      from the behavior of current versions of Perl.
    Matching characters by Unicode property is not fast, as PCRE must do a
      multistage table lookup to find a character property. That is why the
      traditional escape sequences such as \d and \w do not use Unicode
      properties in PCRE by default. However, you can make them do so by setting
      option ucp or by starting the pattern with (*UCP).
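    The property escapes can be tried out as follows. This is a sketch in a
      hypothetical module re_property_demo; U+03B1 (Greek small letter alpha)
      is a sample character chosen for the illustration.

```erlang
-module(re_property_demo).
-export([demo/0]).

demo() ->
    Opts = [unicode, {capture, none}],
    Alpha = <<16#03B1/utf8>>,
    {re:run(Alpha, "\\p{Greek}", Opts),  % script name
     re:run(Alpha, "\\P{Han}", Opts),    % not in script Han
     re:run(Alpha, "\\pL", Opts),        % one-letter form, braces omitted
     re:run(Alpha, "\\p{Lu}", Opts),     % alpha is lowercase
     re:run(Alpha, "\\p{^Lu}", Opts),    % negation, same as \P{Lu}
     %% Without option unicode, code points below 256 can still be tested.
     re:run("A", "\\p{Lu}", [{capture, none}])}.
```

    The expected result is {match, match, match, nomatch, match, match}.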
    Extended Grapheme Clusters
    The \X escape matches any number of Unicode characters that form an
      "extended grapheme cluster", and treats the sequence as an atomic group
      (see below). Up to and including release 8.31, PCRE matched an earlier,
      simpler definition that was equivalent to (?>\PM\pM*). That is,
      it matched a character without the "mark" property, followed by zero or
      more characters with the "mark" property. Characters with the "mark"
      property are typically non-spacing accents that affect the preceding
      character.
    This simple definition was extended in Unicode to include more
      complicated kinds of composite character by giving each character a
      grapheme breaking property, and creating rules that use these properties
      to define the boundaries of extended grapheme clusters. In PCRE releases
      later than 8.31, \X matches one of these clusters.
    \X always matches at least one character. Then it decides whether to add
      more characters according to the following rules for ending a cluster:
    
      - 
        
End at the end of the subject string.
       
      - 
        
Do not end between CR and LF; otherwise end after any control
          character.
       
      - 
        
Do not break Hangul (a Korean script) syllable sequences. Hangul
          characters are of five types: L, V, T, LV, and LVT. An L character can
          be followed by an L, V, LV, or LVT character. An LV or V character can
          be followed by a V or T character. An LVT or T character can be
          followed only by a T character.
       
      - 
        
Do not end before extending characters or spacing marks. Characters
          with the "mark" property always have the "extend" grapheme breaking
          property.
       
      - 
        
Do not end after prepend characters.
       
      - 
        
Otherwise, end the cluster.
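
    As a sketch of the rule about extending characters, the example below
      (hypothetical module re_cluster_demo) matches \X and . against the
      letter "e" followed by U+0301 (combining acute accent) and the letter
      "x". The combining character has the "mark" property, so \X is expected
      to keep it together with the preceding letter.

```erlang
-module(re_cluster_demo).
-export([demo/0]).

demo() ->
    Subject = <<"e", 16#0301/utf8, "x">>,
    Opts = [unicode, {capture, all, binary}],
    %% \X takes the base letter and the combining mark as one cluster.
    {match, [Cluster]} = re:run(Subject, "\\X", Opts),
    %% The dot takes a single character only.
    {match, [Dot]} = re:run(Subject, ".", Opts),
    {Cluster, Dot}.
```

    The expected result is {<<101,204,129>>, <<"e">>}, where the first binary
      is "e" followed by the two-byte UTF-8 encoding of U+0301.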
       
    
    PCRE Additional Properties
    In addition to the standard Unicode properties described earlier, PCRE
      supports four more that make it possible to convert traditional escape
      sequences, such as \w and \s, to use Unicode
      properties. PCRE uses these non-standard, non-Perl properties internally
      when the ucp option is passed. However, they can also be used
      explicitly. The properties are as follows:
    
      - Xan
 
      - 
        
Any alphanumeric character. Matches characters that have either the
          L (letter) or the N (number) property.
       
      - Xps
 
      - 
        
Any Posix space character. Matches the characters tab, line feed,
          vertical tab, form feed, carriage return, and any other character
          that has the Z (separator) property.
       
      - Xsp
 
      - 
        
Any Perl space character. Matches the same as Xps. Vertical tab used
          to be excluded for Perl compatibility, but is included as from
          PCRE release 8.34.
       
      - Xwd
 
      - 
        
Any Perl "word" character. Matches the same characters as Xan, plus
          underscore.
       
    
    Perl and POSIX space are now the same. Perl added VT to its space
    character set at release 5.18 and PCRE changed at release 8.34. Xsp is
    therefore the same as Xps; it used to exclude vertical tab for Perl
    compatibility.
    
    There is another non-standard property, Xuc, which matches any character
      that can be represented by a Universal Character Name in C++ and other
      programming languages. These are the characters $, @, ` (grave accent),
      and all characters with Unicode code points >= U+00A0, except for the
      surrogates U+D800 to U+DFFF. Notice that most base (ASCII) characters are
      excluded. (Universal Character Names are of the form \uHHHH or \UHHHHHHHH,
      where H is a hexadecimal digit. Notice that the Xuc property does not
      match these sequences but the characters that they represent.)
    Resetting the Match Start
    The escape sequence \K causes any previously matched characters not to
      be included in the final matched sequence. For example, the following
      pattern matches "foobar", but reports that it has matched "bar":
    foo\Kbar
    
    This feature is similar to a lookbehind assertion (described below).
      However, in this case, the part of the subject before the real match
      does not have to be of fixed length, as lookbehind assertions do. The
      use of \K does not interfere with the setting of captured substrings.
      For example, when the following pattern matches "foobar", the first
      substring is still set to "foo":
    (foo)\Kbar
    Perl documents that the use of \K within assertions is "not well
      defined". In PCRE, \K is acted upon when it occurs inside positive
      assertions, but is ignored in negative assertions. Note that when a
      pattern such as (?=ab\K) matches, the reported start of the match can
      be greater than the end of the match.
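    A minimal sketch of \K with and without a capturing group follows. The
      module name re_reset_demo is hypothetical.

```erlang
-module(re_reset_demo).
-export([demo/0]).

demo() ->
    Opts = [{capture, all, list}],
    %% "foo" must be present, but only "bar" is reported as the match.
    Plain = re:run("foobar", "foo\\Kbar", Opts),
    %% The captured substring for group 1 is still "foo".
    Grouped = re:run("foobar", "(foo)\\Kbar", Opts),
    {Plain, Grouped}.
```

    The expected result is {{match,["bar"]}, {match,["bar","foo"]}}.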
    Simple Assertions
    The final use of backslash is for certain simple assertions. An
      assertion specifies a condition that must be met at a particular point in
      a match, without consuming any characters from the subject string. The
      use of subpatterns for more complicated assertions is described below. The
      following are the backslashed assertions:
    
      - \b
 
- Matches at a word boundary.
 
      - \B
 
- Matches when not at a word boundary.
 
      - \A
 
- Matches at the start of the subject.
 
      - \Z
 
- Matches at the end of the subject, and before a newline
        at the end of the subject.
 
      - \z
 
- Matches only at the end of the subject.
 
      - \G
 
- Matches at the first matching position in the subject.
        
 
    
    Inside a character class, \b has a different meaning; it matches the
      backspace character. If any other of these assertions appears in a
      character class, by default it matches the corresponding literal character
      (for example, \B matches the letter B).
    A word boundary is a position in the subject string where the current
      character and the previous character do not both match \w or \W (that is,
      one matches \w and the other matches \W), or the start or end of the
      string if the first or last character matches \w, respectively. In UTF
      mode, the meanings of \w and \W can be changed by setting option
      ucp. When this is done, it also affects \b and \B. PCRE and Perl do
      not have a separate "start of word" or "end of word" metasequence.
      However, whatever follows \b normally determines which it is. For example,
      the fragment \ba matches "a" at the start of a word.
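    The sketch below (hypothetical module re_boundary_demo) locates "cat"
      in a subject where it occurs once inside a word and once as a word of
      its own. The results are {Offset, Length} pairs, as returned by
      re:run/3 when no capture option is given.

```erlang
-module(re_boundary_demo).
-export([demo/0]).

demo() ->
    Subject = "concat cat",
    %% \b on both sides: only the standalone word at offset 7 qualifies.
    Word = re:run(Subject, "\\bcat\\b", []),
    %% \B in front: "cat" must continue a word, as at offset 3.
    Inside = re:run(Subject, "\\Bcat", []),
    {Word, Inside}.
```

    The expected result is {{match,[{7,3}]}, {match,[{3,3}]}}.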
    The \A, \Z, and \z assertions differ from the traditional circumflex and
      dollar (described in the next section) in that they only ever match at the
      very start and end of the subject string, whatever options are set. Thus,
      they are independent of multiline mode. These three assertions are not
      affected by options notbol or noteol, which affect only the
      behavior of the circumflex and dollar metacharacters. However, if argument
      startoffset of run/3 is
      non-zero, indicating that matching is to start at a point other than the
      beginning of the subject, \A can never match. The difference between \Z
      and \z is that \Z matches before a newline at the end of the string and
      at the very end, while \z matches only at the end.
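    The differences can be sketched as follows, assuming the default
      newline convention (LF). The module name re_anchor_demo is hypothetical.

```erlang
-module(re_anchor_demo).
-export([demo/0]).

demo() ->
    None = [{capture, none}],
    {re:run("abc\n", "abc\\Z", None),                  % before final newline
     re:run("abc\n", "abc\\z", None),                  % not the very end
     re:run("x\nabc", "^abc", [multiline | None]),     % ^ after a newline
     re:run("x\nabc", "\\Aabc", [multiline | None])}.  % \A ignores multiline
```

    The expected result is {match, nomatch, match, nomatch}.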
    The \G assertion is true only when the current matching position is at
      the start point of the match, as specified by argument startoffset
      of run/3. It differs from \A when the value of startoffset
      is non-zero. By calling run/3 multiple times with appropriate
      arguments, you can mimic the Perl option /g, and it is in this
      kind of implementation where \G can be useful.
    Notice, however, that the PCRE interpretation of \G, as the start of the
      current match, is subtly different from Perl, which defines it as the end
      of the previous match. In Perl, these can be different when the previously
      matched string was empty. As PCRE does only one match at a time, it cannot
      reproduce this behavior.
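    A minimal sketch of \G follows. It assumes that the start offset is
      supplied to re:run/3 through option {offset, N}; the module name
      re_start_demo is hypothetical.

```erlang
-module(re_start_demo).
-export([demo/0]).

demo() ->
    Subject = "abab",
    {re:run(Subject, "\\Gb", [{offset, 1}]),   % "b" is at the start point
     re:run(Subject, "\\Gb", [{offset, 2}]),   % the start point holds "a"
     re:run(Subject, "b", [{offset, 2}]),      % unanchored: next "b" found
     re:run(Subject, "\\Aa", [{offset, 2}])}.  % \A fails at non-zero offset
```

    The expected result is
      {{match,[{1,1}]}, nomatch, {match,[{3,1}]}, nomatch}. A loop that mimics
      the Perl option /g would pass the end of each match as the next offset.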
    If all the alternatives of a pattern begin with \G, the expression is
      anchored to the starting match position, and the "anchored" flag is set in
      the compiled regular expression.