substringUTF8

open fun <Context : Any> Value<Context, String?>.substringUTF8(startIndex: Value<Context, Int>, byteCount: Value<Context, Int>): Value<Context, String?>

Returns the substring of a string.

The substring starts at the character located at the zero-based UTF-8 byte offset startIndex and continues for byteCount bytes.

Note that this behavior is different from String.substring, which expects start and end indexes.
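To make the difference concrete, here is a minimal sketch in plain Kotlin, with no MongoDB involved. The helper name asciiSubstring is hypothetical and exists only for illustration. It assumes ASCII-only text, where one character is exactly one UTF-8 byte, so a (startIndex, byteCount) pair maps directly onto the (start, end) pair that String.substring expects.

```kotlin
// Hypothetical helper, for illustration only: translates a (startIndex, byteCount)
// pair into the (startIndex, endIndex) pair expected by String.substring.
// Assumption: `text` is ASCII-only, so byte offsets equal character offsets.
fun asciiSubstring(text: String, startIndex: Int, byteCount: Int): String =
    text.substring(startIndex, startIndex + byteCount) // the end index is exclusive

fun main() {
    // startIndex = 1, byteCount = 3 selects the same text as substring(1, 4)
    println(asciiSubstring("MongoDB", startIndex = 1, byteCount = 3)) // ong
    println("MongoDB".substring(1, 4))                                // ong
}
```

For text containing non-ASCII characters this shortcut no longer holds, because byte offsets and character offsets diverge.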

Counting characters

This function uses MongoDB's $substrBytes operator, which measures positions and lengths in UTF-8 encoded bytes; each code point, or character, may take between one and four bytes to encode. This differs from the substring function, which counts Unicode code points.

For example, US-ASCII characters are encoded using one byte. Characters with diacritic markings and additional Latin alphabetical characters are encoded using two bytes. Chinese, Japanese and Korean characters typically require three bytes, and other planes of Unicode (emoji, mathematical symbols, etc.) require four bytes.
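These byte counts can be checked locally with the Kotlin standard library. The sketch below is independent of MongoDB; the helper name utf8ByteCount is hypothetical, and the sample characters are written as escapes so that the exact code points are unambiguous.

```kotlin
// Hypothetical helper: number of bytes the string occupies once encoded as UTF-8.
fun utf8ByteCount(text: String): Int = text.encodeToByteArray().size

fun main() {
    println(utf8ByteCount("a"))            // 1: US-ASCII letter
    println(utf8ByteCount("\u00E9"))       // 2: Latin letter e with acute accent
    println(utf8ByteCount("\u65E5"))       // 3: CJK ideograph
    println(utf8ByteCount("\uD83D\uDE00")) // 4: emoji U+1F600, a surrogate pair in Kotlin
}
```

Running such a check on representative data is one way to pick startIndex and byteCount values that line up with whole characters.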

If startIndex falls inside a multibyte character, or if byteCount makes the substring end inside one, the operation fails with an error.
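When the input is known on the client, the boundary condition can be approximated before sending the request. The following is a minimal sketch, not part of KtMongo: isUtf8Boundary and isSafeByteSlice are hypothetical names, and the check relies on the fact that UTF-8 continuation bytes always have the bit pattern 10xxxxxx. It conservatively rejects slices that extend past the end of the string and does not attempt to reproduce how the server treats them.

```kotlin
// Hypothetical helper: true when `offset` sits between two characters of the
// UTF-8 encoded `bytes` (or at the very start or end of the array).
fun isUtf8Boundary(bytes: ByteArray, offset: Int): Boolean {
    if (offset == 0 || offset == bytes.size) return true
    if (offset < 0 || offset > bytes.size) return false
    // Continuation bytes look like 10xxxxxx; any other byte starts a character.
    return (bytes[offset].toInt() and 0xC0) != 0x80
}

// Hypothetical helper: true when both ends of the requested byte slice fall on
// character boundaries. Out-of-range slices are rejected (a conservative choice).
fun isSafeByteSlice(text: String, startIndex: Int, byteCount: Int): Boolean {
    val bytes = text.encodeToByteArray()
    return isUtf8Boundary(bytes, startIndex) &&
        isUtf8Boundary(bytes, startIndex + byteCount)
}

fun main() {
    val text = "caf\u00E9!" // bytes: c=0, a=1, f=2, e-acute=3..4, !=5
    println(isSafeByteSlice(text, startIndex = 3, byteCount = 2)) // true
    println(isSafeByteSlice(text, startIndex = 3, byteCount = 1)) // false: ends mid-character
    println(isSafeByteSlice(text, startIndex = 4, byteCount = 1)) // false: starts mid-character
}
```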

Example

class Document(
    val text: String,
)

collection.aggregate()
    .set {
        Document::text set of(Document::text).substringUTF8(startIndex = of(1), byteCount = of(2))
    }.toList()

External resources

Official documentation

Counting characters

This function uses MongoDB's $substrBytes operator, which measures positions and lengths in UTF-8 encoded bytes; each code point, or character, may take between one and four bytes to encode. This differs from the substring function, which counts Unicode code points.

For example, US-ASCII characters are encoded using one byte. Characters with diacritic markings and additional Latin alphabetical characters are encoded using two bytes. Chinese, Japanese and Korean characters typically require three bytes, and other planes of Unicode (emoji, mathematical symbols, etc.) require four bytes.

If the start or end index of the range (1..2 in the example below) falls inside a multibyte character, the operation fails with an error.

Example

class Document(
    val text: String,
)

collection.aggregate()
    .set {
        Document::text set of(Document::text).substringUTF8(1..2)
    }.toList()

External resources

Official documentation
