lengthUTF8

Returns the number of UTF-8 encoded bytes in the specified string.

If the argument resolves to null, this function returns null.

Counting characters

This function uses MongoDB's $strLenBytes operator, which counts the bytes of the string's UTF-8 encoding rather than its characters: each code point, or character, takes between one and four bytes to encode. This differs from the length property, which counts Unicode code points.

For example, US-ASCII characters are encoded using one byte. Characters with diacritic markings and additional Latin alphabetical characters are encoded using two bytes. Chinese, Japanese and Korean characters typically require three bytes, and other planes of Unicode (emoji, mathematical symbols, etc.) require four bytes.
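The byte counts described above can be checked locally with a minimal sketch. It assumes that encoding a string with the Kotlin standard library yields the same UTF-8 byte count that $strLenBytes reports on the server. The helper name utf8Length is hypothetical and is not part of this library.

```kotlin
// Hypothetical helper (not part of KtMongo): counts the bytes of the
// UTF-8 encoding of a string using only the Kotlin standard library.
fun utf8Length(text: String): Int = text.encodeToByteArray().size

fun main() {
    println(utf8Length("a"))            // US-ASCII letter: prints 1
    println(utf8Length("\u00E9"))       // 'e' with acute accent (U+00E9): prints 2
    println(utf8Length("\u6F22"))       // CJK ideograph (U+6F22): prints 3
    println(utf8Length("\uD83D\uDE00")) // emoji (U+1F600): prints 4

    // Kotlin's standard String.length counts UTF-16 code units,
    // so it matches neither the byte count nor the code point count.
    println("\uD83D\uDE00".length)      // prints 2
}
```

A string mixing these categories sums the per-character sizes: utf8Length("a\u00E9") is 3, even though the string holds two characters.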

Example

class Document(
    val text: String,
    val byteLength: Int,
)

// Store the UTF-8 byte length of 'text' in 'byteLength' for each document.
collection.aggregate()
    .set {
        Document::byteLength set of(Document::text).lengthUTF8
    }.toList()
