From: Markus Triska
Date: Sun, 6 Mar 2022 10:05:15 +0000 (+0100)
Subject: DOC: Better explanation of strings and partial strings.
X-Git-Tag: v0.9.1~132^2
X-Git-Url: https://git.sagredo.dev/?a=commitdiff_plain;h=8ad4f188f29a664b713249ce5e87443378eea28d;p=scryer-prolog.git

DOC: Better explanation of strings and partial strings.
---

diff --git a/README.md b/README.md
index f2135168..5f9ca909 100644
--- a/README.md
+++ b/README.md
@@ -280,9 +280,30 @@ in any clause of a predicate's definition.
 
 ### Strings and partial strings
 
+A very compact internal representation of *strings* is one of the key
+innovations of Scryer Prolog. This means that terms which appear as
+lists of characters to Prolog programs are stored in packed
+UTF-8 encoding by the engine.
+
+Without this innovation, storing a list of characters in memory
+would use one memory cell per character, one memory cell per
+list constructor, and one memory cell for each tail that occurs
+in the list. Since one memory cell takes 8 bytes on 64-bit
+machines, the packed representation used by Scryer Prolog yields
+an up to **24-fold reduction** of memory usage, and
+corresponding reduction of memory accesses when creating and
+processing strings.
+
+Scryer Prolog's compact internal string representation makes it
+ideally suited for the use case Prolog was originally developed for:
+efficient and convenient text processing, especially with definite
+clause grammars (DCGs) as provided by
+[`library(dcgs)`](src/lib/dcgs.pl) and
+[`library(pio)`](src/lib/pio.pl) to transparently apply DCGs to files.
+
 In Scryer Prolog, the default value of the Prolog flag `double_quotes`
 is `chars`, which is also the recommended setting. This means that
-double-quoted strings are interpreted as lists of *characters*, in the
+lists of characters can be written as double-quoted strings, in the
 tradition of Marseille Prolog.
 
 For example, the following query succeeds:
@@ -292,15 +313,9 @@ For example, the following query succeeds:
    true.
 ```
 
-Internally, strings are represented very compactly in packed
-UTF-8 encoding. A naive representation of strings as lists of
-characters would use one memory cell per character, one
-memory cell per list constructor, and one memory cell for
-each tail that occurs in the list. Since one memory cell takes
-8 bytes on 64-bit machines, the packed representation used by
-Scryer Prolog yields an up to **24-fold reduction** of
-memory usage, and corresponding reduction of memory accesses when
-creating and processing strings.
+This shows that the string `"abc"`, which is represented as a sequence
+of 3 bytes internally, appears to Prolog programs as a list of
+characters.
 
 Scryer Prolog uses the same efficient encoding for *partial* strings,
 which appear to Prolog code as partial lists of characters. The
@@ -323,13 +338,11 @@ the above example, posting
 Ls0 = [a,b,c|Ls] yields the exact same internal representation, and
 has the advantage that only the standard predicate `(=)/2` is used.
 
-Definite clause grammars as provided by
-[`library(dcgs)`](src/lib/dcgs.pl), and the predicates from
-[`library(lists)`](src/lib/lists.pl), are ideally suited for reasoning
-about strings.
-
-Partial strings were first proposed by Ulrich Neumerkel in issue
-[#95](https://github.com/mthom/scryer-prolog/issues/95).
+The efficient internal representation of strings and partial strings
+was first proposed and explained by Ulrich Neumerkel in
+issues [#24](https://github.com/mthom/scryer-prolog/issues/24)
+and [#95](https://github.com/mthom/scryer-prolog/issues/95), and
+Scryer Prolog is the first Prolog system that implements it.
 
 ### Occurs check and cyclic terms