=# \d pg_ts_cfg
   Table "public.pg_ts_cfg"
  Column  | Type | Modifiers
----------+------+-----------
 ts_name  | text | not null
 prs_name | text | not null
 locale   | text |
Indexes: pg_ts_cfg_pkey primary key btree (ts_name)
Stores Tsearch2 configurations. The locale column can be set to specify which configuration is used for the current locale.
=# \d pg_ts_dict
        Table "public.pg_ts_dict"
     Column      | Type | Modifiers
-----------------+------+-----------
 dict_name       | text | not null
 dict_init       | oid  |
 dict_initoption | text |
 dict_lexize     | oid  | not null
 dict_comment    | text |
Indexes: pg_ts_dict_pkey primary key btree (dict_name)
Stores dictionaries. The dict_init field stores the OID of the function that initializes the dictionary. dict_init takes one argument, the text value from dict_initoption, and should return the internal representation (structure) of the dictionary. This structure must be malloc'ed, or palloc'ed in TopMemoryContext. dict_init is called only once per process. The dict_lexize field stores the OID of the function that lemmatizes a lexeme. Input: the dictionary structure, a pointer to the string, and its length. Output: a pointer to an array of pointers to C strings; the last pointer in the array must be NULL. Returning NULL means that the dictionary cannot recognize the word, while returning an empty array (only the terminating NULL) means that the dictionary knows the word but treats it as a stop word.
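To illustrate the dict_lexize return contract (NULL, empty array, or NULL-terminated list of lexemes), here is a minimal stand-alone C sketch. It is not tsearch2 code: it does not use the PostgreSQL headers, the names ToyDict, toy_dict_init and toy_dict_lexize are hypothetical, plain malloc stands in for palloc in TopMemoryContext, and the functions are called directly instead of through their OIDs. The option string is assumed to be a comma-separated list of stop words.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical internal dictionary structure, built once by the init function. */
typedef struct {
    char **stopwords;   /* NULL-terminated list of lower-cased stop words */
} ToyDict;

/* Copy len bytes of s into a new NUL-terminated, lower-cased string. */
static char *copy_lower(const char *s, size_t len) {
    char *out = malloc(len + 1);
    for (size_t i = 0; i < len; i++)
        out[i] = (char) tolower((unsigned char) s[i]);
    out[len] = '\0';
    return out;
}

/* Stand-in for dict_init: option plays the role of dict_initoption
 * (assumed here to be "word1,word2,..."). */
static ToyDict *toy_dict_init(const char *option) {
    ToyDict *d = malloc(sizeof *d);
    size_t n = 1, k = 0;
    for (const char *p = option; *p; p++)
        if (*p == ',')
            n++;
    d->stopwords = calloc(n + 1, sizeof(char *));   /* calloc keeps the tail NULL */
    const char *start = option;
    for (const char *p = option; ; p++) {
        if (*p == ',' || *p == '\0') {
            if (p > start)
                d->stopwords[k++] = copy_lower(start, (size_t) (p - start));
            if (*p == '\0')
                break;
            start = p + 1;
        }
    }
    return d;
}

/* Stand-in for dict_lexize:
 *   NULL                  -> word not recognized
 *   {NULL}                -> recognized as a stop word
 *   {"lexeme", ..., NULL} -> normalized lexemes */
static char **toy_dict_lexize(ToyDict *d, const char *word, int len) {
    assert(len >= 0);
    char *lower = copy_lower(word, (size_t) len);
    for (char **sw = d->stopwords; *sw; sw++) {
        if (strcmp(*sw, lower) == 0) {
            free(lower);
            return calloc(1, sizeof(char *));   /* empty array: only the terminating NULL */
        }
    }
    /* This toy dictionary only recognizes purely alphabetic ASCII words. */
    for (int i = 0; i < len; i++) {
        if (!isalpha((unsigned char) word[i])) {
            free(lower);
            return NULL;
        }
    }
    char **res = calloc(2, sizeof(char *));
    res[0] = lower;                             /* single lexeme: the lower-cased word */
    return res;
}
```

For example, after toy_dict_init("the,a,an"), lexizing "The" yields an array holding only the terminating NULL (stop word), "Cats" yields {"cats", NULL}, and "42" yields NULL (unknown word).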
apod=# \d pg_ts_parser
       Table "public.pg_ts_parser"
    Column     | Type | Modifiers
---------------+------+-----------
 prs_name      | text | not null
 prs_start     | oid  | not null
 prs_nexttoken | oid  | not null
 prs_end       | oid  | not null
 prs_headline  | oid  | not null
 prs_lextype   | oid  | not null
 prs_comment   | text |
Indexes: pg_ts_parser_pkey primary key btree (prs_name)
Stores parsers. prs_start stores the OID of the function that initializes the parser; its arguments are a pointer to the string and its length, and it returns the internal structure of the parser. This structure must be malloc'ed, or palloc'ed in TopMemoryContext. prs_nexttoken stores the OID of the function that returns the next lexeme. Input: the parser structure, a pointer to a pointer to char, and a pointer to int4. It returns the type of the lexeme; a type of 0 means that the whole text has been parsed. The returned lexeme is stored through the last two pointers. prs_end stores the OID of the function that finishes the parse session; its input is the parser structure. prs_headline stores the OID of the function that generates a headline; it works on parsed text stored in an HLPRSTEXT structure (ts_cfg.h). Arguments: a pointer to HLPRSTEXT, a pointer to the query, and a pointer to options (as PostgreSQL's text type). prs_lextype returns an array of LexDescr (see wparser.h) describing the types of lexemes that the parser can return.
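The start / nexttoken / end protocol can be sketched as plain C. This is a hypothetical illustration, not the default tsearch2 parser: ToyParser, toy_prs_start, toy_prs_nexttoken, toy_prs_end and the token codes TOK_WORD and TOK_NUMBER are invented for the example, malloc replaces palloc, and the real functions are invoked through PostgreSQL's function manager by OID rather than called directly. Only the shape of the interface follows the text above: start takes a string and its length, nexttoken returns a type (0 at end of text) and passes the lexeme back through a char ** and an int4 pointer.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>   /* strlen/strncmp for callers of the sketch */

/* Hypothetical lexeme type codes; 0 is reserved for "everything parsed". */
enum { TOK_END = 0, TOK_WORD = 1, TOK_NUMBER = 2 };

/* Hypothetical internal parser state returned by the start function. */
typedef struct {
    char   *str;   /* text being parsed (not copied) */
    int32_t len;   /* its length in bytes */
    int32_t pos;   /* current offset into str */
} ToyParser;

/* Stand-in for prs_start: pointer to the string and its length. */
static ToyParser *toy_prs_start(char *str, int32_t len) {
    assert(str != NULL && len >= 0);
    ToyParser *p = malloc(sizeof *p);
    p->str = str;
    p->len = len;
    p->pos = 0;
    return p;
}

/* Stand-in for prs_nexttoken: returns the lexeme type and stores the
 * lexeme's start and length through the last two pointers. */
static int toy_prs_nexttoken(ToyParser *p, char **token, int32_t *toklen) {
    /* Skip everything that is neither a letter nor a digit. */
    while (p->pos < p->len && !isalnum((unsigned char) p->str[p->pos]))
        p->pos++;
    if (p->pos >= p->len)
        return TOK_END;

    int32_t start = p->pos;
    int type = isdigit((unsigned char) p->str[start]) ? TOK_NUMBER : TOK_WORD;
    /* Consume a run of digits (number) or letters (word). */
    while (p->pos < p->len &&
           (type == TOK_NUMBER ? isdigit((unsigned char) p->str[p->pos])
                               : isalpha((unsigned char) p->str[p->pos])))
        p->pos++;

    *token = p->str + start;
    *toklen = p->pos - start;
    return type;
}

/* Stand-in for prs_end: releases the parser state. */
static void toy_prs_end(ToyParser *p) {
    free(p);
}
```

Feeding "Tsearch2 has 4 tables" through this sketch produces the word "Tsearch", the number "2", the words "has", then the number "4", the word "tables", and finally type 0.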
=# \d pg_ts_cfgmap
      Table "public.pg_ts_cfgmap"
  Column   |  Type  | Modifiers
-----------+--------+-----------
 ts_name   | text   | not null
 tok_alias | text   | not null
 dict_name | text[] |
Indexes: pg_ts_cfgmap_pkey primary key btree (ts_name, tok_alias)
Stores, for each configuration (ts_name) and lexeme type (tok_alias), the list of dictionaries (dict_name) used for that lexeme type.
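The table above does not spell out how a dict_name array with several entries is applied. A common reading, assumed here rather than taken from this text, is that the dictionaries are consulted in array order and the first one that returns non-NULL (including an empty array, i.e. a stop word) decides the result. The sketch below shows that control flow on top of the dict_lexize contract; lexize_chain, LexizeFn, stop_dict and simple_dict are all hypothetical names, and the two dictionaries are toys.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical lexize callback with the same return contract as dict_lexize. */
typedef char **(*LexizeFn)(const char *word, int len);

/* Consult the dictionaries in array order (assumed semantics) and return the
 * first non-NULL result; NULL means that no dictionary recognized the word. */
static char **lexize_chain(LexizeFn *dicts, size_t ndicts, const char *word, int len) {
    assert(dicts != NULL || ndicts == 0);
    for (size_t i = 0; i < ndicts; i++) {
        char **res = dicts[i](word, len);
        if (res != NULL)
            return res;   /* recognized; res[0] == NULL marks a stop word */
    }
    return NULL;
}

static char *empty_result[] = { NULL };

/* Toy stop-word dictionary: knows only "the" and reports it as a stop word. */
static char **stop_dict(const char *word, int len) {
    if (len == 3 && memcmp(word, "the", 3) == 0)
        return empty_result;
    return NULL;
}

/* Toy catch-all dictionary: accepts every word and returns it unchanged. */
static char **simple_dict(const char *word, int len) {
    char **res = malloc(2 * sizeof *res);
    res[0] = malloc((size_t) len + 1);
    memcpy(res[0], word, (size_t) len);
    res[0][len] = '\0';
    res[1] = NULL;
    return res;
}
```

With the chain {stop_dict, simple_dict}, "the" is reported as a stop word by the first dictionary and never reaches the second, while "cat" falls through to simple_dict and comes back as {"cat", NULL}.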
13.1 A lexeme is limited to 2048 bytes.
13.2 ts_vector has a limit of about 1Mb. The exact value depends on the amount of positional information. If there is no positional information, the sum of the lexeme lengths must be less than 1Mb; otherwise, the sum of the lexeme lengths and the positional information must be less than 1Mb. Positional information takes 2 bytes per position plus 2 bytes per lexeme that has positional information. The number of lexemes is limited by 4^32, so in practice it is unlimited.
13.3 ts_query: The number of entries (nodes, i.e. the sum of lexemes and operations) is limited. The internal representation is in Polish notation and the position of an operand is stored as an int2, so the limit is rather soft; in any case, at least 32768 nodes are allowed. Note: ts_query is not designed for storage in tables and is optimized for speed, not for size.
13.4 Positional information in ts_vector:
13.4.1 A position must be less than 2^14 (16384); any larger value is replaced by 16383.
13.4.2 At most 256 positions are stored per lexeme.
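The size arithmetic in 13.2 and the position rules in 13.4 can be written out as a small C sketch. It is only an estimate under stated assumptions: the 1Mb limit is taken to be exactly 1 << 20 bytes, positions beyond the 256th are assumed to be dropped rather than rejected, alignment padding is ignored, and the names LexemeInfo, clamp_pos, estimate_data_size and fits_in_tsvector are hypothetical rather than tsearch2 functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TS_MAXPOS      16383       /* largest storable position, 2^14 - 1 */
#define TS_MAXNUMPOS   256         /* positions kept per lexeme (13.4.2) */
#define TS_DATA_LIMIT  (1 << 20)   /* "about 1Mb"; exact constant is an assumption */

/* Hypothetical description of one lexeme before it is stored. */
typedef struct {
    size_t len;    /* lexeme length in bytes */
    size_t npos;   /* number of positions (0 = no positional information) */
} LexemeInfo;

/* Positions of 2^14 or more are replaced by 16383 (13.4.1). */
static uint16_t clamp_pos(uint32_t pos) {
    return pos > TS_MAXPOS ? (uint16_t) TS_MAXPOS : (uint16_t) pos;
}

/* Bytes counted against the limit (13.2): lexeme text, plus 2 bytes per
 * lexeme with positions and 2 bytes per stored position. */
static size_t estimate_data_size(const LexemeInfo *lex, size_t n) {
    assert(lex != NULL || n == 0);
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += lex[i].len;
        if (lex[i].npos > 0) {
            size_t npos = lex[i].npos > TS_MAXNUMPOS ? TS_MAXNUMPOS : lex[i].npos;
            total += 2 + 2 * npos;
        }
    }
    return total;
}

static int fits_in_tsvector(const LexemeInfo *lex, size_t n) {
    return estimate_data_size(lex, n) < TS_DATA_LIMIT;
}
```

For instance, a 4-byte lexeme without positions plus a 5-byte lexeme with 3 positions counts as 4 + 5 + 2 + 2 * 3 = 17 bytes, and a position of 20000 is stored as 16383.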
ts_vector: there is one unused byte per lexeme in the positional information (because of alignment).
ts_query: there is one unused byte and one unused bit per node.
I don't know what these bytes could be used for... Any ideas on how to use them are welcome!