Gevel contrib module provides several functions useful for analyzing GIN and GiST indexes. Online version of this document is available [23]http://www.sai.msu.su/~megera/wiki/Gevel Caution: This module was designed for developers of GiST/GIN based indices ! Authors * Oleg Bartunov , Moscow, Moscow University, Russia * Teodor Sigaev , Moscow, Moscow University,Russia License Stable version, included into PostgreSQL distribution, released under BSD license. Development version, available from this site, released under the GNU General Public License, version 2 (June 1991) Downloads Old version of gevel available from [24]http://www.sai.msu.su/~megera/postgres/gist/gevel/ Development version is available from CVS: cvs -d :pserver:anoncvs@sigaev.ru:/root login (just enter an empty password) cvs -d :pserver:anoncvs@sigaev.ru:/root co gevel Installation % cd PGSQLSRC/contrib % tar xzvf gevel.tar.gz % make % make install % make installcheck % psql regression < gevel.sql Examples All examples below require installing contrib/rtree_gist module and already installed gevel module. cd contrib/rtree_gist make make install make installcheck psql regression < gevel.sql Now you have regression database with two tables and indices: regression=# \pset tuples_only 1 Showing only tuples. regression=# \d List of relations Schema | Name | Type | Owner --------+---------+-------+---------- public | boxtmp | table | postgres public | polytmp | table | postgres (2 rows) regression=# \di List of relations Schema | Name | Type | Owner | Table --------+------+-------+----------+--------- public | bix | index | postgres | boxtmp public | pix | index | postgres | polytmp (2 rows) * gist_stat(INDEXNAME) - show some statistics about GiST tree regression=# select gist_stat('pix'); Number of levels: 2 Number of pages: 30 Number of leaf pages: 29 Number of tuples: 3129 Number of leaf tuples: 3100 Total size of tuples: 137676 bytes Total size of leaf tuples: 136400 bytes Total size of index: 245760 bytes * gist_tree(INDEXNAME,MAXLEVEL) - show GiST tree up to MAXLEVEL regression=# select gist_tree('pix',0); 0(l:0) blk: 0 numTuple: 29 free: 6888b(15.63%) Designation (from left to right): * 0 - page number * (l:0) - tree level * blk: 0 - block number * numTuple: 29 - the number of tuples * free: 6888b - free space in bytes * (15.63%) - occupied space in percents gist_tree(INDEXNAME) - show full GiST tree regression=# select gist_tree('pix'); 0(l:0) blk: 0 numTuple: 29 free: 6888b(15.63%) 1(l:1) blk: 13 numTuple: 180 free: 244b(97.01%) 2(l:1) blk: 11 numTuple: 175 free: 464b(94.32%) 3(l:1) blk: 2 numTuple: 101 free: 3720b(54.43%) 4(l:1) blk: 17 numTuple: 111 free: 3280b(59.82%) 5(l:1) blk: 18 numTuple: 101 free: 3720b(54.43%) 6(l:1) blk: 10 numTuple: 98 free: 3852b(52.82%) 7(l:1) blk: 19 numTuple: 111 free: 3280b(59.82%) 8(l:1) blk: 9 numTuple: 97 free: 3896b(52.28%) 9(l:1) blk: 20 numTuple: 104 free: 3588b(56.05%) 10(l:1) blk: 14 numTuple: 96 free: 3940b(51.74%) 11(l:1) blk: 21 numTuple: 106 free: 3500b(57.13%) 12(l:1) blk: 7 numTuple: 103 free: 3632b(55.51%) 13(l:1) blk: 1 numTuple: 101 free: 3720b(54.43%) 14(l:1) blk: 16 numTuple: 97 free: 3896b(52.28%) 15(l:1) blk: 24 numTuple: 103 free: 3632b(55.51%) 16(l:1) blk: 4 numTuple: 98 free: 3852b(52.82%) 17(l:1) blk: 25 numTuple: 98 free: 3852b(52.82%) 18(l:1) blk: 3 numTuple: 97 free: 3896b(52.28%) 19(l:1) blk: 26 numTuple: 96 free: 3940b(51.74%) 20(l:1) blk: 6 numTuple: 103 free: 3632b(55.51%) 21(l:1) blk: 8 numTuple: 162 free: 1036b(87.31%) 22(l:1) blk: 23 numTuple: 94 free: 4028b(50.66%) 23(l:1) blk: 12 numTuple: 82 free: 4556b(44.19%) 24(l:1) blk: 27 numTuple: 105 free: 3544b(56.59%) 25(l:1) blk: 5 numTuple: 90 free: 4204b(48.51%) 26(l:1) blk: 28 numTuple: 100 free: 3764b(53.90%) 27(l:1) blk: 22 numTuple: 101 free: 3720b(54.43%) 28(l:1) blk: 15 numTuple: 95 free: 3984b(51.20%) 29(l:1) blk: 29 numTuple: 95 free: 3984b(51.20%) * gist_print(INDEXNAME) - prints objects stored in GiST tree, works only if objects in index have textual representation (type_out functions should be implemented for given object type). It's known to work with R-tree GiST based index (contrib/rtree_gist). Note, in example below, objects are of type box. # \di List of relations Schema | Name | Type | Owner | Table --------+------+-------+----------+--------- public | bix | index | postgres | boxtmp public | pix | index | postgres | polytmp (2 rows) # \d pix Index "public.pix" Column | Type --------+------ p | box gist, for table "public.polytmp" # select * from gist_print('pix') as t(level int, a box) where level =1; for 8.1 and later this query should looks like # select * from gist_print('pix') as t(level int, valid bool, a box) where level =1; level | valid | a -------+-------+----------------------------- 1 | t | (37357,50073),(34242,357) 1 | t | (43499,49770),(40358,43) 1 | t | (31193,24679),(25047,12410) 1 | t | (31018,12142),(25083,6) 1 | t | (49944,25174),(43471,12802) 1 | t | (12577,49757),(6302,37534) 1 | t | (12528,37333),(6171,24861) 1 | t | (50027,49751),(46817,25462) 1 | t | (46870,49912),(43664,25722) 1 | t | (24855,25574),(12447,19263) 1 | t | (25054,19126),(12403,12796) 1 | t | (32737,49923),(31178,1038) 1 | t | (3184,24465),(15,81) 1 | t | (24951,49983),(12740,44000) 1 | t | (24919,43956),(12617,37901) 1 | t | (40387,49852),(37338,25217) 1 | t | (40325,24963),(37375,491) 1 | t | (24919,12698),(12654,6518) 1 | t | (25002,6338),(12350,51) 1 | t | (49985,12554),(43447,222) 1 | t | (25003,37769),(12552,25573) 1 | t | (34270,49382),(32763,594) 1 | t | (6205,50012),(3,37527) 1 | t | (6163,37358),(120,25034) 1 | t | (12343,24542),(9295,294) 1 | t | (9308,24151),(6234,620) 1 | t | (6230,24629),(3169,108) 1 | t | (31179,50040),(28113,25556) 1 | t | (28048,49694),(25000,25000) (29 rows) * Using Gevel module it's possible to visualize index tree, see for example [25]Rtree Index. * gin_stat(INDEXNAME) prints statistics about each index entry in GIN index (PostgreSQL 8.2+) For each index entry 'key' it returns set of records (key, nrows), where 'nrows' is the estimated number of rows from tables, returned by select. For example, in text search 'nrows' is the number of documents, which contains word 'key'. This statistics may be used ONLY as estimation of true counts, since it doesn't takes into account the visibility information (from heap) and uses approximate counts in case of long posting lists. Vacuum table after table modification to obtain more accurate statistics. Top-5 most frequent words (from 463873) : =# SELECT * FROM gin_stat('gin_idx') as t(word text, ndoc int) order by ndoc desc limit 5; word | ndoc --------+-------- page | 340858 figur | 240366 use | 148022 model | 134442 result | 129010 (5 rows) Time: 520.714 ms Compare exact stats, obtained using ts_stat() function, with estimated counts: =# select a.word, b.ndoc as exact, a.estimation as estimation, round ( (a.estimation-b.ndoc)*100.0/a.estimation,2)||'%' as error from (SELECT * FROM gin_stat('gin_x_idx') as t(word text, estimation int) order by estimation desc limit 5 ) as a, stat b where a.word = b.word; word | exact | estimation | error --------+--------+------------+------- page | 340430 | 340858 | 0.13% figur | 240104 | 240366 | 0.11% use | 147132 | 148022 | 0.60% model | 133444 | 134442 | 0.74% result | 128977 | 129010 | 0.03% (5 rows) Time: 550.562 ms Example for integer arrays: # SELECT * FROM gin_stat('gin_idx') as t(value int, nrow int) where nrow > 250; value | nrow -------+------ 31 | 254 47 | 251 52 | 257 59 | 259 (4 rows)