Gevel contrib module provides several functions useful for analyzing GIN and GiST indexes.
Online version of this document is available http://www.sai.msu.su/~megera/wiki/Gevel
Caution: This module was designed for developers of GiST/GIN based indices !
Stable version, included into PostgreSQL distribution, released under BSD license. Development version, available from this site, released under the GNU General Public License, version 2 (June 1991)
Old version of gevel is available from http://www.sai.msu.su/~megera/postgres/gist/gevel/
Development version is available from CVS:
git clone git://sigaev.ru/gevel http://www.sigaev.ru/git/gitweb.cgi?p=gevel.git;a=summary
% cd PGSQLSRC/contrib % tar xzvf gevel.tar.gz % make % make install % make installcheck % psql regression < gevel.sql
All examples below require installing contrib/rtree_gist module and already installed gevel module.
cd contrib/rtree_gist make make install make installcheck psql regression < gevel.sql
Now you have regression database with two tables and indices:
regression=# \pset tuples_only 1 Showing only tuples. regression=# \d List of relations Schema | Name | Type | Owner --------+---------+-------+---------- public | boxtmp | table | postgres public | polytmp | table | postgres (2 rows) regression=# \di List of relations Schema | Name | Type | Owner | Table --------+------+-------+----------+--------- public | bix | index | postgres | boxtmp public | pix | index | postgres | polytmp (2 rows)
regression=# select gist_stat('pix'); Number of levels: 2 Number of pages: 30 Number of leaf pages: 29 Number of tuples: 3129 Number of leaf tuples: 3100 Total size of tuples: 137676 bytes Total size of leaf tuples: 136400 bytes Total size of index: 245760 bytes
regression=# select gist_tree('pix',0); 0(l:0) blk: 0 numTuple: 29 free: 6888b(15.63%)
Designation (from left to right):
regression=# select gist_tree('pix'); 0(l:0) blk: 0 numTuple: 29 free: 6888b(15.63%) 1(l:1) blk: 13 numTuple: 180 free: 244b(97.01%) 2(l:1) blk: 11 numTuple: 175 free: 464b(94.32%) 3(l:1) blk: 2 numTuple: 101 free: 3720b(54.43%) 4(l:1) blk: 17 numTuple: 111 free: 3280b(59.82%) 5(l:1) blk: 18 numTuple: 101 free: 3720b(54.43%) 6(l:1) blk: 10 numTuple: 98 free: 3852b(52.82%) 7(l:1) blk: 19 numTuple: 111 free: 3280b(59.82%) 8(l:1) blk: 9 numTuple: 97 free: 3896b(52.28%) 9(l:1) blk: 20 numTuple: 104 free: 3588b(56.05%) 10(l:1) blk: 14 numTuple: 96 free: 3940b(51.74%) 11(l:1) blk: 21 numTuple: 106 free: 3500b(57.13%) 12(l:1) blk: 7 numTuple: 103 free: 3632b(55.51%) 13(l:1) blk: 1 numTuple: 101 free: 3720b(54.43%) 14(l:1) blk: 16 numTuple: 97 free: 3896b(52.28%) 15(l:1) blk: 24 numTuple: 103 free: 3632b(55.51%) 16(l:1) blk: 4 numTuple: 98 free: 3852b(52.82%) 17(l:1) blk: 25 numTuple: 98 free: 3852b(52.82%) 18(l:1) blk: 3 numTuple: 97 free: 3896b(52.28%) 19(l:1) blk: 26 numTuple: 96 free: 3940b(51.74%) 20(l:1) blk: 6 numTuple: 103 free: 3632b(55.51%) 21(l:1) blk: 8 numTuple: 162 free: 1036b(87.31%) 22(l:1) blk: 23 numTuple: 94 free: 4028b(50.66%) 23(l:1) blk: 12 numTuple: 82 free: 4556b(44.19%) 24(l:1) blk: 27 numTuple: 105 free: 3544b(56.59%) 25(l:1) blk: 5 numTuple: 90 free: 4204b(48.51%) 26(l:1) blk: 28 numTuple: 100 free: 3764b(53.90%) 27(l:1) blk: 22 numTuple: 101 free: 3720b(54.43%) 28(l:1) blk: 15 numTuple: 95 free: 3984b(51.20%) 29(l:1) blk: 29 numTuple: 95 free: 3984b(51.20%)
# \di List of relations Schema | Name | Type | Owner | Table --------+------+-------+----------+--------- public | bix | index | postgres | boxtmp public | pix | index | postgres | polytmp (2 rows) # \d pix Index "public.pix" Column | Type --------+------ p | box gist, for table "public.polytmp" # select * from gist_print('pix') as t(level int, a box) where level =1;
for 8.1 and later this query should looks like
# select * from gist_print('pix') as t(level int, valid bool, a box) where level =1;
level | valid | a -------+-------+----------------------------- 1 | t | (37357,50073),(34242,357) 1 | t | (43499,49770),(40358,43) 1 | t | (31193,24679),(25047,12410) 1 | t | (31018,12142),(25083,6) 1 | t | (49944,25174),(43471,12802) 1 | t | (12577,49757),(6302,37534) 1 | t | (12528,37333),(6171,24861) 1 | t | (50027,49751),(46817,25462) 1 | t | (46870,49912),(43664,25722) 1 | t | (24855,25574),(12447,19263) 1 | t | (25054,19126),(12403,12796) 1 | t | (32737,49923),(31178,1038) 1 | t | (3184,24465),(15,81) 1 | t | (24951,49983),(12740,44000) 1 | t | (24919,43956),(12617,37901) 1 | t | (40387,49852),(37338,25217) 1 | t | (40325,24963),(37375,491) 1 | t | (24919,12698),(12654,6518) 1 | t | (25002,6338),(12350,51) 1 | t | (49985,12554),(43447,222) 1 | t | (25003,37769),(12552,25573) 1 | t | (34270,49382),(32763,594) 1 | t | (6205,50012),(3,37527) 1 | t | (6163,37358),(120,25034) 1 | t | (12343,24542),(9295,294) 1 | t | (9308,24151),(6234,620) 1 | t | (6230,24629),(3169,108) 1 | t | (31179,50040),(28113,25556) 1 | t | (28048,49694),(25000,25000) (29 rows)
For each index entry 'key' it returns set of records (key, nrows), where 'nrows' is the estimated number of rows from tables, returned by select. For example, in text search 'nrows' is the number of documents, which contains word 'key'. This statistics may be used ONLY as estimation of true counts, since it doesn't takes into account the visibility information (from heap) and uses approximate counts in case of long posting lists.
Vacuum table after table modification to obtain more accurate statistics.
Top-5 most frequent words (from 463873) :
=# SELECT * FROM gin_stat('gin_idx') as t(word text, ndoc int) order by ndoc desc limit 5; word | ndoc --------+-------- page | 340858 figur | 240366 use | 148022 model | 134442 result | 129010 (5 rows) Time: 520.714 ms
Compare exact stats, obtained using ts_stat() function, with estimated counts:
=# select a.word, b.ndoc as exact, a.estimation as estimation, round ( (a.estimation-b.ndoc)*100.0/a.estimation,2)||'%' as error from (SELECT * FROM gin_stat('gin_x_idx') as t(word text, estimation int) order by estimation desc limit 5 ) as a, stat b where a.word = b.word; word | exact | estimation | error --------+--------+------------+------- page | 340430 | 340858 | 0.13% figur | 240104 | 240366 | 0.11% use | 147132 | 148022 | 0.60% model | 133444 | 134442 | 0.74% result | 128977 | 129010 | 0.03% (5 rows) Time: 550.562 ms
Example for integer arrays:
# SELECT * FROM gin_stat('gin_idx') as t(value int, nrow int) where nrow > 250; value | nrow -------+------ 31 | 254 47 | 251 52 | 257 59 | 259 (4 rows)