Just like the name suggests, Hash indexes in PostgreSQL use a form of the hash table data structure. The hash table is a common data structure in many programming languages: for example, a Dict in Python, a HashMap in Java or the new Map type in JavaScript.

To understand how you can benefit from Hash indexes, it's best to understand how they work.

PostgreSQL's hash function maps any database value to a 32-bit integer, the hash code (about 4 billion possible hash codes). A good hash function can be computed quickly and "jumbles" the input uniformly across its entire range. The hash codes are divided into a limited number of buckets. The buckets map the hash codes to the actual table rows.

[Figure: Hash Index]

A simple hash function for an integer type is modulo: divide a number by another number, and the remainder is the hash code. For example, to divide values across 3 buckets you can use the hash function mod(3):

    db=# SELECT n, mod(n, 3) AS bucket FROM generate_series(1, 10) AS n;
     n  │ bucket
    ────┼────────
      1 │      1
      2 │      2
      3 │      0
      4 │      1
      5 │      2
      6 │      0
      7 │      1
      8 │      2
      9 │      0
     10 │      1

When a new value is added to the index, PostgreSQL applies the hash function to the value and puts the hash code and a pointer to the tuple in the appropriate bucket. In the example above using the hash function mod(3), if you insert the value 5 the index entry will be added to bucket 2, because 5 % 3 = 2.

When you query a value using a Hash index, PostgreSQL does the opposite. It takes the value and applies the hash function to determine which bucket may hold matching tuples. Once the bucket is identified, PostgreSQL will fetch the tuples referenced in that bucket and match them against your query.

You may have noticed that multiple values can map to the same bucket; this is called a collision. For example, the hash function mod(3) returned the hash code 2 for the values 2, 5 and 8.

What PostgreSQL actually does is first use a hash function to produce an integer hash code:

    db=# SELECT hashtext('text'), hashchar('c'), hash_array(array), jsonb_hash(''::jsonb), timestamp_hash(now()::timestamp);
    ───────────────┬────────────
    hashtext       │ -451854347
    hashchar       │ 203891234
    hash_array     │ -325393530
    jsonb_hash     │ -1784498999
    timestamp_hash │ 1082344883

It then uses mod(n_buckets) to determine which bucket the tuple should be put in. This can cause multiple values to end up in the same bucket. This is why, even after the bucket is identified, the database still needs to sift through the hash codes in the bucket and recheck the condition to keep only the matching tuples.

[Figure: Hash Index overflow pages]

As rows are added to the index, it's possible for a bucket's primary page to fill up. In this case, additional rows are written to overflow pages. The overflow pages contain index entries that did not fit in the bucket's primary page.

Index Split

At some point, the database can decide it needs to split a bucket into two buckets. PostgreSQL uses special hash functions that ensure values in a bucket can be split into exactly two buckets. When a bucket is split, additional storage is allocated to the index.

Luckily, PostgreSQL does all the heavy lifting for you, so you don't have to decide what hash function to use, or how many buckets there are. If you want to learn more about the internals of Hash indexes in PostgreSQL, check out the readme on "Hash Indexing".

Hash indexes were discouraged prior to PostgreSQL 10. If you read the index type documentation for PostgreSQL 9.6 you'll find a nasty warning about Hash indexes. According to the warning, Hash indexes are not written to the WAL, so they cannot be maintained in replicas, and they are not automatically available after a crash, so you need to manually rebuild them. Starting with PostgreSQL 10 these limitations were resolved, and Hash indexes are no longer discouraged.

Now that you know how Hash indexes work in PostgreSQL, you are ready to see them in action. To illustrate, we'll create an imaginary URL shortening service.

A URL shortener service, such as "bit.ly" or the late "goo.gl", provides a short random URL that points to a longer URL. For example, you can have the short URL point to the longer URL. We'll call the hb9837 part of the short URL the key.
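As a starting point for the shortener, here is a minimal sketch of what a table and a Hash index on the key could look like. The table, column and index names (`shorturl`, `key`, `url`, `shorturl_key_hash_ix`) are assumptions made for illustration and are not taken from any real service.

```sql
-- Hypothetical schema for the imaginary URL shortener.
-- All names here are assumptions made for this sketch.
CREATE TABLE shorturl (
    id  serial PRIMARY KEY,
    key text NOT NULL,  -- the short part of the URL, such as 'hb9837'
    url text NOT NULL   -- the longer URL the key points to
);

-- Build a Hash index on the key instead of the default B-Tree.
CREATE INDEX shorturl_key_hash_ix ON shorturl USING hash (key);

-- Hash indexes can only serve equality comparisons,
-- so the lookup is an exact match on the key.
EXPLAIN SELECT url FROM shorturl WHERE key = 'hb9837';
```

On an empty or very small table the planner may still prefer a sequential scan, so whether the plan actually uses the index depends on how much data is loaded.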