새발블로그
Hashing 본문
Hashing
- an algorithm that uses a function(hash function) to map large data sets(variable length), called keys, to **smailler data sets of a fixed length
- Hash table(hash map): A data structure that uses a hash function to efficiently map keys to values
- search and retrieval
Complexity
OperationSorted ArrayBalanced BSTHashing
| Insertion | O(n) | O(logn) | O(1) avg |
| Deletion | O(n) | O(logn) | O(1) avg |
| Retrieval | O(log n) | O(logn) | O(1) avg |
Direct Addressing
// a[] is an arrary (the table)
insert (key, data)
a[key]=data
delete(key)
a[key]=NULL
fine(key)
return a [key]
limitations
- key must be non-negative integer values
- Range of keys needs to be small
- Keys must be dense (not many gaps in key values)
Hash Table
key Ideas
- Map large integers to smaller integers
- Map non-integer keys to integers
operations
//a[] is an array(the table)
//h is a hash function
insert(key, data)
a[h(key)]=data
delete(key)
a[h(key)]=NULL
find(key)
return a[h(key)]
not work for all cases
Collision: Two keys have the same hash value
Hash Functions
- Assume: A hash table of size N
- Keys: used to identify the data
- Hash function: Used to compute a hash value
- Hash value(hash code)
- computed from the key using hash function to get a number(0~N-1)- used as the index(address) of the table entry for the data
- Regarded as the "home address" of a key
- Goal: Addresses are different and spread evenly
Good Hash Functions
- complexity: O(1)/ Scatter keys evenly/ Less collisions/ Need less slots
Perfect Hash Functions
- One-to-one
- all keys are known
- Applications: Compiler and interpreter search for reserved words; shell interpreter searches for built-in commands
Uniform Hash Functions
- Distributes keys uniformly in the hash table

Division method(mod operator)
- map into a hash table of m slots
- use modulo operator(%) to map an integer to a value between 0 and m-1
- (n%m) =remainder of n divided by m, where n and m are positive integers
Hash(k)=k%m
Rule of thumb: pick a prime number, close to power of 2 as key
Multiplication method
- Multiply key by a fraction F(O<F<1) //kF
- Extract the fractional part
- Multiply by m, the hash table size

common example of F: golden ration
Hashing of strings
hash1(str){
int sum=0
for each character c in str{
sum+=c
//sum up the ASCII values of all characters
}
return (sum%H_SIZE)//H_SIZE: hash table size
}
An improved Hashing
sum=0
for each character c in str{
sum =sum * 41 +c} //41=prime Number
return sum%m
}
Collision Resolution Techniques
Separate Chaining

- linked-list
- insert at the beginning (or back) of list
insert (key, data)
a[h(key)], O(1)
find(key)
a[h(key)] O(1+a) on average
delete(key)
a[h(key)],O(1+a) on avreage
a= Number of keys / Capacity
a<= constant=> O(1)
Linear Probing
hash(k)=k mod 7, table-size(m)
Insert
collisionL Scan forward for next empty slot
Delete
wrong result,
Correct "Delete" Opertaion
- Occupied
- Deleted
- Empty
Deletion: Mark the status of the slot as "deleted"
needs a state array of same size as the hash table
Search and Insert
Problem: primary Clustering
cluster: a collection of consecutive occupied slots
primary cluster: Cluster covering home address of key
linear probing can create large primary clusters that increases the running time of operations
probe sequence
probe sequence of this linear probing is
hash(key)
(hash(key)+1)%m
(hash(key)+2)%m
(hash(key)+3)%m
Empty slot: sure to fine it
expanded primary cluster
Improved Linear Probing
hash(key)
(hash(key)+1d)%m
(hash(key)+2d)%m
(hash(key)+3*d)%m
d integer>1 and co-prime to m
Quadratic probing
hash(key) 2
(hash(key) + ) % m = (hash(key) + 1) % m
(hash(key) + ) % m = (hash(key) + 4) % m
(hash(key) + ) % m = (hash(key) + 9) % m
(hash(key)+)%m
secondary clustering
reason: using of same pattern in probing by all keys
Double Hashing
reduce secondary clustering,
second hash function
hash(key)
(hash(key) + 1 hash2(key) ) % m
(hash(key) + 2 hash2(key) ) % m
(hash(key) + 3 * hash2(key) ) % m
'Computer Science > Data Structure' 카테고리의 다른 글
| [C언어로 쉽게 풀어쓴 자료구조] 1장 (0) | 2023.08.24 |
|---|---|
| Heaps (0) | 2023.08.09 |
| Graph (0) | 2023.08.09 |
| Binary Search Tree(BST) (0) | 2023.08.09 |
| Trees (0) | 2023.08.09 |