새발블로그

Hashing 본문

Computer Science/Data Structure

Hashing

EUG 2023. 8. 9. 16:57

Hashing

  • an algorithm that uses a function(hash function) to map large data sets(variable length), called keys, to **smailler data sets of a fixed length
  • Hash table(hash map): A data structure that uses a hash function to efficiently map keys to values
  • search and retrieval

Complexity

OperationSorted ArrayBalanced BSTHashing

Insertion O(n) O(logn) O(1) avg
Deletion O(n) O(logn) O(1) avg
Retrieval O(log n) O(logn) O(1) avg

Direct Addressing

// a[] is an arrary (the table)
insert (key, data)
	a[key]=data
delete(key)
	a[key]=NULL
fine(key)
	return a [key]

limitations

  • key must be non-negative integer values
  • Range of keys needs to be small
  • Keys must be dense (not many gaps in key values)

Hash Table

key Ideas

  • Map large integers to smaller integers
  • Map non-integer keys to integers

operations

//a[] is an array(the table)
//h is a hash function

insert(key, data)
	a[h(key)]=data
delete(key)
	a[h(key)]=NULL
find(key)
	return a[h(key)]

not work for all cases

Collision: Two keys have the same hash value

Hash Functions

  • Assume: A hash table of size N
  • Keys: used to identify the data
  • Hash function: Used to compute a hash value
  • Hash value(hash code)
    - computed from the key using hash function to get a number(0~N-1)
    • used as the index(address) of the table entry for the data
    • Regarded as the "home address" of a key
  • Goal: Addresses are different and spread evenly

Good Hash Functions

  • complexity: O(1)/ Scatter keys evenly/ Less collisions/ Need less slots

Perfect Hash Functions

  • One-to-one
  • all keys are known
  • Applications: Compiler and interpreter search for reserved words; shell interpreter searches for built-in commands

Uniform Hash Functions

  • Distributes keys uniformly in the hash table

Division method(mod operator)

  • map into a hash table of m slots
  • use modulo operator(%) to map an integer to a value between 0 and m-1
  • (n%m) =remainder of n divided by m, where n and m are positive integers
    Hash(k)=k%m
    Rule of thumb: pick a prime number, close to power of 2 as key

Multiplication method

  1. Multiply key by a fraction F(O<F<1) //kF
  2. Extract the fractional part
  3. Multiply by m, the hash table size

    common example of F: golden ration

Hashing of strings

hash1(str){
	int sum=0
    for each character c in str{
    sum+=c
    //sum up the ASCII values of all characters
    }
    return (sum%H_SIZE)//H_SIZE: hash table size
    }
    

An improved Hashing

sum=0
for each character c in str{
sum =sum * 41 +c}	//41=prime Number
return sum%m
}

Collision Resolution Techniques

Separate Chaining

  • linked-list
  • insert at the beginning (or back) of list
    insert (key, data)
    a[h(key)], O(1)
    find(key)
    a[h(key)] O(1+a) on average
    delete(key)
    a[h(key)],O(1+a) on avreage
    a= Number of keys / Capacity
    a<= constant=> O(1)

Linear Probing

hash(k)=k mod 7, table-size(m)
Insert
collisionL Scan forward for next empty slot
Delete
wrong result,

Correct "Delete" Opertaion

  1. Occupied
  2. Deleted
  3. Empty

Deletion: Mark the status of the slot as "deleted"
needs a state array of same size as the hash table
Search and Insert

Problem: primary Clustering

cluster: a collection of consecutive occupied slots
primary cluster: Cluster covering home address of key
linear probing can create large primary clusters that increases the running time of operations

probe sequence
probe sequence of this linear probing is
hash(key)
(hash(key)+1)%m
(hash(key)+2)%m
(hash(key)+3)%m

Empty slot: sure to fine it
expanded primary cluster

Improved Linear Probing

hash(key)
(hash(key)+1d)%m
(hash(key)+2d)%m
(hash(key)+3*d)%m

d integer>1 and co-prime to m

Quadratic probing

hash(key) 2
(hash(key) + ) % m = (hash(key) + 1) % m
(hash(key) + ) % m = (hash(key) + 4) % m
(hash(key) +  ) % m = (hash(key) + 9) % m
(hash(key)+)%m

secondary clustering
reason: using of same pattern in probing by all keys

Double Hashing

reduce secondary clustering,
second hash function

hash(key)
(hash(key) + 1 hash2(key) ) % m
(hash(key) + 2 hash2(key) ) % m
(hash(key) + 3 * hash2(key) ) % m

'Computer Science > Data Structure' 카테고리의 다른 글

[C언어로 쉽게 풀어쓴 자료구조] 1장  (0) 2023.08.24
Heaps  (0) 2023.08.09
Graph  (0) 2023.08.09
Binary Search Tree(BST)  (0) 2023.08.09
Trees  (0) 2023.08.09