Recent Posts

Recent Comments

Tags more

Archives

Today

Total

관리 메뉴

새발블로그

Hashing 본문

Computer Science/Data Structure

Hashing

EUG 2023. 8. 9. 16:57

Hashing

an algorithm that uses a function(hash function) to map large data sets(variable length), called keys, to **smailler data sets of a fixed length
Hash table(hash map): A data structure that uses a hash function to efficiently map keys to values
search and retrieval

Complexity

OperationSorted ArrayBalanced BSTHashing

Insertion	O(n)	O(logn)	O(1) avg
Deletion	O(n)	O(logn)	O(1) avg
Retrieval	O(log n)	O(logn)	O(1) avg

Direct Addressing

// a[] is an arrary (the table)
insert (key, data)
	a[key]=data
delete(key)
	a[key]=NULL
fine(key)
	return a [key]

limitations

key must be non-negative integer values
Range of keys needs to be small
Keys must be dense (not many gaps in key values)

Hash Table

key Ideas

Map large integers to smaller integers
Map non-integer keys to integers

operations

//a[] is an array(the table)
//h is a hash function

insert(key, data)
	a[h(key)]=data
delete(key)
	a[h(key)]=NULL
find(key)
	return a[h(key)]

not work for all cases

Collision: Two keys have the same hash value

Hash Functions

Assume: A hash table of size N
Keys: used to identify the data
Hash function: Used to compute a hash value
Hash value(hash code)
- computed from the key using hash function to get a number(0~N-1)
- used as the index(address) of the table entry for the data
- Regarded as the "home address" of a key
Goal: Addresses are different and spread evenly

Good Hash Functions

complexity: O(1)/ Scatter keys evenly/ Less collisions/ Need less slots

Perfect Hash Functions

One-to-one
all keys are known
Applications: Compiler and interpreter search for reserved words; shell interpreter searches for built-in commands

Uniform Hash Functions

Distributes keys uniformly in the hash table

Division method(mod operator)

map into a hash table of m slots
use modulo operator(%) to map an integer to a value between 0 and m-1
(n%m) =remainder of n divided by m, where n and m are positive integers
Hash(k)=k%m
Rule of thumb: pick a prime number, close to power of 2 as key

Multiplication method

Multiply key by a fraction F(O<F<1) //kF
Extract the fractional part
Multiply by m, the hash table size

common example of F: golden ration

Hashing of strings

hash1(str){
	int sum=0
    for each character c in str{
    sum+=c
    //sum up the ASCII values of all characters
    }
    return (sum%H_SIZE)//H_SIZE: hash table size
    }

An improved Hashing

sum=0
for each character c in str{
sum =sum * 41 +c}	//41=prime Number
return sum%m
}

Collision Resolution Techniques

Separate Chaining

linked-list
insert at the beginning (or back) of list
insert (key, data)
a[h(key)], O(1)
find(key)
a[h(key)] O(1+a) on average
delete(key)
a[h(key)],O(1+a) on avreage
a= Number of keys / Capacity
a<= constant=> O(1)

Linear Probing

hash(k)=k mod 7, table-size(m)
Insert
collisionL Scan forward for next empty slot
Delete
wrong result,

Correct "Delete" Opertaion

Occupied
Deleted
Empty

Deletion: Mark the status of the slot as "deleted"
needs a state array of same size as the hash table
Search and Insert

Problem: primary Clustering

cluster: a collection of consecutive occupied slots
primary cluster: Cluster covering home address of key
linear probing can create large primary clusters that increases the running time of operations

probe sequence
probe sequence of this linear probing is
hash(key)
(hash(key)+1)%m
(hash(key)+2)%m
(hash(key)+3)%m

Empty slot: sure to fine it
expanded primary cluster

Improved Linear Probing

hash(key)
(hash(key)+1d)%m
(hash(key)+2d)%m
(hash(key)+3*d)%m

d integer>1 and co-prime to m

Quadratic probing

hash(key) 2
(hash(key) + ) % m = (hash(key) + 1) % m
(hash(key) + ) % m = (hash(key) + 4) % m
(hash(key) + ) % m = (hash(key) + 9) % m
(hash(key)+)%m

secondary clustering
reason: using of same pattern in probing by all keys

Double Hashing

reduce secondary clustering,
second hash function

hash(key)
(hash(key) + 1 hash2(key) ) % m
(hash(key) + 2 hash2(key) ) % m
(hash(key) + 3 * hash2(key) ) % m

'Computer Science > Data Structure' 카테고리의 다른 글

[C언어로 쉽게 풀어쓴 자료구조] 1장 (0)	2023.08.24
Heaps (0)	2023.08.09
Graph (0)	2023.08.09
Binary Search Tree(BST) (0)	2023.08.09
Trees (0)	2023.08.09

'Computer Science/Data Structure' Related Articles

새발블로그

Hashing 본문

Hashing

Hashing

Complexity

Direct Addressing

limitations

Hash Table

key Ideas

Hash Functions

Good Hash Functions

Perfect Hash Functions

Uniform Hash Functions

Division method(mod operator)

Multiplication method

Hashing of strings

An improved Hashing

Collision Resolution Techniques

Separate Chaining

Linear Probing

Correct "Delete" Opertaion

Problem: primary Clustering

Improved Linear Probing

Quadratic probing

Double Hashing

'Computer Science > Data Structure' 카테고리의 다른 글

티스토리툴바