Friday, November 23, 2007
DDBMS advantages and disadvantages
Distributed Database Management Systems
Evolution of DBMS
Changes in the business environment:
Business operations became more decentralized geographically
Competition increased at the global level
Customer demands and market needs favored a decentralized management style
Microcomputers increased in power, which encouraged the growth of local area networks while mainframe costs remained high
DBMS technology encourages data sharing
Decentralized Database
Centralized databases suffer from problems such as:
Performance degradation as more remote locations are served over greater distances
High costs associated with mainframe database systems
Reliability problems created by dependence on a central site
DDBMS Advantages
Data are located near the "greatest demand" site
Faster access to local data than to a remotely located centralized database
Faster data processing by spreading out the work load
Growth facilitation as it enables adding new sites without affecting other sites
Improved communications, as sites are located closer to customers
Reduced operating costs, as adding workstation-based sites is less costly than upgrading a mainframe
User-friendly interfaces
Less danger of a single-point failure
Processor independence, as a request can be serviced by any processor holding a copy of the data
DDBMS Disadvantages
Complexity of management and control
Security is less sophisticated than in centralized mainframe systems
Lack of standards, as the technology is still maturing
Distributed Processing and Distributed Databases
Distributed processing does not necessarily involve a DDBMS, but
a DDBMS must operate in a network environment which consists of:
· computer workstations
· network hardware and software (e.g. NIC)
· communication media (e.g. network cable)
· transaction processor (DTM -- distributed transaction manager)
· data processor (LTM -- local transaction manager or local data manager)
Distributed Database Transparency Features
· Distribution transparency -- the user treats the distributed database as a single logical database
· Transaction transparency -- allows a transaction to update data correctly, as if it were executed at a central site
· Failure transparency -- the system continues to operate in the event of a node failure, as other nodes can take over its functions
· Performance transparency -- perform as if it were a centralized DBMS
· Heterogeneity transparency -- support different database models
Distribution transparency
Allows users to manage a physically dispersed database as if it were a centralized database
· Fragmentation transparency -- the user need not know that the database is partitioned
· Location transparency -- the user must specify fragment names but not where the fragments are located
· Local mapping transparency -- the user must specify both fragment names and locations, but not the mapping to local names (the three levels are illustrated below)
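As a rough illustration, the same lookup might be written as follows at each level; the relation, fragment, and site names are invented for this sketch, and the query dialect is hypothetical rather than taken from any particular DDBMS.

# Hypothetical query forms a user might write at each transparency level.
queries = {
    "fragmentation": "SELECT * FROM EMPLOYEE WHERE emp_id = 123",
    "location": "SELECT * FROM EMP_EAST WHERE emp_id = 123",
    "local mapping": "SELECT * FROM EMP_EAST AT SITE NEW_YORK WHERE emp_id = 123",
}
for level, q in queries.items():
    print(f"{level} transparency: {q}")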
A Layered Distributed Database Model
Levels of distribution transparency
1. Global Schema -- a global view of the database schema, similar to a centralized database
2. Fragmentation Schema -- a schema to partition the database into logical fragments
3. Allocation Schema -- a schema to determine the allocation of fragments to each site, with or without replication
4. Local Mapping Schema -- a schema to map the fragments stored at a site to the objects handled by the local DBMS
5. DBMS of site n
6. Local database at site n
Global Schema
defines all of the data contained in the DDB, as in a centralized database
the data model should be convenient for defining the mapping to the other levels of the DDB
Fragmentation Schema
Partitions the database without regard to the physical location of the data:
Data are partitioned into mutually exclusive (non-overlapping) fragments (except perhaps for the primary key)
Defines the mapping between global relations and fragments
The mapping is one-to-many: one global relation maps to several fragments, but each fragment belongs to exactly one global relation
Fragmentation Rules:
1. Completeness -- all data must be mapped into the fragments
2. Reconstruction condition -- always possible to reconstruct global relation from its fragments
3. Disjointness condition -- nonoverlapping fragments (except for the primary key, which vertical fragments must share so the relation can be rejoined) so that replication can be controlled explicitly at the allocation level; a small check of all three rules is sketched below
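A minimal sketch of the three rules in Python, assuming relations are lists of dicts and pk names the primary-key attribute; this is an illustration, not part of the original notes.

def check_fragmentation(global_rel, fragments, pk):
    union = [t for f in fragments for t in f]
    keys = [t[pk] for t in union]
    complete = all(t in union for t in global_rel)                       # rule 1
    reconstructable = complete and all(t in global_rel for t in union)  # rule 2
    disjoint = len(keys) == len(set(keys))                              # rule 3
    return complete, reconstructable, disjoint

employee = [{"id": 1, "region": "EAST"}, {"id": 2, "region": "WEST"}]
fragments = [[t for t in employee if t["region"] == "EAST"],
             [t for t in employee if t["region"] == "WEST"]]
print(check_fragmentation(employee, fragments, "id"))  # (True, True, True)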
Horizontal Fragmentation
1. Partitions the tuples of a global relation into subsets; each fragment is obtained as a selection operation on the global relation, and the qualification is the predicate used in the selection operation defining a fragment
Derived horizontal fragmentation is fragmentation driven by the partitioning of another relation (e.g. following an organization chart); both forms are sketched below
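A minimal sketch with an invented toy EMPLOYEE relation; the qualifications and the DEPARTMENT fragments are made up for illustration.

# Primary horizontal fragmentation: one selection predicate per fragment.
employee = [{"id": 1, "dept": 10}, {"id": 2, "dept": 20}]
qualifications = {"EMP_D10": lambda t: t["dept"] == 10,
                  "EMP_D20": lambda t: t["dept"] == 20}
emp_frags = {name: [t for t in employee if q(t)]
             for name, q in qualifications.items()}

# Derived horizontal fragmentation: EMPLOYEE follows DEPARTMENT's
# fragmentation via a semijoin on the dept attribute.
dept_frags = {"DEPT_EAST": [{"dept": 10}], "DEPT_WEST": [{"dept": 20}]}
derived = {name: [e for e in employee
                  if any(e["dept"] == d["dept"] for d in frag)]
           for name, frag in dept_frags.items()}
print(derived["DEPT_EAST"])   # [{'id': 1, 'dept': 10}]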
Vertical Fragmentation
1. subdivision of the attributes of a global relation into groups (e.g. the data used by different applications)
2. fragments are obtained by projecting the global relation over each group; every group must include the primary key so that the global relation can be reconstructed by joining the fragments (see the sketch below)
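A minimal sketch, again with an invented EMPLOYEE relation: each fragment carries the primary key, and the relation is rebuilt by joining on it.

employee = [{"id": 1, "name": "Ann", "salary": 100},
            {"id": 2, "name": "Bob", "salary": 200}]
groups = {"EMP_PERSONAL": ["name"], "EMP_PAY": ["salary"]}

# Project each attribute group, always keeping the primary key.
frags = {name: [{"id": t["id"], **{a: t[a] for a in attrs}} for t in employee]
         for name, attrs in groups.items()}

# Reconstruction: a natural join on the primary key.
joined = {}
for frag in frags.values():
    for t in frag:
        joined.setdefault(t["id"], {}).update(t)
print(list(joined.values()) == employee)   # True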
Mixed Fragmentation
1. obtained by applying the fragmentation operations recursively, provided the correctness conditions are satisfied at each step
2. reconstruction is obtained by applying the reconstruction rules in inverse order
Allocation Schema
defines at which site(s) a fragment is located
The DDB is redundant if the mapping is one-to-many, non-redundant if it is one-to-one
Physical image j consists of all the fragments of the same global relation R located at the same site j
A fragment placed at more than one site is said to be replicated (a small allocation sketch follows)
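A minimal sketch of an allocation schema as a plain mapping; the fragment and site names are invented.

# Fragment -> sites holding a copy; more than one site means replication.
allocation = {"EMP_D10": ["NEW_YORK"],
              "EMP_D20": ["LA", "SEATTLE"]}
redundant = any(len(sites) > 1 for sites in allocation.values())  # True

# Physical image at a site: all fragments of the relation stored there.
def physical_image(site):
    return [frag for frag, sites in allocation.items() if site in sites]
print(physical_image("SEATTLE"))   # ['EMP_D20']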
Local Mapping Schema
maps the physical images to the objects manipulated by the local DBMS
only one type of local DBMS in a homogeneous system
different types of local DBMS in a heterogeneous system
Distribution Transparency for Read-Only Applications
Fragmentation transparency -- applications reference global relations as if the database were not distributed
Location transparency -- applications access fragments without knowing where they are located
Local Mapping transparency -- applications refer to objects using names independent of the local systems but must specify the sites of the objects; the application is mapped onto the functions used by the local DBMS
No transparency -- the application must be coded directly against each local DBMS
Distribution Transparency for Update Applications
An update must be performed on all copies of a data item without user involvement
A more complex task: a modification may require relocating a tuple to a different fragment or site, again without user involvement (a small propagation sketch follows)
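A minimal sketch of the first point, reusing the invented allocation mapping from the earlier sketch: the system, not the user, fans a write out to every site that holds a copy of the fragment.

allocation = {"EMP_D20": ["LA", "SEATTLE"]}

def update_all_copies(frag, new_tuple, send):
    # Propagate the write to every replica site, transparently to the user.
    for site in allocation[frag]:
        send(site, frag, new_tuple)

update_all_copies("EMP_D20", {"id": 2, "dept": 20},
                  lambda site, frag, t: print(f"write {t} to {frag} at {site}"))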
Transaction Management in Centralized Systems
Properties of Transactions (the ACID test):
1. Atomicity: a transaction must be all-or-nothing
2. Consistency: a transaction takes the system from one consistent state to another consistent state
3. Isolation: each transaction must be performed without interference from other transactions, or the intermediate effects of a transaction must not be visible to other transactions
4. Durability: after a transaction has completed successfully all its effects are saved in permanent storage
Reference: http://personal.cityu.edu.hk/~ismikeho/dm2/dmchap9.htm
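A minimal sketch (not from the referenced chapter) of atomicity and consistency in a centralized setting: a transfer either applies both writes or, on a failed consistency check, rolls everything back.

def transfer(accounts, src, dst, amount):
    snapshot = dict(accounts)      # naive undo log: copy the prior state
    try:
        accounts[src] -= amount
        if accounts[src] < 0:
            raise ValueError("insufficient funds")   # consistency check
        accounts[dst] += amount
    except Exception:
        accounts.clear()
        accounts.update(snapshot)  # atomicity: all-or-nothing rollback
        raise

accounts = {"A": 50, "B": 0}
try:
    transfer(accounts, "A", "B", 80)
except ValueError:
    pass
print(accounts)   # {'A': 50, 'B': 0} -- the failed transfer left no trace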
Concurrency Control
1. The lost update problem – a concurrent transaction overwrites, and thereby wipes out, the effect of an earlier update made by another transaction
2. The inconsistent retrieval problem – an incorrect retrieval caused by reading data while another transaction's updates are only partially applied
3. The serial equivalence criterion – an interleaving of the operations of transactions whose combined effect is the same as if the transactions had been performed one at a time in some order:
a) Two schedules (of operations) are equivalent if:
i) Each read operation reads data item values that are produced by the same write operations in both schedules
ii) The final write operation on each data item is the same in both schedules
b) Two operations are in conflict if they operate on the same data item, one of them is a write operation, and they are issued by different transactions
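The conflict rule above translates almost directly into code; a minimal sketch, with an invented record format for operations:

def in_conflict(op1, op2):
    # Same data item, at least one write, different transactions.
    return (op1["item"] == op2["item"]
            and "write" in (op1["kind"], op2["kind"])
            and op1["txn"] != op2["txn"])

print(in_conflict({"txn": "T1", "kind": "read", "item": "x"},
                  {"txn": "T2", "kind": "write", "item": "x"}))   # True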
Mechanisms to perform concurrency control:
1. Two-Phase Locking (2PL) –
a) A transaction applies a shared lock when it reads a resource; the lock is granted only if the resource is not exclusive-locked
b) A transaction applies an exclusive lock when it writes/updates a resource; the lock is granted only if the resource is not locked at all
c) A transaction is well-formed if it always locks a data item in shared mode before reading it and always locks a data item in exclusive mode before writing it
d) A transaction has to wait and retry if it cannot acquire the necessary locks
e) Holding all locks until the transaction commits or aborts is called strict two-phase locking
f) A transaction that acquires no new locks once it starts releasing locks is a two-phase-locked transaction
2. Timestamping – a method that enforces the serializability rule above by recording, with each data item, the timestamps of the operations performed on it; operations that arrive out of timestamp order are inconsistent and cause the issuing transaction to abort. Sketches of both mechanisms follow.
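First, a minimal lock-table sketch of the 2PL rules in item 1; lock upgrades and wait queueing are omitted, and the class name is invented.

class LockManager:
    def __init__(self):
        self.locks = {}   # item -> (mode, set of owning transactions)

    def acquire(self, txn, item, mode):   # mode is "S" (shared) or "X"
        held = self.locks.get(item)
        if held is None:
            self.locks[item] = (mode, {txn})
            return True
        held_mode, owners = held
        if mode == "S" and held_mode == "S":   # shared locks coexist
            owners.add(txn)
            return True
        return False   # rule d: the caller must wait and retry

    def release_all(self, txn):
        # Strict 2PL (rule e): release everything only at commit/abort.
        for item, (mode, owners) in list(self.locks.items()):
            owners.discard(txn)
            if not owners:
                del self.locks[item]

lm = LockManager()
print(lm.acquire("T1", "x", "S"))   # True
print(lm.acquire("T2", "x", "S"))   # True: both readers admitted
print(lm.acquire("T3", "x", "X"))   # False: the writer must wait

Second, a minimal sketch of basic timestamp ordering, again invented for illustration: each item keeps the largest read and write timestamps seen, and an operation arriving "too late" is rejected, so its transaction would abort and restart.

read_ts, write_ts = {}, {}

def read(ts, item):
    if ts < write_ts.get(item, 0):
        raise RuntimeError("abort: a younger transaction already wrote " + item)
    read_ts[item] = max(read_ts.get(item, 0), ts)

def write(ts, item):
    if ts < read_ts.get(item, 0) or ts < write_ts.get(item, 0):
        raise RuntimeError("abort: a conflicting later operation already ran")
    write_ts[item] = ts

write(5, "x")        # transaction with timestamp 5 writes x
try:
    read(3, "x")     # an older transaction now tries to read x
except RuntimeError as e:
    print(e)         # abort: a younger transaction already wrote x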
Deadlock is possible with two-phase locking, e.g. when two waiting transactions each hold a lock the other needs; a wait-for-graph check is sketched below
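A minimal sketch of deadlock detection via a wait-for graph, with invented transaction names: an edge T1 -> T2 means T1 is waiting for a lock T2 holds, and any cycle means deadlock.

def has_deadlock(waits_for):
    # Depth-first search for a cycle in the wait-for graph.
    def visit(txn, path):
        if txn in path:
            return True
        return any(visit(nxt, path | {txn}) for nxt in waits_for.get(txn, ()))
    return any(visit(txn, set()) for txn in waits_for)

# T1 waits for T2 and T2 waits for T1: a classic two-party deadlock.
print(has_deadlock({"T1": ["T2"], "T2": ["T1"]}))   # True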