Wednesday, December 8, 2010

DRBD - Distributed Replicated Block Device


DRBD stands for Distributed Replicated Block Device, which in simple terms means nothing is shared but everything is replicated. The block devices may be hard disks, partitions or logical drives. DRBD is part of the open source stack initiative of Lisog, a German not-for-profit open source association.


DRBD is a distributed storage system for the GNU/Linux platform and it consists of
    • A kernel module
    • User space management applications
    • Shell scripts
DRBD is normally used in high availability clusters and is very similar to a RAID 1 setup, except that DRBD runs over a network.

RAID 1 has the following features
    • Creates an exact copy or mirror of a set of data on two or more disks
    • Good when read performance and reliability are more important than data storage capacity
    • Really good for reliability. E.g. with two identical disk drives, each with a 5% probability of failing, the probability of the whole mirror failing is (0.05)^2 = 0.0025, that is 0.25%.
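
As an aside, the same arithmetic generalizes to any number of mirrored disks, assuming failures are independent. The small Python helper below is only an illustration of that calculation; it is not part of DRBD.

# Probability that an n-way mirror loses every copy, assuming
# independent disk failures with equal failure probability p.
def mirror_failure_probability(p: float, n: int) -> float:
    return p ** n

# Two disks with a 5% failure probability each, as in the example above.
print(f"{mirror_failure_probability(0.05, 2):.4%}")  # 0.2500%
# A third replica reduces the probability further.
print(f"{mirror_failure_probability(0.05, 3):.4%}")  # 0.0125%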

What DRBD does
Mirroring
Works on top of block devices, i.e. hard disk partitions or virtual machines' logical volumes
Mirrors each and every data block written to the disk
DRBD mirrors data
  • In real time - replication happens continuously
  • Transparently - without the applications knowing that the data is stored on more than one computer
  • Synchronously or asynchronously (see the sketch below)
    • Synchronous - the write is reported as completed only after all the replicas have been written
    • Asynchronous - the write is reported as completed once the local data store is updated, but before the peers are updated
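
To make the difference concrete, here is a minimal toy sketch in Python (not DRBD code; replicate_to_peer and local_dev are hypothetical stand-ins for DRBD's peer replication and the local backing device). It only models when the write is acknowledged in each mode.

import threading

BLOCK_SIZE = 4096

def replicate_to_peer(block_no: int, data: bytes) -> None:
    # Hypothetical stand-in: in DRBD this block would be sent to the
    # peer node over the replication link (e.g. TCP/IP).
    pass

def write_block(local_dev, block_no: int, data: bytes, synchronous: bool) -> None:
    # The block is always written to the local backing device first.
    local_dev.seek(block_no * BLOCK_SIZE)
    local_dev.write(data)

    if synchronous:
        # Synchronous: completion is reported only after the peer has
        # received the block as well.
        replicate_to_peer(block_no, data)
    else:
        # Asynchronous: completion is reported right after the local
        # write; replication continues in the background.
        threading.Thread(target=replicate_to_peer, args=(block_no, data)).start()
    # Returning here corresponds to "the write has completed".

Flipping the synchronous flag is the whole difference between the two modes in this sketch; DRBD itself expresses the same choice through its replication modes (fully synchronous, memory synchronous or asynchronous, as listed below).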


Data accessibility
Data remains accessible through a normal file system created on top of the DRBD device, so applications work with ordinary files.

Feature list
  • May be used to add redundancy to existing deployments
  • Fully synchronous, memory synchronous or asynchronous modes of operation
  • Masking of local IO errors
  • Shared secret to authenticate the peer upon connect
  • Bandwidth of background resynchronization tunable
  • Automatic recovery after node, network, or disk failures
  • Efficient resynchronization: only blocks that were modified during the outage of a node are resynchronized.
  • Short resynchronization time after the crash of an active node, independent of the device size.
  • Automatic detection of the most up-to-date data after complete failure
  • Integration scripts for use with Heartbeat
  • Dual primary support for use with GFS/OCFS2
  • Configurable handler scripts for various DRBD events
  • Online data verification
  • Optional data digests to verify the data transfer over the network (a conceptual sketch follows after this list)
  • Integration scripts for use with Xen
  • Usable on LVM's logical volumes. Usable as physical volume for LVM
  • Integration scripts for LVM to automatically take a snapshot before a node becomes the target of a resynchronization
  • Dependencies to serialize resynchronization; by default all devices resynchronize in parallel
  • Heartbeat integration to outdate peers with broken replication links, avoids switchovers to stale data
  • Many tuning parameters allow DRBD to be optimized for specific machines, networking hardware, and storage subsystems
  • Integration scripts for use with RedHat Cluster
  • Existing file systems can be integrated into new DRBD setups without the need to copy data
  • Support for a third, off-site node for disaster recovery
  • Support for compression of the bitmap exchange
  • Support for floating peers in drbdadm
  • Feature complete OCF Heartbeat/Pacemaker resource agent
  • Resource level fencing script, using Pacemaker's constraints
  • Supports TCP/IP over Ethernet, SuperSockets over Dolphin NICs and SDP over InfiniBand
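
As a rough illustration of the online verification and data digest features above, the idea is that the sender attaches a checksum to each replicated block and the receiver recomputes and compares it. The Python sketch below is purely conceptual and is not DRBD's wire format; the choice of SHA-1 is an arbitrary example.

import hashlib

def digest(block: bytes) -> bytes:
    # Any algorithm both peers agree on would do; SHA-1 is just an example.
    return hashlib.sha1(block).digest()

def send_block(block: bytes) -> tuple[bytes, bytes]:
    # Sender ships the block together with its digest.
    return block, digest(block)

def receive_block(block: bytes, expected: bytes) -> bytes:
    # Receiver recomputes the digest and rejects a corrupted transfer.
    if digest(block) != expected:
        raise IOError("block corrupted in transit, retransmission needed")
    return block

# Round trip of one 4 KiB block.
payload, d = send_block(b"\x00" * 4096)
receive_block(payload, d)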
