2006-03-26

Soft-RAID1 for FreeBSD6

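RAID 1: the foundation of a low-cost, highly fault-tolerant solution.

From now on we no longer need to buy expensive SCSI RAID arrays. In my tests, with the BIOS set up correctly (Boot sequence, Halt on no error), this IDE RAID keeps working normally when any one of the disks is absent.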


1. Install FreeBSD on to ad.
2. Reboot with the Install CD.
3. Enter Fixit mode. (For FreeBSD less than 5.4, use Install CD disc2 as the "live filesystem")
4. # chroot /dist
# mount_devfs devfs /dev
# gmirror load
# gmirror label -v -b round-robin gm0 /dev/ad4
# gmirror insert gm0 /dev/ad6
# mount /dev/mirror/gm0s1a /mnt
# echo 'geom_mirror_load="YES"' >> /mnt/boot/loader.conf
# echo 'swapoff="YES"' >> /mnt/etc/rc.conf
5. Edit /mnt/etc/fstab to convert ad4 -> mirror/gm0 (The ee command may be useful here - press ESC when you are done.)
takhun suggests the following:
# sed "s%ad4%mirror/gm0%" /mnt/etc/fstab > /mnt/etc/fstab.new
# mv /mnt/etc/fstab.new /mnt/etc/fstab
6. Reboot
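
The sed command in step 5 replaces the first "ad4" on every line, wherever it appears, including in comments or mount points. A slightly more defensive sketch anchors the match to a device name at the start of the line. The function name convert_fstab and the sample lines are illustrative assumptions, not part of the original how-to:

```shell
#!/bin/sh
# Hypothetical helper: rewrite /dev/ad4* device names at the start of a line
# to /dev/mirror/gm0*, leaving every other occurrence of "ad4" untouched.
convert_fstab() {
    sed 's%^/dev/ad4%/dev/mirror/gm0%'
}

# Sample fstab lines (illustrative, not taken from a real system).
sample='/dev/ad4s1b none swap sw 0 0
/dev/ad4s1a / ufs rw 1 1
/dev/acd0 /cdrom cd9660 ro,noauto 0 0'

printf '%s\n' "$sample" | convert_fstab
```

In Fixit mode the same filter could be applied as convert_fstab < /mnt/etc/fstab > /mnt/etc/fstab.new, followed by the mv from step 5.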

http://dannyman.toldme.com/2005/01/24/freebsd-howto-gmirror-system/
Swap the cables on IDE0 and IDE1 to install the system onto the other disk.

boredhacker responded on August 12th, 2005 at 8:54 pm

>>>I chose round-robin because I figured if you have two disks in a mirror, they're both under the same "load" constraints, and it is best to KISS.

I can't say that I agree with you on this one. From a user perspective, typing 'load' or 'prefer' is easier than typing 'round-robin'; fewer characters ;-) But seriously, in my desktop machine I have a duplexed gmirror built from 2 disks that have the same geometry, but one is udma66 and the other is udma100. I'm pretty sure using 'load' is better in this situation, and probably in others too.

Regardless, the how-to is still a wonderful thing. I certainly don't want to take anything away from it. However, a little constructive criticism can be a good thing, so….

If I may suggest, although the how-to tells you how to set-up the mirror, it never mentions what to do when an actual disaster happens (i.e. when a disk needs to be changed).

Some things to consider:

1. Most modern disks will not be the same size even if "you have a pair of identical IDE … drives". This is because the integrated drive electronics account for imperfections in the media (bad tracks/sectors). Thus the command "gmirror label -v -b round-robin gm0 /dev/ad4" in step 4 should *I think* be more like "gmirror label -v -b round-robin gm0 /dev/ad6 /dev/ad4". This will create a mirror only as big as the smallest drive. If you don't do this, you just might see a message that says you can't insert ad6 because it's too small (even though the disks are 'identical'). In addition, to quote the gmirror manpage: "The order of components is important, because a component's priority is based on its position (starting from 0). The component with the biggest priority is used by the prefer balance algorithm and is also used as a master component when resynchronization is needed, e.g. after a power failure when the device was open for writing." In light of this, I'm not sure if the command should be "gmirror label -v -b round-robin gm0 /dev/ad6 /dev/ad4" or "gmirror label -v -b round-robin gm0 /dev/ad4 /dev/ad6" - but one may be better than the other. I mean… the gmirror manpage does say order is important, right?

2. When installing the system (in step 1) it might not be a bad idea to create a slice that is actually smaller (by say 32-128 MB) than the drive allows - or to install the system to the truly smallest drive. This will ensure that when/if a drive needs to be replaced you can simply use the same make and model (or a drive with the same physical geometry) and not worry about size discrepancies.

3. Replacing a drive can be simple if you have a new one on hand. If it isn't the boot drive that has failed (or the mirror is already running and you have hot-swap hardware) then it's just a matter of 'gmirror forgetting/removing' the old disk and 'gmirror inserting' the new disk. If you have hot-swappable hardware you shouldn't even have to reboot. If the boot drive failed (and you can't [re]boot) you'll need the fix-it cd as you recommend to 'forget/remove' the drive, and 'insert' another. Use 'gmirror status' to wait until the mirror rebuilds itself before exiting fixit (since it was the boot drive that failed). Booting a partially rebuilt disk simply can't be a good idea imho.

4. The crib sheet assumes you have a pair of identical IDE disk drives, which is a good thing. Assuming otherwise would introduce too much complexity for a simple how-to. However, this just makes the previous three considerations that much more important. After all, what good is having a mirror if you can't rebuild it after a failure?

Now, as you can probably tell, I'm not 100% sure about a lot of this. And I may not have written it out completely or perfectly clear. But hopefully these suggestions can make a good how-to even better!
-----------------------

Before demonstrating the configuration, it is useful to understand a bit about GEOM. GEOM is the modular disk framework introduced in FreeBSD 5.0. This modularity allows the creation of programs to manipulate disks. The best examples are the software RAID programs introduced with FreeBSD 5.3:

* gstripe(8) provides a stripe set or RAID 0
* gmirror(8) provides a mirror/duplex or RAID 1
* graid3(8) provides a stripe with parity or RAID 3

The initial g indicates that each of these programs takes advantage of GEOM.

Note: if you're totally new to RAID, Webopedia has a good definition of each RAID level.

man 4 geom describes the terms it uses to refer to disks, some of which you'll see when setting up gmirror. These terms include the following:

* provider--This GEOM entity appears in /dev. This article shows how to create a provider known as /dev/mirror/gm0, which represents the disk mirror/duplex.
* consumer--This entity receives I/O requests. In the example of a mirror/duplex, it is the two physical drives. I use two IDE drives on separate cables; they are /dev/ad0 and /dev/ad2.
* metadata--When referring to any RAID level, metadata includes the array members, their sizes and locations, descriptions of logical disks and partitions, and the current state of the disk array.
* mirror/duplex--RAID 1 maintains the same data on two separate drives. In other words, it mirrors the data on one drive to another drive. If those two drives are attached to the same IDE cable, they are a mirror; if they are attached to separate cables, they are a duplex. Because a single cable introduces a single point of failure, most mirrors are actually duplexes.

Configuring the Mirror/Duplex During the Install

If you're going to use RAID 1, make your life easy and purchase two identical disks (of the same model and size). You can complicate things by insisting on different disks with different sizes, but in the end you just end up with a harder configuration that wastes the extra disk space on the larger disk. Cable the identical drives so that one is the primary master and the other is the secondary master. Before installing the operating system, double-check that your CMOS recognizes both disks.

Using your favorite installation method, start a FreeBSD install of any version (5.3 or higher). When you get to the Select Drives menu, it should show ad0 and ad2. Select ad0, as you will be installing the operating system on the primary master.

Within the fdisk utility, remove any existing partitions and then select "Use entire disk." When asked about the boot menu, choose "Standard MBR."

In the disklabel editor, set up the partitions on ad0 according to your requirements. If in doubt, choose a for automatic. Then choose your install sets and your install media, and let the operating system install as usual.

When finished, go through the postinstall configurations and set your time zone, create a user account, set the root password, and so on.

However, don't reboot when you end up back at the sysinstall main menu. Instead, press Alt-F4, which will take you to a command prompt. The first command I type is csh so I can get a shell with history (the default shell is Bourne).

Creating a mirror/duplex is as simple as typing:

# gmirror label -v -b round-robin gm0 /dev/ad0

where gmirror label creates the mirror; -v enables verbose mode; -b round-robin chooses a balance algorithm (at the moment, round-robin is the algorithm with the best performance); gm0 is the name of mirror/duplex (this name represents the first GEOM mirror); and /dev/ad0 represents the disk containing the data to mirror.

However, you'll be disappointed if you try the command now:

# gmirror label -v -b round-robin gm0 /dev/ad0
Can't store metadata on /dev/ad0: Operation not permitted

This is a security feature that indicates that the disk is currently mounted for writing and therefore is unavailable. However, you can get around this chicken-and-egg problem and temporarily force gmirror to bypass this measure in order to create the mirror/duplex by setting a sysctl MIB:

# sysctl kern.geom.debugflags=16

kern.geom.debugflags: 0 -> 16

Don't worry; this MIB will return to 0 when you reboot (which I'll have you do in just a few minutes). Try again:

# gmirror label -v -b round-robin gm0 /dev/ad0
Metadata value stored on /dev/ad0

That's it; you now have a RAID 1 system.

It is, however, useful to tell the operating system to load it whenever you boot. This requires edits to two files. The first one is currently empty, so just echo over the required line:

# echo 'geom_mirror_load="YES"' > /boot/loader.conf
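
The quoting in that echo matters when it is typed at a shell prompt, because loader.conf entries are conventionally written with the value in double quotes. A small sketch of the difference, which only runs echo and writes no file:

```shell
#!/bin/sh
# Unquoted: the shell removes the double quotes before echo sees them.
unquoted=$(echo geom_mirror_load="YES")
# Single-quoted: the double quotes survive and would end up in the file.
quoted=$(echo 'geom_mirror_load="YES"')

echo "$unquoted"
echo "$quoted"
```

Wrapping the whole assignment in single quotes is the form used in the Fixit-mode steps near the top of this post.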

However, /etc/fstab is not empty, so I recommend making a backup copy before editing it:

# cp /etc/fstab /etc/fstab.orig
# vi /etc/fstab

Change each ad0 to gm0, and insert mirror/ after /dev/. For example, /dev/ad0s1a becomes /dev/mirror/gm0s1a. Unless you've made extra partitions, you'll have ad0s1 devices ending in a, b, d, e, and f and will need to edit each of those lines.

When finished, triple-check your changes to both /etc/fstab and /boot/loader.conf. While it is fixable, it sucks not being able to boot into a new system because of a typo.
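
Part of that triple-check can be scripted. The sketch below is a hypothetical helper, not something from the original article: it prints any fstab line whose device field still names a raw ad disk, or a gm device missing the mirror/ component, and returns a nonzero status if it finds one. The sample lines are made up for illustration.

```shell
#!/bin/sh
# Hypothetical check: print fstab lines whose device field still looks like
# /dev/adN... or /dev/gmN... and return 1 if any are found, 0 otherwise.
check_fstab() {
    awk '$1 ~ /^\/dev\/(ad|gm)[0-9]/ { print; bad = 1 } END { exit bad + 0 }'
}

# Illustrative input: the second line is missing "mirror/".
printf '%s\n' '/dev/mirror/gm0s1a / ufs rw 1 1' '/dev/gm0s1d /var ufs rw 2 2' \
    | check_fstab || echo "fix the lines listed above before rebooting"
```

On the freshly installed system this could be run as check_fstab < /etc/fstab. It only catches these two slips, so reading the file by eye is still worthwhile.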

Note: some tutorials indicate you also need to add a swapoff option to /etc/rc.conf. This is no longer necessary, and neither is using shutdown -r now instead of reboot.

Once you're sure you don't have any typos, return to Alt-F1, remove your installation media, and exit the installation menu so the system reboots. If all is well, the boot messages will include:

GEOM_MIRROR: Device gm0 created (id=2125638583).
GEOM_MIRROR: Device gm0: provider ad0 detected.
GEOM_MIRROR: Device gm0: provider ad0 activated.
GEOM_MIRROR: Device gm0: provider mirror/gm0 launched.
GEOM_MIRROR: Device gm0 already configured.
Mounting root from ufs:/dev/mirror/gm0s1a

and the system will continue to boot. However, if you have a typo in /etc/fstab, the boot will stop at this point and wait for you to type something meaningful. In this example, I forgot to insert mirror when I edited /etc/fstab, meaning /dev/gm0s1a should have been /dev/mirror/gm0s1a so that FreeBSD could find my root filesystem:

Mounting root from ufs:/dev/gm0s1a
setrootbyname failed
ffs_mountroot: can't find rootvp
Root mount failed: 6

Manual root filesystem specification:
  <fstype>:<device>  Mount <device> using filesystem <fstype>
                       e.g. ufs:da0s1a
  ?                  List valid disk boot devices
  <empty line>       Abort manual input

mountroot>

Fortunately, that's not as scary as it looks. Start by listing your valid disk boot devices:

mountroot> ?

List of GEOM managed disk devices:
mirror/gm0s1f mirror/gm0s1e mirror/gm0s1d mirror/gm0s1c mirror/gm0s1b
mirror/gm0s1a mirror/gm0s1 ad2s1 mirror/gm0 ad0s1 ad2 acd0 ad0 fd0

If you type in the correct location of the / filesystem, the system will continue to boot:

mountroot> ufs:/dev/mirror/gm0s1a
Mounting root from /dev/mirror/gm0s1a

After logging in, be sure to edit the offending line in /etc/fstab and try rebooting again. When you can boot up and log in successfully, verify that each partition on the mirror mounted successfully with:

% df -h
Filesystem            Size    Used   Avail Capacity  Mounted on
/dev/mirror/gm0s1a    248M     35M    193M    15%    /
devfs                 1.0K    1.0K      0B   100%    /dev
/dev/mirror/gm0s1e    248M     12K    228M     0%    /tmp
/dev/mirror/gm0s1f    7.3G     99M    6.7G     1%    /usr
/dev/mirror/gm0s1d    248M    196K    228M     0%    /var

df won't show your swap partition; you can verify it with:

% swapinfo
Device              1K-blocks     Used    Avail Capacity
/dev/mirror/gm0s1b     629544        0   629544     0%

Synchronizing the Mirror/Duplex

The only thing left to do is to synchronize the data on both hard drives. This will happen automatically as soon as you issue the command to insert the second drive into the mirror:


# gmirror insert gm0 /dev/ad2
GEOM_MIRROR: Device gm0: provider ad2 detected.
GEOM_MIRROR: Device gm0: rebuilding provider ad2.

To see what's happening:

# gmirror list | more
Geom name: gm0
State: DEGRADED
Components: 2
Balance: round-robin
Slice: 4096
Flags: NONE
GenID: 0
SyncID: 1
ID: 2125638583
Providers:
1. Name: mirror/gm0
Mediasize: 10262568448 (9.6G)
Sectorsize: 512
Mode: r6w5e2
Consumers:
1. Name: ad0
Mediasize: 10262568448 (9.6G)
Sectorsize: 512
Mode: r1w1e1
State: ACTIVE
Priority: 0
Flags: DIRTY
GenID: 0
SyncID: 1
ID: 3986018406
2. Name: ad2
Mediasize: 10262568448 (9.6G)
Sectorsize: 512
Mode: r1w1e1
State: SYNCHRONIZING
Priority: 0
Flags: DIRTY, SYNCHRONIZING
GenID: 0
SyncID: 1
Synchronized: 1%
ID: 1946262342

Note the SYNCHRONIZING on the Flags line. It will take a while for these two drives to synchronize, as it is currently at 1 percent. I've seen times ranging from about 30 minutes for a 10GB drive to about two and a half hours for a 75GB drive. If you're curious, check the progress with:

# gmirror status
      Name    Status  Components
mirror/gm0  DEGRADED  ad0
                      ad2 (2%)
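
If you would rather have a script watch for completion than retype the command, the Status column is easy to pick out. The sketch below is an illustration under assumptions: mirror_state is a made-up helper name, and it reads sample text copied from the status output above instead of calling gmirror.

```shell
#!/bin/sh
# Hypothetical helper: read `gmirror status` output on stdin and print the
# Status column for the named mirror (e.g. DEGRADED or COMPLETE).
mirror_state() {
    awk -v name="$1" '$1 == name { print $2 }'
}

# Sample text copied from the status output shown above.
sample='Name Status Components
mirror/gm0 DEGRADED ad0
ad2 (2%)'

printf '%s\n' "$sample" | mirror_state mirror/gm0
```

On a live system the input would come from gmirror status itself, for example gmirror status | mirror_state mirror/gm0, which a loop could poll until it prints COMPLETE.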

You'll see a status message in bold white text when the synchronization finishes:

GEOM_MIRROR: Device gm0: rebuilding provider ad2 finished.
GEOM_MIRROR: Device gm0: provider ad2 activated.

If you repeat gmirror list, you'll note that the State has changed from DEGRADED to COMPLETE and the Synchronized line is now gone. Don't worry if you see DIRTY on the Flags line, as it simply indicates that the system has written new data to the disk but hasn't mirrored it yet. If you were to wait a few seconds on a quiet disk, you would see the Flags line change to NONE.

For the final test, reboot the system.

This time your startup messages should include:

GEOM_MIRROR: Device gm0 created (id=2125638583).
GEOM_MIRROR: Device gm0: provider ad0 detected.
GEOM_MIRROR: Device gm0: provider ad2 detected.
GEOM_MIRROR: Device gm0: provider ad0 activated.
GEOM_MIRROR: Device gm0: provider ad2 activated.
GEOM_MIRROR: Device gm0: provider mirror/gm0 launched.
Mounting root from ufs:/dev/mirror/gm0s1a

Final Notes

GEOM utilities are works in progress, and the developers constantly add new features and updates to the man pages. It's well worth your while to keep your favorite version of FreeBSD up-to-date using cvsup or to choose a newer release when deciding which version of FreeBSD to install.

If you wish to gather performance statistics on your mirror/duplex, try gstat(8). A good read through gmirror(8) is also in order, especially if you want an overview of the procedure for replacing a failed disk.

Dru Lavigne is an instructor at Marketbridge Technologies in Ottawa and the maintainer of the Open Protocol Resource.



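One thing to note, though: no two hard disks in the world are identical. Even disks from the same manufacturer and the same batch differ slightly in their real "mediasize in sectors". I was puzzled for a while by two supposedly identical 80G disks before I noticed this. Take a look: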

[root@overlord:~/backup]#diskinfo -v /dev/ad0
/dev/ad0
512 # sectorsize
80025280000 # mediasize in bytes (75G)
156299375 # mediasize in sectors
155058 # Cylinders according to firmware.
16 # Heads according to firmware.
63 # Sectors according to firmware.
[root@overlord:~/backup]#diskinfo -v /dev/ad2
/dev/ad2
512 # sectorsize
80026361856 # mediasize in bytes (75G)
156301488 # mediasize in sectors
155061 # Cylinders according to firmware.
16 # Heads according to firmware.
63 # Sectors according to firmware.
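In my tests, the simplest approach is to use the smaller disk as the primary disk.
Sizing the partitions by hand never worked for me (I suspect it failed because I chose Use Entire Disk).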
[root@overlord:~/backup]#gmirror list
Geom name: gm0
State: COMPLETE
Components: 2
Balance: load
Slice: 4096
Flags: NONE
GenID: 0
SyncID: 2
ID: 612362278
Providers:
1. Name: mirror/gm0
Mediasize: 80025279488 (75G)
Sectorsize: 512
Mode: r6w6e7
Consumers:
1. Name: ad0
Mediasize: 80025280000 (75G)
Sectorsize: 512
Mode: r1w1e1
State: ACTIVE
Priority: 0
Flags: NONE
GenID: 0
SyncID: 2
ID: 296048994
2. Name: ad2
Mediasize: 80026361856 (75G)
Sectorsize: 512
Mode: r1w1e1
State: ACTIVE
Priority: 0
Flags: NONE
GenID: 0
SyncID: 2
ID: 2252893446
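
The numbers in the diskinfo and gmirror list output above fit together. gmirror keeps its metadata in the last sector of each component, so the mirror provider comes out one sector smaller than the smallest disk. A small arithmetic sketch using only the sizes printed above (the variable names are mine):

```shell
#!/bin/sh
# Sizes in bytes, copied from the diskinfo output above.
ad0_bytes=80025280000
ad2_bytes=80026361856
sector=512

# The mirror can be no larger than its smallest component.
if [ "$ad0_bytes" -le "$ad2_bytes" ]; then
    smallest=$ad0_bytes
else
    smallest=$ad2_bytes
fi

# One sector at the end of the component is reserved for gmirror metadata.
mirror_bytes=$((smallest - sector))
diff_sectors=$(( (ad2_bytes - ad0_bytes) / sector ))

echo "size difference: $diff_sectors sectors"
echo "expected mirror size: $mirror_bytes bytes"
```

The second figure matches the Mediasize of mirror/gm0 reported by gmirror list, which is why labeling the mirror on the smaller disk first lets the larger one be inserted without complaint.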
