What Are SMART Drives?


S.M.A.R.T.

Self-Monitoring Analysis & Reporting Technology

A drive technology that reports its own degradation, enabling the operating system to warn the user of potential failure. It was included in EIDE drives with the ATA-3 specification.

Self-Monitoring Analysis and Reporting Technology (S.M.A.R.T.) is an interface between a computer's start-up program or BIOS (basic input/output system) and the computer's hard disk. It is a feature of the Enhanced Integrated Drive Electronics (EIDE) technology that controls access to the hard drive. If S.M.A.R.T. is enabled when a computer is set up, the BIOS can receive analytical information from the hard drive and determine whether to send the user a warning message about possible future failure of the hard drive.

In an effort to help users avoid data loss, drive manufacturers are now incorporating logic into their drives that acts as an "early warning system" for pending drive problems. This system is called Self-Monitoring Analysis and Reporting Technology, or SMART. The hard disk's integrated controller works with a set of sensors to monitor aspects of the drive's performance, determines from this information whether the drive is behaving normally, and makes status information available to software that probes the drive.

The fundamental principle behind SMART is that many problems with hard disks don't occur suddenly. They result from a slow degradation of various mechanical or electronic components. SMART evolved from a technology developed by IBM called Predictive Failure Analysis or PFA. PFA divides failures into two categories: those that can be predicted and those that cannot. Predictable failures occur slowly over time, and often provide clues to their gradual failing that can be detected.

An example of such a predictable failure is spindle motor bearing burnout: this will often occur over a long time, and can be detected by paying attention to how long the drive takes to spin up or down, by monitoring the temperature of the bearings, or by keeping track of how much current the spindle motor uses. An example of an unpredictable failure would be the burnout of a chip on the hard disk's logic board: often, this will "just happen" one day. Clearly, these sorts of unpredictable failures cannot be planned for.

The drive manufacturer's reliability engineers analyze failed drives and the mechanical and electronic characteristics of the drive to find correlations: relationships between predictable failures and the values and trends in drive characteristics that suggest slow degradation. The exact characteristics monitored depend on the particular manufacturer and model. Here are some that are commonly used:

Head Flying Height: A downward trend in flying height will often presage a head crash.

Number of Remapped Sectors: If the drive is remapping many sectors due to internally-detected errors, this can mean the drive is starting to go.

ECC Use and Error Counts: The number of errors encountered by the drive, even if corrected internally, often signals problems developing with the drive. The trend is in some cases more important than the actual count.

Spin-Up Time: Changes in spin-up time can reflect problems with the spindle motor.

Temperature: Increases in drive temperature often signal spindle motor problems.

Data Throughput: Reduction in the transfer rate of the drive can signal various internal problems.

(Some of the quality and reliability features I am describing in this part of the site are in fact used to feed data into the SMART software.)

Using statistical analysis, the "acceptable" values of the various characteristics are programmed into the drive. If the measurements for the various attributes being monitored fall out of the acceptable range, or if the trend in a characteristic is showing an unacceptable decline, an alert condition is written into the drive's SMART status register to warn that a problem with the drive may be occurring.
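The comparison described above can be sketched in a few lines of Python. This is only an illustration, not how any particular drive firmware does it: the attribute names, the sample numbers, and the convention that a higher value is healthier and that a value at or below its threshold raises the alert are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Attribute:
    name: str       # hypothetical attribute label
    value: int      # current value (assumed: higher is healthier)
    threshold: int  # limit programmed in by the manufacturer (made-up numbers)


def failing_attributes(attributes):
    """Return the names of attributes whose value is at or below its threshold."""
    return [a.name for a in attributes if a.value <= a.threshold]


def smart_status(attributes):
    """Summarize the attribute checks as a single status word."""
    return "ALERT" if failing_attributes(attributes) else "OK"


sample = [
    Attribute("spin_up_time", value=95, threshold=21),
    Attribute("remapped_sectors", value=30, threshold=36),
    Attribute("temperature", value=110, threshold=0),
]

print(smart_status(sample), failing_attributes(sample))
```

Only the second attribute has dropped to or below its limit here, so it alone is enough to put the whole drive into the alert state.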

SMART requires a hard disk that supports the feature and some sort of software to check the status of the drive. All major drive manufacturers now incorporate the SMART feature into their drives, and most newer PC systems and motherboards have BIOS routines that will check the SMART status of the drive. So do operating systems such as Windows 98. If your PC doesn't have built-in SMART support, some utility software (like Norton Utilities and similar packages) can be set up to check the SMART status of drives. This is an important point to remember: the hard disk doesn't generate SMART alerts; it just makes status information available. That status data must be checked regularly for this feature to be of any value.
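Because the drive only makes its status available, something on the host has to keep asking for it. The sketch below shows that polling idea in Python. The read_status callable is a hypothetical stand-in for whatever the BIOS, operating system, or utility actually uses to query the drive; here it is fed canned readings so the example runs on its own.

```python
import time


def poll_status(read_status, interval_seconds, max_checks, sleep=time.sleep):
    """Call read_status() up to max_checks times, pausing between checks.

    Returns the number of the first check that did not report "OK",
    or None if every check came back "OK".
    """
    for check in range(1, max_checks + 1):
        if read_status() != "OK":
            return check
        if check < max_checks:
            sleep(interval_seconds)
    return None


# Canned readings standing in for a real drive query (assumption).
readings = iter(["OK", "OK", "ALERT"])
first_alert = poll_status(lambda: next(readings), interval_seconds=0, max_checks=5)
print(first_alert)
```

In real use the interval would be hours or days rather than zero, and the alert would be passed on to the user instead of just being returned.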

Clearly, SMART is a useful tool but not one that is foolproof: it can detect some sorts of problems, but others it has no clue about. A good analogy for this feature would be to consider it like the warning lights on the dashboard of your car: something to pay attention to, but not to rely upon. You should not assume that because SMART generated an alert, there is definitely a drive problem, or conversely, that the lack of an alarm means the drive cannot possibly be having a problem. It certainly is no replacement for proper hard disk care and maintenance, or routine and current backups.

If you experience a SMART alert while using your drive, you should immediately stop using it and contact your drive manufacturer's technical support department for instructions. Some companies consider a SMART alert sufficient evidence that the drive is bad, and will immediately issue an RMA for its replacement; others require other steps to be performed, such as running diagnostic software on the drive. In no event should you ignore the alert, even though I sometimes see people asking others how they can turn off "those annoying SMART messages" on their PCs.

Through the S.M.A.R.T. system, hard disk drives incorporate a suite of advanced diagnostics that monitor the internal operations of a drive and provide an early warning for many types of potential problems. When a potential problem is detected, the drive can be repaired or replaced before any data are lost.

The S.M.A.R.T. system consists of software that resides both on the disk drive and on the host computer. The software on the disk drive allows the drive to report data about its activity, such as the number of hours it has been in operation and the number of seek errors that have occurred and been corrected. It also monitors the internal performance of the motors, media, heads, and electronics of the drive. The host software determines the overall reliability of the drive by analyzing the drive's internal performance parameters and comparing them to predetermined threshold limits.

S.M.A.R.T. monitors a number of factors that relate to predictable drive failures. There are also unpredictable drive failures, but those we can't really do much about. Predictable failures result from causes such as bearing failure, a cracked or broken read/write head, electronics module failure, and changes in spin-up rate. There are also factors related to the failure of the read/write surface, such as seek error rate, excessive bad sectors, and reallocated sector count. Most of these are factors that can be monitored. Then, when a threshold level is exceeded, a failure warning is transmitted. Active SMART monitors these parameters, calculates the date of the fault, warns the user of the impending risk of data loss, and advises the user of appropriate action. S.M.A.R.T. is an industry standard reliability prediction indicator for both IDE/ATA and SCSI drives.
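One simple way a monitoring program could estimate when a declining attribute will reach its threshold is to fit a straight line through past readings and extend it forward. The Python sketch below does that with an ordinary least-squares fit. It is an assumption made for illustration, not a description of how Active SMART or any other product actually calculates a fault date, and the readings are made up.

```python
def days_until_threshold(samples, threshold):
    """Estimate days from the last reading until the value reaches threshold.

    samples is a list of (day, value) pairs. Returns None when there are too
    few readings or the fitted trend is not declining.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_day = sum(day for day, _ in samples) / n
    mean_value = sum(value for _, value in samples) / n
    spread = sum((day - mean_day) ** 2 for day, _ in samples)
    if spread == 0:
        return None  # all readings taken on the same day
    slope = sum((day - mean_day) * (value - mean_value)
                for day, value in samples) / spread
    if slope >= 0:
        return None  # value is steady or improving
    intercept = mean_value - slope * mean_day
    crossing_day = (threshold - intercept) / slope
    last_day = samples[-1][0]
    return max(0.0, crossing_day - last_day)


# Made-up history: the value drops 5 points every 10 days.
history = [(0, 100), (10, 95), (20, 90), (30, 85)]
print(days_until_threshold(history, threshold=50))  # prints 70.0
```

A straight line is the crudest possible model. Real degradation may speed up near the end, so an estimate like this is better read as an upper bound on the time left than as a firm date.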

A drive that is S.M.A.R.T. compliant has a series of parameters (attributes) embedded on the disk drive. The data (attribute values) are constantly collected and monitored for variations within vendor-specific thresholds. These tests are designed to predict the impending degradation or failure of a drive. Predictable failures are characterized by degradation of an attribute over time, before the disk drive fails. Because such attributes can be monitored, predictive failure analysis becomes possible. Many mechanical failures are typically considered predictable, such as the degradation of head flying height, which would indicate a potential head crash. Certain electronic failures may show degradation before failing, but more commonly it is mechanical problems that are gradual and predictable.

For instance, oil level is a function, or "attribute," of most cars that can be monitored. When a car's diagnostic system senses that the oil is low, an oil light comes on. In the same manner, S.M.A.R.T. gives the user notice in time to start a backup procedure and save the data.

Mechanical failures, which are mainly predictable failures, account for 60 percent of drive failures. This number is significant because it demonstrates a great opportunity for reliability-prediction technology. With the emerging technology of S.M.A.R.T., an increasing number of predictable failures will be predicted, and data loss will be avoided.

But remember that S.M.A.R.T. should be treated as an advisory service, not a substitute for regularly backing up your files. Keeping your data safe can only be ensured by making backup copies on a regular basis. The S.M.A.R.T. features of any device should not be considered a substitute for planning ahead.

For an in-depth study see:
http://smartlinux.sourceforge.net/smart/index.php