Digging Deep With Root Cause Analysis

Bob Eckhardt

While ‘root cause analysis’ is a standard expression commonly used to mean incident investigation, the term actually denotes a particular system for investigating incidents. Multiple definitions of root cause analysis agree in describing the process as a more in-depth examination than that implied in the phrase ‘incident investigation’. Besides, for those enamored of jargon, it sounds impressive!

Oftentimes, ‘fault’ and ‘cause’ are confused. When a fault-oriented, punitive management style predominates, incident investigations typically seek to assign fault, and to protect supervisors, by generating reports that document alleged responsibility. Such fault-finding is especially evident when underlying causes for the incidents are ultimately supervisory and/or managerial.

Drawbacks in applying root cause analysis are the time necessary to conduct a proper investigation and the expense it entails, in part because all parties involved are taken away from the manufacturing process. Some commercially available root cause analysis systems contain sophisticated flow charts to guide questioning and a logic sequence to direct investigations. These systems, however, can require training and retraining at each incident, especially when the incidence rate is low and skills are lost due to infrequent opportunities to practice investigations. Consequently, root cause analysis systems usually are not well received by supervisory personnel, who are typically ‘too busy’ for the paperwork or unwilling to thoroughly study and execute a complicated process. More accessible are several variations of root cause analysis, as follows:

The shotgun approach

The most commonly used system, this method employs a commonsense evaluation of as many parameters relevant to the situation as can be identified. For example, an operator backed his forklift over an employee’s foot and ankle in a block plant. The corresponding accident report might indicate (a) better communication required between the forklift operator and worker, or (b) forklift operator failed to recognize worker in his immediate surroundings, or (c) worker should have stayed out of the area of the forklift. For a full examination of the incident’s root cause, a number of issues need to be evaluated, as follows:

  1. The forklift operator
    • Why did the forklift operator not know the worker was in the vicinity?

    • Were forklift and safety equipment adequate to enable the operator to know the worker was nearby in order to exercise adequate preventive measures?

      • Rearview mirrors clean and functional?
      • Brakes working correctly?
      • Backup alarm installed and working?
      • Strobe light working?
      • Horn working?
      • If any of these items was missing or not working, then why was the forklift operated without proper safety equipment?
    • Was the operator’s visibility clear? If not, then (1) what prevented clear visibility? and, (2) what could have been done to ensure a clear view?

    • Were excessive speed and/or apathy due to excessive routine contributing factors? If excessive speed was evident, then (1) why was the operator speeding, and was this the individual’s first speeding offense? If not, (2) why was the speed not recognized and controlled by supervisors earlier? Sometimes, supervisors have no control over speed in a production environment where the forklift or rack system is undersized for the system’s demands. Was this a production problem that required excessive speed to keep up with the plant’s manufacturing pace, or was it an operator infraction? If the latter, then why did the supervisor not recognize it earlier as an operator issue and take corrective action? The question is intended not for punitive purposes, but to identify underlying causes, as reasons exist for everything we do.

  2. The injured worker
    • Why was the worker in the immediate vicinity of the forklift?

      • Plant design problems?
      • Manufacturing design problems?
      • Worker straying from assigned workplace or taking a shortcut? If so, why?
      • Other?
  3. Other issues:
    • Was area lighting adequate?

    • Was orientation or training inadequate to identify hazards to the pedestrian?

    • Could the incident have involved an intentional act by either party?

    • Was an unusual production mishap or related problem in routine involved in the event?

    • Other factors in the environment or situation?

While it generally works reasonably well, a commonsense ‘shotgun’ approach may be augmented by the more sophisticated, commercially available investigation flow charts and logic sequences.

The ‘why’ system

Another approach entails asking ‘why?’ for each answer generated, i.e., every parameter of the incident is addressed by questioning why such a condition occurred.

  • Question: Why did the forklift run over the pedestrian worker? Answer: Forklift operator did not see the pedestrian worker.
  • Q: Why did the forklift operator not see the pedestrian worker? A: Forklift operator was focusing on moving the load.
  • Q: Why was the forklift operator too busy focusing on moving the load to take the standard safety precaution of looking before backing? A: (At about the third question in the ‘why’ sequence, answers start to get sticky.) The real answer in this hypothetical case is that no one thought it important enough to ensure that the forklift operator checked both directions every time he backed up. As the same traffic pattern is repeated continually throughout the day, it was assumed everyone adjusted their work habits in accordance with the traffic flow.
  • Q: Why did no one think it was important enough to make sure the forklift operator looked in both directions every time he backed up? A: The company safety culture, i.e., level of acceptable risk, did not require inspection or identification by a safety auditor of the situation, and supervisors did not consider that extent of training and oversight necessary.
  • Q: Why does the company safety culture, or level of acceptable risk, not provide for inspection or a safety auditor to identify the situation, and why did supervisors not consider that extent of training and oversight necessary? A: (Real answer) The level of risk was considered acceptable to the company in view of funds required for additional training, etc., and excessive costs due to production delay. The safety manager may have noticed the forklift operator did not look both ways, but choosing his battles, elected to overlook the infraction, a reflection of the facility’s safety culture as understood by the safety manager.
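
For investigators who keep incident records electronically, the question-and-answer sequence above can be sketched as a simple chain of records. The following is only an illustrative sketch: the names `WhyStep`, `root_cause`, and `format_chain` are hypothetical, and the answers are paraphrased from the forklift example rather than taken from any commercial root cause analysis product.

```python
from dataclasses import dataclass


@dataclass
class WhyStep:
    """One link in the chain: a 'why' question and the answer it produced."""
    question: str
    answer: str


def root_cause(chain):
    """Return the answer to the last 'why' asked, i.e. the deepest cause reached."""
    if not chain:
        raise ValueError("the chain needs at least one question")
    return chain[-1].answer


def format_chain(chain):
    """Render the chain as numbered Q/A lines suitable for an incident report."""
    lines = []
    for depth, step in enumerate(chain, start=1):
        lines.append(f"{depth}. Q: {step.question}")
        lines.append(f"   A: {step.answer}")
    return "\n".join(lines)


# Answers paraphrased from the hypothetical forklift incident in the article.
forklift_chain = [
    WhyStep("Why did the forklift run over the pedestrian worker?",
            "The forklift operator did not see the pedestrian worker."),
    WhyStep("Why did the operator not see the pedestrian worker?",
            "The operator was focusing on moving the load."),
    WhyStep("Why was the operator too busy to look before backing?",
            "No one thought it important enough to ensure he checked both "
            "directions every time he backed up."),
    WhyStep("Why did no one think that was important enough?",
            "The safety culture did not require inspection or a safety audit, "
            "and supervisors did not consider that oversight necessary."),
    WhyStep("Why does the safety culture not provide for that oversight?",
            "The level of risk was considered acceptable in view of the cost "
            "of additional training and production delay."),
]

print(format_chain(forklift_chain))
print("Deepest cause reached:", root_cause(forklift_chain))
```

A single chain follows only one line of questioning, so a record like this documents the path taken but does not show which other causes were never asked about.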

A primary failing of the ‘why’ system is that it may not identify every cause associated with the incident. That shortcoming is accentuated by the fact that all persons and companies naturally accept some level of risk, as it cannot be completely eliminated. When an accident occurs, therefore, a key consideration is whether the incident constitutes a signal that more costs should be applied to the safety effort to further reduce the risk level.

Sending an advisory to other plants or preparing a safety bulletin on the basis of one example is not productive, since the root cause of the incident is only an indicator of risk acceptance. In other words, an advisory alerting the operator to look both ways before backing up or advising pedestrians to remain clear of forklifts in reverse motion assumes that all risk can be eliminated and that fault for the incident lies entirely with the forklift operator and pedestrian. Nonetheless, many safety programs operate in such a manner. Of course, an accident is to be regretted, and no one wants a worker to be injured. Why, then, were steps not taken to prevent the incident? The real answer is cost.

The checklist system

On computerized systems employing a series of trees with drop-down box selections or flow charts to derive answers, a checklist approach is helpful. The flow charts and trees typically are not sufficiently comprehensive to reveal root causes. The single-page checklist is far more useful. Following is a sample checklist that can include many variables:

  • Procedures (Administration)
    • Hiring procedures, including evaluation of qualified personnel, disqualifying factors for workforce availability, and the drug and alcohol program
    • Turnover rates, indicative of plant culture stability or dependence on seasonal work. Higher turnover rates, for whatever reason, almost always result in higher incident rates.
  • Processes
    • Plant equipment layout evaluation
    • Process layout and flow evaluation
    • Plant equipment maintenance
    • Plant equipment systems, including provisions for available safety equipment
    • Ergonomics
  • Training
    • New hire orientation, quality and thoroughness
    • Job and hazard training
  • Communications
    • Differences in language
    • Instructions not clear
    • Misunderstood verbal instructions
  • Intentional acts
    • External sabotage, theft, or vandalism
    • Internal sabotage, theft, or vandalism
    • Assault
  • Cultural factors
    • Errors caused by pressure to expedite the process
    • Issue previously accepted or ignored by supervisors (see above)
    • Absentmindedness. A topic vast enough to warrant extensive study, the causes can range from personal and/or physical problems to relationship issues to lifestyle choices. Absentmindedness is a factor less susceptible to investigation due to the personal nature of the subject matter.
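
If the single-page checklist is kept in electronic form, one possible way to use it is to track which items an investigator has already reviewed and to list what remains. The sketch below makes that assumption; the names `CHECKLIST` and `unreviewed_items` are hypothetical, and the item labels are shortened from the sample checklist above.

```python
# Categories and items shortened from the sample checklist in the article.
CHECKLIST = {
    "Procedures (Administration)": ["Hiring procedures", "Turnover rates"],
    "Processes": [
        "Plant equipment layout",
        "Process layout and flow",
        "Plant equipment maintenance",
        "Plant equipment systems",
        "Ergonomics",
    ],
    "Training": ["New hire orientation", "Job and hazard training"],
    "Communications": [
        "Differences in language",
        "Instructions not clear",
        "Misunderstood verbal instructions",
    ],
    "Intentional acts": [
        "External sabotage, theft, or vandalism",
        "Internal sabotage, theft, or vandalism",
        "Assault",
    ],
    "Cultural factors": [
        "Pressure to expedite the process",
        "Issue previously accepted or ignored by supervisors",
        "Absentmindedness",
    ],
}


def unreviewed_items(checklist, reviewed):
    """Return {category: [items]} for every item not yet marked as reviewed.

    `reviewed` is a set of (category, item) pairs the investigator has covered.
    """
    remaining = {}
    for category, items in checklist.items():
        missing = [item for item in items if (category, item) not in reviewed]
        if missing:
            remaining[category] = missing
    return remaining


# Example: only the two training items have been reviewed so far.
reviewed = {
    ("Training", "New hire orientation"),
    ("Training", "Job and hazard training"),
}
for category, items in unreviewed_items(CHECKLIST, reviewed).items():
    print(f"{category}: {', '.join(items)}")
```

The point of the structure is only to make omissions visible; deciding whether an item actually contributed to the incident remains the investigator's judgment.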

Whatever system for root cause analysis is selected, its function is to provide an investigative procedure that ultimately identifies all causes associated with an incident. Regardless of the system employed, including the option of a commercially available program, determining those causes is necessarily followed by a more difficult question: Did the incident occur due to some unwritten and unspoken, perceived level of acceptable risk?

A WEB OF DEFINITIONS

Various definitions of root cause analysis are presented at web sites, as listed below: