This post reflects on major losses that occurred some time (often many years) after decisions were made “in extremis” (under time and cost pressure in a very difficult / near closure environment). The decisions were made by well-meaning decision makers, but with 20/20 hindsight (during the detailed investigations after the major losses occurred) it was obvious how flawed these decisions were.
The points here are based on a synthesis of major losses I’ve investigated over the last 20 years – and I’ve tried to describe the causal or risk pathways that flow from these decisions. The causes and incident types described are grouped into:
- Common pre-cursor events and failed controls;
- Geotechnical failures – which led to major collapses at the affected mining operations;
- Major plant failures – resulting in the loss of lives and major business interruptions, and;
- Growth constraints – arising from decisions made to save money on key, long term items of plant or infrastructure at the operation.
The common threads are:
- Loss of key personnel, together with the knowledge they held and the techniques they had adopted for doing tasks;
- Cost saving which led to removing input from external parties;
- Extending the working life / inspection intervals of components in key items of plant;
- Using different suppliers for key components / processes;
- Varying designs in a working operation, and;
- Failing to apply rigorous change management techniques to the decisions made.
The losses which arose typically happened three to ten years after the decisions were made, and the mistakes were only visible with the benefit of hindsight. To reflect this, although Peter investigated all of these incidents, the actual risk pathways described here are not identical to those that occurred and have been obfuscated to protect the good name of the decision makers involved. Any statements made which seem to directly implicate a particular operation or decision maker are unintended.
We can avoid these types of loss – which have tragic consequences. No one wants to be a party to Incident Outcomes where multiple deaths or serious business interruptions occur. There are things that can be done cost effectively during the tough times – and the key elements of a solution are:
- Being on the front foot – analyzing intended changes by considering the risk pathways (using major loss bow ties that plot these) to see which controls are affected and whether we’ve increased the frequency of basic causes;
- Taking a break during the implementation of these decisions and having a third party audit the changes being executed as part of the survival strategy;
- Reflecting on the changes against the key design assumptions / elements of your operation;
- Reviewing your layers of protection against major loss incidents (a slightly different spin on reviewing risk pathways), or;
- Engaging an external party to actively reflect on the changes in the light of your incident history.
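The first element above, checking which controls on a risk pathway are affected by an intended change, can be sketched in a few lines of Python. This is a minimal illustration only and not a reproduction of any real bow tie: the `Pathway` class, the `pathways_weakened` function and all of the pathway and control names are hypothetical, chosen to echo the kinds of change discussed in this post.

```python
from dataclasses import dataclass, field


@dataclass
class Pathway:
    """One line through a bow tie: a cause (or consequence) and its controls."""
    name: str
    controls: set[str] = field(default_factory=set)


def pathways_weakened(pathways, affected_controls):
    """Return {pathway name: (controls weakened, controls still intact)}.

    Only pathways that rely on at least one affected control are reported.
    """
    affected = set(affected_controls)
    report = {}
    for p in pathways:
        hit = p.controls & affected
        if hit:
            report[p.name] = (sorted(hit), sorted(p.controls - affected))
    return report


# Hypothetical bow tie for an inrush event; names are illustrative only.
inrush_bow_tie = [
    Pathway("Barrier pillar fails", {"pillar design review", "external geotech review"}),
    Pathway("Backfill fails to hold", {"backfill sample testing", "fill mix specification"}),
    Pathway("Warning signs missed", {"instrument monitoring", "external geotech review"}),
]

# A hypothetical cost-saving change: stop consultant reviews and backfill testing.
proposed_change = ["external geotech review", "backfill sample testing"]

for name, (weakened, intact) in pathways_weakened(inrush_bow_tie, proposed_change).items():
    print(f"{name}: loses {weakened}, still relies on {intact}")
```

A pathway left relying on a single control is a natural candidate for the third party audit or layers of protection review listed above.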
Common Causes / Hazards
There are a few pre-cursors to major loss which are found in most major consequence Incidents (and all of the losses considered here). All of these flowed from decisions made in difficult times.
- People related – where key technical personnel:
  - Leave the organisation – when they realize that things aren’t looking good and they have the confidence, skill set and ability to land a job in a more stable / better paying operation or company. Often the “best” people leave when they “can see the writing on the wall”;
  - Are made redundant – reducing the number of personnel and requiring those that remain to cover more facets of the design / analysis process. This typically requires making changes to analysis method or approach so that the design or technical review processes do not become a rate limiting step for ongoing production at the operation;
  - Information / files / documents held by the people who were made redundant or who left are not reviewed by others for an extended period of time and / or get discarded as no longer needed. They are almost never summarized to make them easier to use and most key informational aspects are lost from decision makers’ view;
- Managed work environment related:
  - Breaching or not completely following established protocols – mostly around making changes, and;
  - Information flows breaking down – either through no or unclear communication or loss of “corporate knowledge” when people in various roles change.
Another common thread in the causes of these major loss outcomes is that they are slow to become real – and people “normalize” the situation – even though with hindsight it becomes clear that the seeds of disaster were growing during this phase.
IMPORTANT - It is vital to remember that having to make decisions under pressure greatly increases the chance that they will be flawed. A good example of this flows from James Reason's classic text where he notes that 80% of decisions made during an emergency situation are wrong!
Geotechnical Failures
Apart from the common issues above, a key pre-cursor cause in major loss geotechnical incidents is a change in the monitoring regime to suit the changed resources (loss of personnel). The ways this has occurred are:
- Loss of expertise on site - with junior or untrained personnel taking on the role of collecting and analyzing data from scientific instruments, and;
- Reducing or stopping the input / review by external Geotech consultants.
The decisions which led up to the major losses varied - and a sample of those implicated in Peter's investigation findings were:
- Changes to back filling strategy - including:
  - Reducing the amount of fill placed;
  - Changing the mixture of fill - increasing fines percentage and / or reducing cement or binding agent addition, or;
  - Reducing or stopping the testing of backfill samples;
- Varying the required pillar sizes - with typical approaches including:
  - Trimming the barrier pillars to neighboring workings or other boundaries;
  - Reducing pillar thicknesses against backfilled stopes or water holding geological structures, or;
  - Making changes to pillar arrangements - having them longer, thinner or taller than allowed for in design and geotechnical modeling;
- Modifying support regimes:
  - Changing bolting strategies (typically also invoking the common cause issue of less technical review during the change), or;
  - Reduction in strength or quantity of applied surface support (which covers reducing mesh grade or shotcrete thickness or changing the support regime of hanging walls or goaf fringe devices), and;
- Changing operating practices, such as:
  - Reducing the required barriers or separation distances between operating locations and sources of harm (goaves (gobs) / cave zones / flowing material sources), or;
  - Abandonment of new technology that required attention from decision making or technical personnel and a reversion to simpler / more manual methods - such as removing requirement for remote controlled plant or no longer pressing the development of advanced fill systems.
These changes did not appear wrong at the time they were made - but the implications which were realized some years down the track were the worst case outcomes of:
- Complete loss of a mining operation due to a catastrophic inrush of material from a nearby aquifer;
- Multiple fatalities arising from an inrush;
- Multiple fatalities arising from a windblast, and;
- Massive collapse in a mining area leading to a four-month hiatus in production (with tens of people compromised but miraculously uninjured).
BEWARE - The pattern of decision making and approaches that save money in difficult times can become accepted even when conditions change! Only a very small number of people at the operations where these losses were suffered indicated they had a concern about how things were being operated prior to the Major Loss occurring.
Major Plant Catastrophes
Again there are some common causes seen in the plant involved in major losses. The pre-cursor events around major plant decisions are typically:
- Changes to the maintenance strategy - particularly where there are changes in the proof test intervals for key, safety / loss related components. Often this decision is based on aligning the test frequency with the testing frequency of similar items in other areas of the operation, and;
- Stopping or dramatically reducing the frequency of OEM Auditing / Servicing. These activities involve (occasionally expensive) visits from the manufacturer or other external expert on the class of / specific items of plant.
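To show why stretching a proof test interval matters, the sketch below uses a widely quoted simplified approximation for a single, periodically tested component: the average probability of failure on demand is roughly the dangerous undetected failure rate multiplied by the test interval, divided by two. The approximation only holds when that product is small, and the failure rate used here is an assumed, illustrative figure rather than data from any of the incidents described in this post.

```python
def average_pfd(dangerous_undetected_rate_per_hour, proof_test_interval_hours):
    """Approximate average probability of failure on demand for one component.

    Simplified single-channel approximation: PFD_avg ~ rate * interval / 2.
    Only reasonable when rate * interval is much smaller than 1.
    """
    return dangerous_undetected_rate_per_hour * proof_test_interval_hours / 2


HOURS_PER_YEAR = 8760
assumed_rate = 2e-6  # hypothetical dangerous undetected failures per hour

yearly = average_pfd(assumed_rate, 1 * HOURS_PER_YEAR)
three_yearly = average_pfd(assumed_rate, 3 * HOURS_PER_YEAR)

print(f"Tested yearly: {yearly:.5f}")
print(f"Tested every three years: {three_yearly:.5f}")
print(f"Increase factor: {three_yearly / yearly:.1f}")
```

Under this approximation the chance that the component fails to act when called on grows in direct proportion to the interval, so aligning a safety related test with a less frequent test used elsewhere is not a neutral change.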
Some of the decisions made which were causal in the Major Losses were:
- Extension of overhaul dates - pushing out when items of plant were withdrawn from service for rebuild or major repair;
- Changing to a lower priced provider - either whole items of plant or the components of these items, or;
- Varying operating protocols to suit the lower numbers of personnel available - which led to key manual inspection and testing / confirmation of operation tasks not being conducted.
The types of loss suffered included:
- Ignition of methane from a degraded electrical enclosure;
- Dropping of a conveyance in a hoisting shaft, and;
- Structural collapse of surface elements of the ore / coal handling system.
Chronic Growth Constraints
This type of loss arises when key works are conducted in difficult times.
Typical pre-cursor events relate to changes made to the mine design - impacting:
- Opening size - reducing the size of declines, drifts or other long term items as a cost control measure;
- Changing the gradient of declines or interconnecting drives - making them steeper (and shorter), and;
- Reducing the number of headings to access new mining locations.
Some of the decisions that were made included:
- Selection of lower cost ore / coal handling systems - restricting the spend to meet just the current (which is normally depressed in difficult times) required tonnages;
- Opting for smaller and steeper connections to new mining locations - which saves on development cost, and;
- Reducing or ceasing expenditure on exploration.
Again - all of these decisions seemed reasonable at the time - given the difficult trading conditions - but they led to longer term major issues such as:
- Inability to "ramp up" production when prices rebounded after the slump;
- Difficult and delay-inducing upgrades becoming the only option for increasing output (leading to a hiatus in production when prices for ore / coal were higher), and;
- Reducing the overall mine life as there were no reserves available to capitalize on when the prices improved.
CHALLENGE - Sometimes the right decision - from a business perspective - in difficult times is to just work around rate limiting factors. It can prove more effective in the long run to delay commencing an upgrade rather than to execute a "half baked" improvement - which will have to be re-done at higher cost and business risk than the same job left till a better time.
Thoughts on a Solution
It is a difficult time we are living through at the moment - and the thought of spending time and money is not appealing. Your long term business could be much better if, at the very least, you identify and document the decisions that should be challenged once times improve. The human practice of "normalizing" unacceptable conditions (which arose from having to make the hard calls) could be arrested if a sound risk management suite of controls is implemented. These would include internal and external (third party) actions triggered by an improvement in trading conditions - and aimed at making a pivot away from potentially harmful systems of work that made sense when costs had to be constrained in order to survive.
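One way to picture the "identify and document" step is a simple register of survival decisions, each carrying a trigger for when it should be challenged. The sketch below is an assumption-laden illustration: the `SurvivalDecision` fields, the price trigger, the three year age limit (taken from the lower end of the three to ten year lag noted earlier) and the example entries are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class SurvivalDecision:
    description: str
    made_on: date
    review_trigger_price: float  # hypothetical price at which the decision is re-challenged
    reviewed: bool = False


def decisions_due_for_challenge(register, current_price, today, max_age_years=3):
    """List unreviewed decisions whose price trigger is met or which are too old."""
    due = []
    for d in register:
        if d.reviewed:
            continue
        age_years = (today - d.made_on).days / 365.25
        if current_price >= d.review_trigger_price or age_years >= max_age_years:
            due.append(d)
    return due


# Illustrative register entries only.
register = [
    SurvivalDecision("Extended hoist overhaul interval", date(2020, 3, 1), 120.0),
    SurvivalDecision("Stopped external geotech reviews", date(2023, 6, 1), 150.0),
    SurvivalDecision("Reduced decline size", date(2023, 1, 15), 100.0, reviewed=True),
]

due = decisions_due_for_challenge(register, current_price=110.0, today=date(2024, 1, 1))
for d in due:
    print(f"Challenge now: {d.description} (made {d.made_on})")
```

The age limit is there so that a decision is still challenged even if trading conditions never reach the trigger, which guards against the change quietly becoming normal practice.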
Some ideas on what could be done as an effective study on these decisions and changes are:
- Conduct a risk based analysis of the changes. This would involve a critical review of the risk pathways (using ORM's library of incident backed bow ties) to identify where longer lived problems could occur. The process will also identify early warning signs that the problems are becoming more imminent, by highlighting lower consequence incident outcomes that act as a "tell" for a major loss pathway;
- Audit the operation - particularly around the area where changes have occurred - against best (or previous) practice - to highlight more completely the implications of the decisions made;
- Take an Engineering Science based approach to analyzing the decisions made - particularly where these affect mine design or critical plant maintenance strategies. This approach involves reviewing the decisions made against documented (or back analyzed) design assumptions for the affected locations / plant, identifying where assumptions are impacted and recommending longer term modifications which could be made OR immediate changes needed to prevent a worst case Outcome;
- Use a gap analysis against good (or best) practice models for the areas affected by the decisions taken. This approach will highlight where risk pathways may have been "opened up" and confirm whether the layers of protection are robust enough to handle a potential increase in Incident frequency. This type of analysis is typically done as a desktop study - and can form a "scoping study" for more significant analyses if potential problems are highlighted, and;
- For each of these analyses - engage a third party. It is difficult to see (or admit to) your own shortfalls - so having an independent party (yes - such as Peter at ORM) can be a better idea. Your own team members will find it hard to check their own work and the reports back to senior decision makers are likely to be (unintentionally) flawed.
Feel free to chime in and discuss this post - either by posting a reply below - or by sending an email direct to email@example.com
Alternatively, if you want to find out more about Peter and Operational Risk Mentoring's capabilities - you can check out our About Us page.