Human interface with safety instrumented systems in the chemical, petrochemical, oil and gas industries

Automation has made industrial processes safer. However, when automated systems fail, they can cause major accidents. It is therefore fundamental that these systems be installed and maintained so as to avoid failures that can escalate into accidents. Failures of automation systems are either random or systematic. Random failures are predictable, whereas systematic failures are caused by human factors and therefore cannot be foreseen. Good management of human factors is thus necessary to avoid these dangerous failures. This paper briefly discusses human errors and proposes a management approach to avoid systematic failures, taking into consideration the types and behavior of the errors. Applying this approach should reduce human errors and, therefore, systematic failures, enhancing the safety of industrial processes.


INTRODUCTION
Process incidents are highly undesirable in the chemical, petrochemical, oil and gas industries. To prevent these incidents, it is necessary to implement layers of protection that reduce the frequency or minimize the consequences of accidental events. These actions reduce the likelihood of incidents, because risk is a function of frequency and consequence.
One of these layers of protection is the safety instrumented function (SIF). A set of SIFs makes up a safety instrumented system (SIS), which is considered a critical system for keeping the process in a safe state. The lower the probability of failure on demand (PFD), the greater the risk reduction and the safer the process. Thus, a lower PFD means a stronger layer of protection.
When a safety instrumented function fails and it is the last line of defense, the accident will happen. Failures of these systems arise for two reasons: random hardware failures and systematic failures. Random hardware failures are predictable and are related to defects in the system's components. Systematic failures behave differently: even with its components in good condition, the system will not be able to perform its safety function. In addition, they cannot be predicted, because they occur due to human errors, either of omission or of commission (IEC 61511, 2004; ISA-TR84.00.04, 2005; ISA-TR84.00.02, 2002).
The failure of a safety instrumented system can cause major incidents, such as the one at the Macondo well in 2010 (SMITH et al., 2013), which resulted in 11 fatalities and 17 injuries and caused the largest non-intentional oil spill in history. Hence, attention to both kinds of failure is very important to provide suitable maintenance of safety instrumented systems. The concern about maintenance errors extends beyond the industrial sector, as mentioned by Amura et al. (2014), who point out an aviation incident caused by human error induced by maintenance activity. Because systematic failures cannot be predicted, special attention is needed over the whole SIF lifecycle to minimize the possibility of such failures occurring. Given the importance of this issue, this paper takes a theoretical approach and discusses how human errors contribute to systematic failures. It also presents a proposal to manage these errors in order to reduce failures and make industrial processes safer.

PROBLEM STATEMENT
In the chemical, petrochemical, oil and gas industries, some accidents occur due to human errors. A safety instrumented system (SIS) is a strong barrier to prevent accidents, because it automatically takes action to put the process in a safe state when a deviation is detected. However, humans remain important in this context, since an SIS depends on good design, installation and maintenance. If there is a flaw in these parts of the SIS lifecycle, the system may fail when demanded, and a major process accident can occur if no other barrier intercepts the chain of events.
Given the importance of SIS in the industrial environment, the following question arises: how can the possibility of human error in the SIS lifecycle be reduced? The main purpose of this paper is to answer this question by discussing the several types of human errors, which steps of the SIS lifecycle require attention to minimize systematic failures, and what kinds of action must be implemented to reduce the possibility of SIS failures.

DISCUSSION OF SOME CONCEPTS USED IN THIS PAPER
A safety instrumented system (SIS) is composed of a set of safety instrumented functions (SIFs), whose main role is to protect people, the environment and property by taking action to interrupt a process and avoid an accident. Each SIF has a safety integrity level (SIL), which indicates how effective the function is at preventing an accident. The higher the SIL, the safer the industrial process. The SIL is expressed through the probability of failure on demand (PFD), which indicates how likely the function is to fail when demanded. The lower the PFD, the more reliable the function and, hence, the safer the process.
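The relationship between PFD, risk reduction and SIL described above can be sketched as a simple lookup. The sketch assumes the low-demand-mode bands tabulated in IEC 61508 and IEC 61511 (SIL 1 covers a PFDavg from 1E-02 up to, but not including, 1E-01, and so on up to SIL 4) and the usual definition of the risk reduction factor, RRF = 1/PFD; the function names are illustrative.

```python
# Low-demand-mode SIL bands (average PFD), as tabulated in IEC 61508 / IEC 61511.
# Each entry: (SIL, lower bound inclusive, upper bound exclusive).
SIL_BANDS = [
    (4, 1e-5, 1e-4),
    (3, 1e-4, 1e-3),
    (2, 1e-3, 1e-2),
    (1, 1e-2, 1e-1),
]


def risk_reduction_factor(pfd_avg: float) -> float:
    """Risk reduction factor of a protection layer: RRF = 1 / PFDavg."""
    if not 0 < pfd_avg <= 1:
        raise ValueError("PFDavg must be in (0, 1]")
    return 1.0 / pfd_avg


def sil_from_pfd(pfd_avg: float) -> int:
    """Return the SIL band a given PFDavg falls in (0 = weaker than SIL 1)."""
    for sil, low, high in SIL_BANDS:
        if low <= pfd_avg < high:
            return sil
    if pfd_avg >= 1e-1:
        return 0
    raise ValueError("PFDavg below the SIL 4 band")


if __name__ == "__main__":
    for pfd in (5e-2, 5e-3, 5e-4):
        print(f"PFDavg={pfd:.0e}  RRF={risk_reduction_factor(pfd):.0f}  SIL={sil_from_pfd(pfd)}")
```

Each tenfold decrease in PFD multiplies the RRF by ten and moves the function up one SIL band, which is the sense in which a lower PFD means a stronger layer of protection.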
The failures of the SIS, in turn, may arise in two different ways: random and systematic failures. Random failures occur at random times and are caused by several types of hardware degradation mechanisms. Due to this characteristic, these kinds of failures are statistically predictable.
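Because random hardware failures occur at a rate that can be characterized statistically, their contribution to the PFD can be estimated in advance. A minimal sketch, assuming a single (1oo1) channel, a constant dangerous undetected failure rate λDU, perfect proof tests and negligible repair time, uses the common simplified approximation PFDavg ≈ λDU·TI/2 (of the kind given in ISA-TR84.00.02). The failure rate and test intervals are illustrative values, not data from this paper. Nothing in this calculation captures systematic failures, which is why they must be managed separately.

```python
HOURS_PER_YEAR = 8760


def pfd_avg_1oo1(lambda_du_per_hour: float, test_interval_hours: float) -> float:
    """Simplified average PFD of one channel: lambda_DU * TI / 2.

    Reasonable only when lambda_DU * TI is small; assumes perfect proof
    tests and negligible repair time.
    """
    return lambda_du_per_hour * test_interval_hours / 2


if __name__ == "__main__":
    lambda_du = 1e-6  # illustrative dangerous undetected failure rate, per hour
    for years in (2, 1, 0.5):
        pfd = pfd_avg_1oo1(lambda_du, years * HOURS_PER_YEAR)
        print(f"proof test every {years} year(s): PFDavg = {pfd:.2e}")
```

Halving the proof-test interval halves the estimated PFDavg; this kind of prediction is possible for random failures but not for systematic ones.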
On the other hand, systematic failures are unpredictable because human interference is their main cause. Several parts of the SIS lifecycle are prone to systematic failures, for example: the safety requirements specification activity; the design, manufacturing, installation and operation of the hardware; software programming; etc.
In the industrial sector, when dealing with dangerous materials such as toxic, flammable and explosive products, it is essential to implement layers of protection to prevent an accident. These layers work to interrupt a chain of events started by an initiating event. For example, if a runaway reaction occurs and the temperature and pressure start to increase, the temperature and pressure controls will be the first layer of protection. If the controls are not enough to bring the reaction back to normal conditions, the second layer of protection will be an SIS that stops the reaction. If even these interventions are not sufficient to stop the abnormal situation, the third layer of protection will be the relief valve, which opens to reduce the pressure and protect the reactor from an explosion. These actions derive from layers of protection; nevertheless, the layers have to be independent of each other to guarantee that they will work even if one of them fails. Thus, they are called independent protection layers (IPLs).
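In Layer of Protection Analysis (LOPA) terms, the chain of layers in this example reduces the frequency of the runaway scenario multiplicatively, and the multiplication is only valid because the layers are independent. A minimal sketch, using illustrative numbers that are assumptions rather than values from this paper (initiating-event frequency 0.1/year, control loop PFD 0.1, SIS PFD 0.01, relief valve PFD 0.01):

```python
from math import prod


def mitigated_frequency(initiating_freq_per_year: float, ipl_pfds: dict[str, float]) -> float:
    """Scenario frequency after all IPLs: f_IE * product of the IPL PFDs.

    The product form assumes the layers fail independently of each other
    and of the initiating event, which is exactly the IPL requirement.
    """
    return initiating_freq_per_year * prod(ipl_pfds.values())


if __name__ == "__main__":
    runaway_layers = {  # illustrative values, not taken from the paper
        "temperature/pressure control": 0.1,
        "SIS (reaction shutdown)": 0.01,
        "relief valve": 0.01,
    }
    f = mitigated_frequency(0.1, runaway_layers)
    print(f"mitigated frequency: {f:.1e} per year")
```

If two layers shared a component (for example, the same transmitter), a single failure would defeat both, and multiplying their PFDs would overstate the protection actually available.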

HUMAN ERRORS
"Errors are deviations from external reality, accuracy or correctness" (TULBURE, 2012). The author also argues that there are two distinct models of errors: pessimistic and optimistic. The pessimistic model considers any deviation from accuracy an abnormal situation that has to be avoided. The optimistic model, on the other hand, recognizes that not all errors are unacceptable, because they may provide some learning; indeed, some discoveries come from errors. Nevertheless, in the industrial sector the pessimistic model prevails because of the possibility of disastrous outcomes originating from human error. Leveson (2011) divides unintentional human errors into two categories: slips and mistakes. Slips are errors that arise from a correct intention, but the person takes the wrong action. Mistakes, on the other hand, are errors that occur in action planning; in other words, they start from a wrong intention. For instance, in a slip the person decides to stop pump A, but when taking the action stops pump B. In a mistake, the person decides to stop pump B (and carries out the planned action), although pump A is the one that should have been stopped.
HSE (2005), HSE (2009) and CCPS (1994) also define unintentional errors as slips or mistakes. These organizations mention that slips occur during the execution of familiar tasks, such as forgetting to perform relevant actions during maintenance activities, calibration, proof tests, etc. Mistakes arise when behavior is based on remembered rules, familiar procedures or familiar situations, or when decisions must be taken based on knowledge and judgment. Reason (1990) adds lapses as a variant of slips, because these errors can go unnoticed by the person who commits them, since they are related to memory failures.
Additionally, there are intentional errors (violations), which are also important in the context of human error. Although they are committed by the person's own will, they are rarely intended to cause harm (HSE, 2005), sabotage being the exception. These errors are related to failure to comply with procedures, inappropriate shortcuts, etc. The person intends to get the work done regardless of the consequences, either because he does not believe what is written or because he wants to speed up the work to gain recognition. Several causes lead to these situations: procedures are not reviewed to reflect changes over time and become discredited; employees are not involved in drafting the procedures and therefore believe there is a better way to perform the task; procedures are so long and complex that they become difficult to follow; and there is a lack of operating discipline (API 770, 2001; CCPS, 1994; BELL; SWAIN, 1983; SILVA, 2009).
Errors can also be classified as errors of omission or of commission (API 770, 2001). Errors of omission occur when a person forgets to do something (lapse) or deliberately decides not to do a task (violation). In errors of commission, the person errs while performing the task (slips/lapses or mistakes) or deliberately does something different from what should be done (violation).
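The classification above can be expressed as a small decision procedure. This is a simplified sketch, not a method from the cited sources: it assumes an error can be judged by comparing the required action, the person's planned action and the executed action, and it ignores sabotage and mixed cases. The `Observation` type and `classify` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    required_action: str             # what the task actually required
    planned_action: Optional[str]    # what the person intended to do (None = nothing)
    executed_action: Optional[str]   # what was actually done (None = nothing)
    deliberate: bool = False         # True if the deviation was chosen on purpose


def classify(obs: Observation) -> Optional[tuple[str, str]]:
    """Return (error type, omission/commission), or None if there was no error."""
    if obs.executed_action == obs.required_action:
        return None
    form = "omission" if obs.executed_action is None else "commission"
    if obs.deliberate:
        return ("violation", form)
    if obs.planned_action != obs.required_action:
        return ("mistake", form)   # the intention itself was wrong
    if obs.executed_action is None:
        return ("lapse", form)     # right intention, forgotten during execution
    return ("slip", form)          # right intention, wrong action


if __name__ == "__main__":
    cases = {
        "stops pump B instead of A": Observation("stop pump A", "stop pump A", "stop pump B"),
        "decides to stop pump B": Observation("stop pump A", "stop pump B", "stop pump B"),
        "forgets block valve": Observation("open block valve", "open block valve", None),
        "skips proof test": Observation("run proof test", None, None, deliberate=True),
    }
    for label, obs in cases.items():
        print(label, "->", classify(obs))
```

The first two cases are the pump A/pump B examples from the text (a slip and a mistake, both errors of commission); the last two illustrate a lapse and a violation, both errors of omission.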

HUMAN ERROR MECHANISMS
The mechanisms of human error are defined at three levels of behavior according to Rasmussen (1982, 1983): skill-, rule- and knowledge-based behavior. In knowledge-based behavior, the person reads the problem, formulates a diagnosis to guide the action and draws up a plan to solve the problem; in this case, the person relies on his knowledge. In rule-based behavior, the person recognizes the problem and applies procedures already established. Finally, in skill-based behavior, the person looks at the situation, retrieves routines already stored in memory and takes action. In this case the actions have been practiced many times before and are therefore performed automatically. Table 1 shows the relation between behaviors and error types.
Reason (1990) relates error types to the behavior that produces them in what he named the generic error-modelling system (GEMS). Thus, the types of errors can be associated with the error mechanisms at their three levels of behavior. Slips/lapses are closely related to skill-based behavior: for instance, the person responsible for maintenance gets confused when defining an instrument alarm setpoint and enters a value different from the correct one. He knows what to do, but performance shaping factors, such as stress, lead him to commit the error. When an operational routine is performed (skill-based behavior) and the outcome is not the expected one, the person switches to rule-based behavior, as shown in Figure 1. He checks which procedures and information he should remember to diagnose the problem and then solve it. In this context, it is essential to have formal procedures explaining how to perform the task, so as to provide explicit knowledge rather than only tacit knowledge (SAURIN et al., 2012). Nevertheless, this may not be enough to solve the problem, and the next step is to use his knowledge and be sufficiently resilient to find an effective solution. Resilience is necessary because, probably, not all actions are covered by written procedures, or the procedures could be wrong. At this point, a possible emergency response is already beginning. The undesirable event will occur if the person does not know the procedures, the procedures are incorrect or he does not know the operational process in depth; knowledge of the operational process is acquired through a set of measures, such as procedures, training and drills. The lack of these supports will lead him to misinterpret process variables, culminating in a false diagnosis and, hence, wrong actions.
Source: Adapted from Reason (1990) and Rasmussen (1983).
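The escalation from skill- to rule- to knowledge-based behavior described above can be sketched as a fallback chain. This is a schematic illustration under simplifying assumptions, not a model taken from Reason (1990) or Rasmussen (1983): each level is represented by a hypothetical handler that either produces the expected outcome or fails, and the names `resolve`, `routines`, `procedures` and `diagnose` are illustrative.

```python
from typing import Callable, Dict

# A handler attempts to deal with a situation and returns True when the
# outcome is the expected one.
Handler = Callable[[str], bool]


def resolve(situation: str,
            routines: Dict[str, Handler],
            procedures: Dict[str, Handler],
            diagnose: Handler) -> str:
    """Escalate skill -> rule -> knowledge-based behavior (simplified GEMS view)."""
    routine = routines.get(situation)
    if routine is not None and routine(situation):
        return "skill-based"       # stored routine worked automatically
    procedure = procedures.get(situation)
    if procedure is not None and procedure(situation):
        return "rule-based"        # an established procedure solved it
    if diagnose(situation):
        return "knowledge-based"   # diagnosis and a new plan were needed
    return "unresolved"            # possible start of an emergency response


if __name__ == "__main__":
    routines = {"start feed pump": lambda s: True}
    procedures = {"high reactor temperature": lambda s: True}

    def no_diagnosis(situation: str) -> bool:
        return False  # e.g. missing training or process knowledge

    for s in ("start feed pump", "high reactor temperature", "unknown alarm pattern"):
        print(s, "->", resolve(s, routines, procedures, no_diagnosis))
```

Passing a `diagnose` handler that always fails mimics the case the text warns about, where missing procedures, training or process knowledge leave an abnormal situation unresolved.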
In a similar view, Liu et al. (2014) call these phases categories of tasks, which they define as normal operation, fault detection (monitoring) and fault diagnosis. During normal operation, the operator performs tasks in automated mode. In fault detection, he checks the display for abnormal conditions and compares them with predefined rules. Finally, in fault diagnosis, the operator looks for the underlying reasons for the abnormal behavior.

RESEARCH METHODOLOGY
The applied methodology consists of identifying the lifecycle step in which human error makes the greatest contribution, by evaluating previous studies and analyzing renowned standards.
To better understand the SIS lifecycle, Figure 2 shows its different phases, all of which depend on good human performance. Based on these features, this paper focuses on three activities that encompass a broad spectrum of human interaction with the SIS lifecycle and proposes actions to reduce the possibility of human errors.

SIS Failures
Automation has been used to make industrial processes that handle dangerous materials safer. When automation is well implemented, besides making industrial processes safer, it reduces human workload; however, failures of these systems can cause high-potential events and even major accidents with significant losses. A recent example was the accident at the Macondo well, Gulf of Mexico, in 2010, where the blowout preventer failed, as did the auxiliary systems (DHSG, 2011).
A survey conducted by HSE (2003) on the causes of incidents analyzed 34 incidents in different companies. Although the sample is small and therefore of low statistical significance, the survey showed that 44% of the primary causes were related to instrument specification. This means that the instruments failed to perform the safety function even though their components were working perfectly. Another relevant finding was that 20% of the causes were related to changes made after commissioning, which disabled the instruments and left them unable to perform their functions as specified. The survey also reported that 15% of the causes were related to operation and maintenance, another 15% to design and implementation and, finally, 6% to installation and commissioning. All of these are systematic failures.
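Grouping the reported shares by when the cause was introduced makes the point of the survey explicit. The percentages are those quoted above from HSE (2003); the split into causes introduced before and after start-up is this sketch's own grouping, not part of the survey.

```python
# Share (%) of primary causes by lifecycle phase, as reported by HSE (2003).
CAUSE_SHARES = {
    "specification": 44,
    "design and implementation": 15,
    "installation and commissioning": 6,
    "operation and maintenance": 15,
    "changes after commissioning": 20,
}

# Assumed grouping: phases completed before the plant starts up.
BEFORE_START_UP = {"specification", "design and implementation", "installation and commissioning"}


def split_by_start_up(shares: dict[str, int]) -> tuple[int, int]:
    """Return (% of causes introduced before start-up, % introduced after)."""
    before = sum(v for k, v in shares.items() if k in BEFORE_START_UP)
    after = sum(v for k, v in shares.items() if k not in BEFORE_START_UP)
    return before, after


if __name__ == "__main__":
    assert sum(CAUSE_SHARES.values()) == 100  # the reported shares are exhaustive
    before, after = split_by_start_up(CAUSE_SHARES)
    print(f"before start-up: {before}%, after start-up: {after}%")
```

Roughly two thirds (65%) of the causes were already in place before start-up, with specification alone accounting for 44%.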
According to ISA-TR84.00.04 (2005), systematic failures are caused by human errors during SIF design or in lifecycle management. Some failures remain hidden, because they are not always detected during commissioning or proof tests. As a result, when the SIF is demanded it will fail, or it will cause a spurious trip (in the case of a safe systematic failure). In this context, there are three important types of error that can cause systematic failures, as shown in Figure 3.

Types of Systematic Errors Versus Types of Human Errors
Specifying, designing and maintaining a safety instrumented system requires specialized technicians and high-level cognitive processes. Hence, knowledge- and rule-based behaviors dominate these errors. The types of errors suggested by ISA-TR84.00.04 (2005) are the following.
Specification errors - these occur mainly due to lack of knowledge. These mistakes are committed during design, mostly in the risk analysis phase, in which the safety protection philosophy emerges, as well as the safety integrity level (SIL) required to reduce the risk. The most common errors are: inadequate construction materials, inadequate actuator size, an integrity level inadequate to reduce the risk, an instrument architecture that does not meet the required PFD, etc. These errors occur before installation and can persist throughout the safety lifecycle. They can also be introduced during the change process.
Equipment errors - these are related to defective installations, inappropriate bypasses, faulty maintenance, etc. They are associated with slips/lapses or mistakes and can occur in any phase of the safety lifecycle.
Programming errors - these are errors caused by slips/lapses or mistakes, committed during initial software programming or when changes are made to the software.
While random failures are predictable and can be minimized through device redundancy and good component quality, systematic failures are unpredictable, and device redundancy is of little effectiveness against them. Sometimes redundancy is defeated by common cause failures that result from systematic failures. This reinforces the need for robust management of these failures.
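The limited benefit of redundancy in the presence of common cause failures can be illustrated with the beta-factor model, in which a fraction β of dangerous undetected failures is assumed to disable both channels at once. The sketch uses simplified 1oo1 and 1oo2 approximations consistent with IEC 61508-6 when diagnostics and repair times are neglected; the failure rate, test interval and β values are illustrative assumptions. Here β is used loosely to stand for common cause failures, which the text notes can be a consequence of systematic failures.

```python
def pfd_1oo1(lambda_du: float, ti: float) -> float:
    """Single channel: lambda_DU * TI / 2."""
    return lambda_du * ti / 2


def pfd_1oo2(lambda_du: float, ti: float, beta: float) -> float:
    """Redundant pair (1oo2) with a beta-factor common cause term.

    Independent part:  ((1 - beta) * lambda_DU * TI)^2 / 3
    Common cause part: beta * lambda_DU * TI / 2
    Simplified form: no diagnostics, perfect proof tests, negligible repair time.
    """
    independent = ((1 - beta) * lambda_du * ti) ** 2 / 3
    common_cause = beta * lambda_du * ti / 2
    return independent + common_cause


if __name__ == "__main__":
    lambda_du, ti = 2e-6, 8760  # illustrative: failures per hour, one-year proof test
    print(f"1oo1:             {pfd_1oo1(lambda_du, ti):.2e}")
    for beta in (0.0, 0.02, 0.1):
        print(f"1oo2, beta={beta:<4}: {pfd_1oo2(lambda_du, ti, beta):.2e}")
```

With β = 0 the redundant pair is almost two orders of magnitude better than a single channel, but with β = 0.1 the common cause term dominates and the improvement shrinks to about one order of magnitude, which is why adding hardware cannot substitute for managing the causes of common failures.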

Specification Errors
The SIS specification starts with a preliminary risk analysis study. The risk analysis is usually performed with the Hazard and Operability (HAZOP) methodology, which can be complemented by Layer of Protection Analysis (LOPA) to define in detail the layers of protection needed to reduce the risk according to the risk tolerability matrix adopted by the company. From this point, the recommended SIS and its safety integrity level must be set. Automation specialists then define a specification that fits the process and an instrument architecture that keeps the process safe and the spurious trip rate at the level required by the company. The engineering department develops the design based on these specifications, and the design package is then delivered to the purchasing department.
The most common errors arising from risk analysis that show up as systematic failures are: failure to identify the causes of a scenario; failure to identify the consequences of a scenario; failure to identify the frequency of the initiating event; and inadequate definition of the existing independent protection layers (IPLs), as well as an inadequate set of IPLs to be implemented, with their respective probabilities of failure on demand.
The cause of the scenario is the initiating event in LOPA, and it is the starting point for setting the protection layers. If the initiating event is identified incorrectly, the independent protection layers will be inadequate and the safety instrumented functions will not be effective. Thus, they will fail to do what they were designed for, even if their components are in perfect condition.
The scenario consequence is fundamental to defining risk tolerability. If the consequence is underestimated, for instance taken as a small release of flammable material rather than a large flammable release with the possibility of explosion and multiple fatalities, the required level of protection will also be underestimated. If a SIF is recommended as a protection layer, one with a low integrity level will be chosen. The scenario will then remain at a high frequency, because the SIF, with its high probability of failure on demand, will not be able to avoid a major accident.
Defining the frequency of the initiating event is the first phase of LOPA. If this frequency is set lower than its real value, the error propagates to the SIF integrity level. For example, suppose a frequency of 1E-03/year is used instead of 1E-01/year. For a tolerable risk level of 1E-04/year, and with no other IPL credited, the required SIF PFD becomes 1E-04/1E-03 = 1E-01 instead of 1E-04/1E-01 = 1E-03; that is, a SIL 1 SIF is specified, whereas the correct value would be SIL 3. The process thus loses its safety margin and the possibility of an accident increases.
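The arithmetic of this example can be sketched as follows, assuming (as the example implicitly does) that no other IPL is credited, so the SIF must close the whole gap between the initiating-event frequency and the tolerable frequency. Exact fractions are used so that values on decade boundaries are not disturbed by floating-point rounding. How a requirement that lands exactly on a boundary (1E-01 or 1E-03) maps to a SIL is a convention; the sketch assigns the more demanding SIL, which reproduces the SIL 1 versus SIL 3 result above, but other conventions exist. The function names are illustrative.

```python
from fractions import Fraction
from math import prod


def required_sif_pfd(tolerable_freq: Fraction, initiating_freq: Fraction,
                     other_ipl_pfds: tuple[Fraction, ...] = ()) -> Fraction:
    """PFD the SIF must achieve so that the scenario meets the tolerable frequency."""
    unmitigated = initiating_freq * prod(other_ipl_pfds, start=Fraction(1))
    return tolerable_freq / unmitigated


def required_sil(pfd: Fraction) -> int:
    """SIL needed for a required PFD; 0 means no SIL-rated SIF is needed.

    Assumed convention: SIL n covers 10^-(n+1) < PFD <= 10^-n, so a
    requirement exactly on a decade boundary gets the more demanding SIL.
    """
    if pfd > Fraction(1, 10):
        return 0
    for sil in range(1, 5):
        if Fraction(1, 10 ** (sil + 1)) < pfd <= Fraction(1, 10 ** sil):
            return sil
    raise ValueError("requirement beyond SIL 4; the protection layers need rethinking")


if __name__ == "__main__":
    tolerable = Fraction("1e-4")  # per year
    for label, ie in (("underestimated", Fraction("1e-3")), ("correct", Fraction("1e-1"))):
        pfd = required_sif_pfd(tolerable, ie)
        print(f"{label} IE frequency: required PFD = {float(pfd):.0e}, SIL {required_sil(pfd)}")
```

A two-decade error in the initiating-event frequency becomes a two-level error in the specified SIL, without any component of the SIF being defective.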
The independent protection layer must be:
• Effective in preventing the consequence when it functions as designed;
• Independent of the initiating event and the components of any other IPL already claimed for the same scenario;
• Auditable; the assumed effectiveness in terms of consequence prevention and PFD must be capable of validation in some manner (by documentation, review, testing, etc.). (CCPS, 2001, p. 80).
Additionally, it is fundamental to choose appropriate probability of failure on demand values for each IPL. An overly optimistic (too low) PFD assigned to protection layers other than the SIF, such as relief valves, means that the chosen SIF will have an integrity level much lower than actually necessary.
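The effect of an overly optimistic IPL credit follows from the same LOPA balance: every decade of unjustified credit given to another layer is a decade removed from the SIF requirement. A minimal sketch with assumed numbers: a tolerable frequency of 1E-04/year, an initiating-event frequency of 1E-01/year, and a relief valve credited with a PFD of 1E-02 that, hypothetically, achieves only 1E-01 in practice.

```python
from math import prod


def required_sif_pfd(tolerable_freq: float, initiating_freq: float,
                     other_ipl_pfds: list[float]) -> float:
    """PFD the SIF must achieve, given the credit taken for the other IPLs."""
    return tolerable_freq / (initiating_freq * prod(other_ipl_pfds))


if __name__ == "__main__":
    tolerable, initiating = 1e-4, 1e-1   # per year, illustrative
    claimed_relief_valve = 1e-2          # credit taken in the analysis (assumed)
    realistic_relief_valve = 1e-1        # what the valve actually achieves (assumed)
    specified = required_sif_pfd(tolerable, initiating, [claimed_relief_valve])
    needed = required_sif_pfd(tolerable, initiating, [realistic_relief_valve])
    print(f"SIF PFD specified: {specified:.0e}, SIF PFD actually needed: {needed:.0e}")
    print(f"the SIF is under-specified by a factor of {specified / needed:.0f}")
```

In this hypothetical case, the SIF is specified one SIL lower than the scenario really requires, although the LOPA worksheet appears to meet the tolerable frequency.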
In addition to the errors already mentioned, mistakes can occur when developing the safety requirements specification (SRS) and the engineering design, as well as in the purchasing process.
To avoid errors during changes, it is essential to have excellent management of change and to follow all the steps of this management system, so as to guarantee that the safety level is not affected once changes are implemented.

What to Do to Avoid Specification Errors
Errors caused by rule- and knowledge-based behavior can be reduced by enforcing procedures, training and coaching. It is therefore important to provide training in Hazard and Operability Analysis, Layer of Protection Analysis and Safety Instrumented Systems. Another option is expert monitoring (coaching) in these methodologies for a short period, until it is confirmed that the professional in training can carry out these activities with good quality. Procedures for these methodologies must also be available for consultation when necessary. A checklist to support SRS development based on IEC 61511 or ANSI/ISA-84.00.01 (clause 10) will help decrease the possibility of errors of omission. Design development can also be supported by a checklist based on clause 11 of IEC 61511 or ANSI/ISA-84.00.01. Errors related to the purchasing process can be reduced by supplier prequalification, using the criteria adopted for instrument safety certification according to IEC 61508 or in accordance with prior use knowledge. Quality assurance must be applied to the device purchasing process, ensuring that devices are bought according to the design, delivered appropriately and inspected on receipt to verify that they were manufactured as requested.
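A checklist reduces errors of omission because it turns "remembering everything" into a mechanical comparison. A minimal sketch of that comparison, assuming each SIF's SRS entry is stored as a simple key-value record; the item names and the `missing_items` function are hypothetical and only loosely inspired by the topics an SRS covers, so a real checklist should be derived from clause 10 itself.

```python
# Illustrative SRS checklist items: placeholders, NOT a transcription of
# IEC 61511 / ANSI/ISA-84.00.01 clause 10.
SRS_CHECKLIST = (
    "safe_state",
    "trip_setpoint",
    "required_sil",
    "response_time",
    "proof_test_interval",
    "bypass_requirements",
    "reset_requirements",
)


def missing_items(srs_entry: dict, checklist=SRS_CHECKLIST) -> list[str]:
    """List checklist items that are absent or left blank in one SIF's SRS entry."""
    return [item for item in checklist
            if srs_entry.get(item) in (None, "", [])]


if __name__ == "__main__":
    sif_entry = {  # hypothetical SRS entry for one SIF
        "safe_state": "feed valve closed",
        "trip_setpoint": "12 barg",
        "required_sil": 2,
        "response_time": "",
        "proof_test_interval": "12 months",
    }
    print("incomplete items:", missing_items(sif_entry))
```

Both forgotten items (never entered) and blank items (started but not finished) are reported, which are the two typical forms an omission takes in a document.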

Equipment and Programming Errors
Equipment quality assurance extends to the installation process. The whole installation must be in accordance with the design, and continuous checks must be made to guarantee that engineering requirements are met. Installation errors can cause common cause failures; for example, if a single process connection is used for two pressure transmitters and the connection becomes clogged, both transmitters are lost. Maintenance and operation errors are also important sources of systematic failures. A wrong instrument calibration can prevent the safety function from acting when demanded, and installing a bypass can disable the safety function completely. Another error that requires great attention is forgetting (omission) to open an instrument's block valve after maintenance is completed.
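Both the installation error and the maintenance omission described here can be screened for mechanically before a SIF is returned to service. A minimal sketch, assuming an instrument register that records, for each transmitter, its SIF, its process connection and its block-valve position; the record layout, tags and function names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Transmitter:
    tag: str
    sif: str               # SIF the transmitter belongs to
    process_tap: str       # identifier of the process connection it uses
    block_valve_open: bool


def shared_taps(transmitters: list[Transmitter]) -> dict[tuple[str, str], list[str]]:
    """Find redundant transmitters of the same SIF that share one process connection."""
    groups = defaultdict(list)
    for t in transmitters:
        groups[(t.sif, t.process_tap)].append(t.tag)
    return {key: tags for key, tags in groups.items() if len(tags) > 1}


def closed_block_valves(transmitters: list[Transmitter]) -> list[str]:
    """Transmitters whose block valve was left closed (e.g. after maintenance)."""
    return [t.tag for t in transmitters if not t.block_valve_open]


if __name__ == "__main__":
    field = [  # hypothetical tags and taps
        Transmitter("PT-101A", "SIF-01", "N1", True),
        Transmitter("PT-101B", "SIF-01", "N1", True),   # same tap as PT-101A
        Transmitter("PT-101C", "SIF-01", "N2", False),  # valve not reopened
    ]
    print("common cause risk:", shared_taps(field))
    print("not returned to service:", closed_block_valves(field))
```

Such a check only works if the register reflects the as-built installation, so it complements, rather than replaces, field verification against the design.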
Programming errors can occur in any phase of programming activities. They are very common because programming is a complex, mentally exhausting and sometimes monotonous process. These performance shaping factors increase the probability of human error. Monotonous work reduces stress excessively, which, paradoxically, is not good, since it lowers the level of attention and reduces task effectiveness, as shown in Figure 4.

What to Do to Avoid Equipment and Programming Errors
These errors can be caused by slips/lapses or mistakes, which are related to skill-, rule- and knowledge-based behavior. The most effective actions to avoid them are maintenance and production procedures, checklists and training. A detailed management of change procedure will also help control the changes that occur during the whole lifecycle. Validation by testing will confirm whether the system is ready to operate. Complying with clauses 14, 15, 16 and 17 of IEC 61511 or ANSI/ISA-84.00.01 will help reduce equipment errors. For programming errors specifically, alternating activities (programming and testing) and taking rest periods will help minimize performance loss. Clause 13 of IEC 61511 or ANSI/ISA-84.00.01 provides guidance for testing the logic solver and associated software.
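Testing application logic against its specification can be made systematic, which also counters monotony by letting a program enumerate cases that a tired person might skip. A minimal sketch, assuming a hypothetical 2oo3 (two-out-of-three) high-pressure trip; the function names and setpoint are illustrative, and real acceptance testing is carried out on the logic solver itself, not on a Python model of it.

```python
from itertools import product


def trip_2oo3(pressures: tuple[float, float, float], setpoint: float) -> bool:
    """Hypothetical 2oo3 high-pressure trip: trip when at least two of the
    three transmitters read above the setpoint."""
    votes = sum(1 for p in pressures if p > setpoint)
    return votes >= 2


def test_all_vote_combinations(setpoint: float = 10.0) -> None:
    """Exhaustively check the vote against the intended 2oo3 behavior."""
    high, low = setpoint + 1, setpoint - 1
    for states in product((False, True), repeat=3):   # True = transmitter reads high
        pressures = tuple(high if s else low for s in states)
        expected = sum(states) >= 2
        assert trip_2oo3(pressures, setpoint) == expected, states


if __name__ == "__main__":
    test_all_vote_combinations()
    print("2oo3 vote behaves as specified for all 8 input combinations")
```

Writing the expected behavior independently of the implementation (here, `sum(states) >= 2`) is what lets the test catch a programming slip such as coding the vote as 1oo3 or 3oo3.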

Figure 4 - Hypothetical relationship of psychological stress and performance effectiveness

Table 1 - Behavior and types of errors.
Behavior | Description | Level of cognition | Type of error
Based on skill | Actions based on past knowledge that became routines; the actions are performed with a low level of cognition | Low level of cognition - automatic actions | Slips
Based on rules | Actions based on procedures or other information | Medium level of cognition - between automatic and conscious actions | Mistakes
Based on knowledge | Actions based on knowledge | High level of cognition - totally conscious actions | Mistakes
Source: Based on CCPS (1994).