Figure 10-2 One example of a 5 x 5 risk summary grid template
Monday, July 25, 2011
10.3 Tools for Risk Management
Standard tools for risk management include risk matrices; also called risk summary grids, and risk registers. There are also tables of definitions and guidelines that aid in using the matrices and registers. A methodology useful for reducing risk through proactive and planned build and test steps is called design iteration. These tools and design iteration are described in this chapter. Other tools aiding or supporting the identification of risks include fault trees, worst case analysis and failure modes analysis. Risk burn down charts that display how the total expected value of all identified risks is reduced with time as mitigation actions are completed are useful in monitoring the overall progress of risk mitigation and the effectiveness of budgeting for risk management.10-1
10.3.1 Risk Summary Grid - The risk summary grid is a listing of the top ranked risks on a grid of probability vs. impact. The risk summary gird is excellent for showing all top risks on a single graphic and grouping the risks as low, medium or high. Typical grids are 3 x 3 or 5 x 5. An example 5 x 5 template is shown in Figure 10-2.
The 5 x 5 risk summary grid enables risks to be classified as low, medium or high; typically color coded green, yellow and red respectively, and ranked in order of importance. Relative importance is the product of probability and impact. Note that the definitions for low and medium are not standard. The definition used in Figure 10-2 is conservative in limiting low risk to the five squares in the lower left of the grid with risk values of 0.5 or less. Medium risks have values of 0.7 to 3.5 and high risks have values from 4.5 to 8.1. Others, e.g. the Risk Management Guide for DOD Acquisition10-2 (An excellent tutorial on risk management), define the entire first column plus six other lower left squares as low risk.
Identified risks are assigned to a square according to the estimates of their probability of occurrence and impact to the overall activity. In Figure 10-2 there is one medium risk, shown by the x in the square with a probability 0.5, impact 7 and therefore having a relative importance of 3.5. The numbers shown for impact are arbitrary and must be defined appropriate to the activity for which risk is being managed.
Some risk management processes described on the web use letters rather than numbers to rank risk probability in constructing risk summary grids. The objective is to assign either a probability numbers or letter to each risk. To do this it is necessary to make a judgment of the likelihood that the risk occurs. The table shown in Figure 10-3 provides reasonable guidelines for such judgments. Thus, if the likelihood of an event occurring is judged to be remote then assign the probability of 0.1 or the letter A. If it is highly likely assign 0.7 or D. It may be argued that guidelines are needed for what is remote or likely. Unfortunately this wouldn’t help as there is always some guess work or judgment required. If several members of a team discuss the likelihood then they can probably reach agreement and this is adequate. It is important for the novice to understand that it isn’t essential that the probabilities are exact. The objective is to come close enough to compare the relative probabilities of several events so that the events can be prioritized in relation to their relative risk or relative probability of occurrence.
Figure 10-3 Guidelines for assigning probability numbers or letters to risk based on judgment criteria.
After assigning a probability to a risk it is necessary to make a judgment of the impact of occurrence of the risk. A risk event can cause an unexpected cost or cost increase, a slip in the schedule for achieving some related event or reduce the quality or technical performance of some design requirement. It is also possible for the risk to impact two or even all three of the cost, schedule or quality measures. The table shown in Figure 10-4 provides one set of guidelines for assigning impact numbers 1, 2, 3, 4 or 5 to a risk event.
Figure 10-4 Guidelines for assigning impact numbers to a risk event.
Costs can be defined as either percentage of budget, as shown in Figure 10-4, or in actual monetary units. Similarly schedule can be defined as percent slip, relative slip or actual time slip.
A risk summary grid template using the guidelines provided in Figures 10-3 and 10-4 is shown in Figure 10-5.
Figure 10-5 A less conservative risk summary grid template using the guidelines provided in Figures 10-3 and 10-4.
The process using a 3 x 3 risk summary grid typically assigns probability of risks as 0.1, 0.3 or 0.9 and impacts as 1, 3 or 9. There are three squares for each of the low, medium and high risk classifications with relative importance values ranging from 0.1 to 8.1 according to the products of probability and impact. An example of a 3 x 3 risk summary grid template is shown in Figure 10-6.
Figure 10-6 An example template for a 3 x 3 risk summary grid.
Specific process details or numerical values are not important. What is important is having a process that allows workers and managers to assess and rank risks and to communicate these risks to each other, and in some cases to customers. The simple risk summary grids are useful tools for accomplishing these objectives and are most useful in the early stages of the life cycle of an activity and for communicating an overall picture of risks.
The identified risks are collected in a list and the ten or so with the highest risk values are numbered or given letter identifications. The associated numbers or letters are then displayed in the appropriate square on the risk summary grid. In use the risk values of each square are either not shown in the square or made small so there is room for several risk identifiers in a square. The risk summary grid then provides a quick visual measure of the number of high, medium and low risks. In the early stages of a project it should be expected that there are more risks in the high and medium categories than the low and as risk mitigation progresses the number of high risks are reduced.
Having identified the risks and ranked them the team must decide what to do with risks that are assigned as Low, Medium or High. One set of guidelines is shown in the table provided in Figure 10-7.
Figure 10-7 Example guidelines for actions for each level of risk.
Again, the specific guidelines a team employs is not as important as it is for the team to have agreed upon guidelines appropriate to their work and organization and to follow them.
10-1 The Manager’s Guide for Effective Leadership by Joe Jenney, AuthorHouse, 2009
10-2 Risk Management Guide for DOD Acquisition, Sixth Edition (Version 1.0), Department of Defense, August 2006 http://www.dau.mil/pubs/gdbks/risk_management.asp
Tuesday, July 19, 2011
Risk is always present; its presence is a fact of nature. Accepting that risk is always present is the first step toward managing risks to reduce the effects of risks. Managing risk is the responsibility of the development program leaders but the mechanics are often delegated to systems engineering. Even if systems engineers are not responsible for maintaining the processes and tools it is essential that they understand the importance of risk management and the methods used for effective risk management. Inattention to risk management is the second highest cause of projects not meeting expectations. Just like other systems engineering processes it takes experience and discipline to conduct effective risk management.
Development programs also have opportunities for improving cost, schedule or system performance. It is important to identify and manage opportunities as well as risks in order to have an effective program. This chapter defines risk, outlines a risk management process that can be used for risk and opportunity management and provides examples of templates and processes useful for risk and opportunity management.
10.1 Risk Definition
Risk is the consequence of things happening that negatively impact the performance of a system development project. Risks arise from events that occur inside and outside the development organization. The consequence of an event can impact the quality, cost or schedule of a system development project, or some combination of these effects. There is risk in any project but there are usually more risks associated with projects that are new to the development organization’s experience. Risks are always present in the development of new products or services or changes to the processes, people, materials or equipment used in the development of products or services. Risks to developing new products and services arise from unplanned changes to the internal environment or changes in the external environment, such as the economy, costs of materials, labor market, customer preferences or actions by a competitor, a regulating body or a government agency. An effective development team faces up to risks and manages risks so that the negative impacts are minimized.
There is an operational definition of risk that aids in managing risk. This definition is:
Risk R is The Probability p of an Undesirable Event Occurring; Multiplied by The Consequence of the Event Occurrence measured in arbitrary units C or dollars $; R=p x C or R=p x $.
This definition allows risks to be quantified and ranked in relative importance so that the development team knows which risks to address first, i.e. the risks with the highest values of R. If the event consequence is measured in dollars then it’s easier to evaluate how much budget is reasonable to assign to eliminate or reduce the consequence of the risk.
The second definition measures risk in units of dollars. Thus impacts to the quality of a product or service or to the schedule of delivering the product or service are converted to costs. Impacts to quality are converted to dollar costs via estimated warranty costs, cost of the anticipated loss of customers or loss of revenue due to anticipated levels of discounting prices. Schedule delays are converted to dollar costs by estimating the extra costs of labor during the delays and/or the loss of revenue due to lost sales caused by the schedule delays.
Opportunities can also be defined operationally by the product of the probability an opportunity for improvement can be realized and the consequence if the opportunity is realized, measured either in arbitrary units or dollars. In the rest of this chapter when risk is addressed the reader should remember that it can be viewed as “risk or opportunity”.
The key to good risk management is to address the highest risk first. There are three reasons to address the highest risk first. First is that mitigating a high risk can result in changes to plans, designs, approaches or other major elements in a project. The earlier these changes are implemented the lower the cost of the overall project because money and people resources are not wasted on work that has to be redone later. The second reason is that some projects may fail due to the impossibility of mitigating an inherent risk. The earlier this is determined the fewer resources are spent on the failed project thus preserving resource for other activities. The third reason is that any project is continually competing for resources with other activities. A project that has mitigated its biggest risks has a better chance of competing for continued resource allocation than activities that still have high risks.
10.2 Managing Risk
Managing risk means carrying out a systematic process for identifying, measuring and mitigating risks. Managing risk is accomplished by taking actions before risks occur rather than reacting to occurrences of undesirable events. The DoD SEF defines four parts to risk management and the NASA SE Handbook defines five top level parts and a seven block flow chart for risk management. It is helpful to decompose these into 11 steps. The 11 steps in effective risk management are:
1. Listing the most important requirements that the project must meet to satisfy its customer(s). These are called Cardinal Requirements and are identified in requirements analysis or via Quality Function Deployment.
2. Identifying every risk to a project that might occur that would have significant consequence to meeting each of the Cardinal Requirements
3. Estimating the probability of occurrence of each risk and its consequences in terms of arbitrary units or dollars
4. Ranking the risks by the magnitude of the product of the probability and consequence (i.e. by the definition of risk given above)
5. Identifying proactive actions that can lower the probability of occurrence and/or the cost of occurrence of the top five or ten risks
6. Selecting among the identified actions for those that are cost effective
7. Assigning resources (funds and people) to the selected actions and integrating the mitigation plans into the project budget and schedule
8. Managing the selected action until its associated risk is mitigated
9. Identifying any new risks resulting from mitigation activities
10. Replace mitigated risks with lower ranking or new risks as each is mitigated
11. Conduct regular (weekly or biweekly) risk management reviews to:
· Status risk mitigation actions
· Brainstorm for new risks
· Review that mitigated risks stay mitigated
In identifying risks it is important to involve as many people that are related to the activity as possible. This means people from senior management, the development organization, other participating organizations and supporting organizations. Senior managers see risks that engineers do not and engineers see risks that managers don’t recognize. It is helpful to use a list of potential sources of risk in order to guide people’s thinking to be comprehensive. A list might look like that shown in Figure 10-1.
Figure 10-1 An example template for helping identify possible sources of risk to the customer’s cardinal requirements.
It also helps ensure completeness of understanding risks if each risk is classified as a technical, cost or schedule risk or a combination of these categories.
Friday, July 15, 2011
9.2.5 Compliance Matrix – The data resulting from the actions summarized in the verification matrix for verifying that the system meets all requirements are collected in a compliance matrix. The compliance matrix shows performance for each requirement. It flows performance from the lowest levels of the system hierarchy up to top levels. It identifies the source of the performance data and shows if the design is meeting all requirements. The bottom up flow of performance provides early indication of non-compliant system performance and facilitates defining mitigation plans if problems are identified during verification actions. An example compliance matrix for the switch module is shown in Figure 9-3.
Figure 9-3 An example Compliance Matrix for the simple switch function illustrated in Figure 9-1.
Note that the requirements half of the compliance matrix is identical to the requirements half of the verification matrix. The compliance matrix is easily generated by adding new columns to the verification matrix. Results that are non-compliant, such as the switching force, or marginally compliant, such as the on resistance, can be flagged by adding color to one of the value, margin or compliant columns or with notes in the comments column.
In summary, the arrows labeled verification in Figure 6-4 from functional analysis to requirements analysis, from design to functional analysis and from design to requirements analysis relate to the iteration that the systems engineers do to ensure the design is complete and accurate and that all “shall” requirements are verified in system integration and system test. This iteration is necessary so that for each requirement a verification method is identified, any necessary test equipment, test software and data analysis software is defined in time to have validated test equipment, test procedures and test data analysis software ready when needed for system integration and test.
9.3 Systems Engineering Support to Integration, Test and Production
Manufacturing personnel and test personnel may have primary responsibility for integration, test and production however; systems engineers must provide support to these tasks. Problem resolution typically involves both design and systems engineers and perhaps other specialty engineers depending on the problem to be solved. Systems engineers are needed whenever circumstances require changes in parts or processes to ensure system performance isn’t compromised.
Tuesday, July 12, 2011
9.2.2 System Integration and Test Plan – The purpose of the System Integration and Test Plan (SITP) is to define the step by step process for combining components into assemblies, assemblies into subsystems and subsystems into the system. It is also necessary to define at what level software is integrated and the levels for conducting verification of software and hardware. Because of the intimate relationship of the verification matrix to the system integration and test it is recommended that the SITP be developed before the verification matrix is deemed complete.
The SITP defines the buildup of functionality and the best approach is usually to build from the lowest complexity to higher complexity. Thus, the first steps in integration are the lowest levels of functionality; e.g. backplanes, operating systems and electrical interfaces. Then add increasing functionality such as device drivers, functional interfaces, more complex functions and modes. Finally implement system threads such as major processing paths, error detection paths and end-to-end threads. Integration typically happens in two phases: hardware to hardware and software to hardware. This is because software configured item testing often needs operational hardware to be valid. Two general principles to follow are: test functionality and performance at the lowest level possible and, if it can be avoided, do not integrate any hardware or software whose functionality and performance has not been verified. It isn’t always possible to follow these principles, e.g. sometimes software must be integrated with hardware before either can be meaningfully tested.
One objective of the SITP is to define a plan that avoids as much as possible having to disassemble the system to implement fixes to problems identified in testing. A good approach is to integrate risk mitigation into the SIPT. For example, there is often a vast difference between the impact of an electrical design problem and a mechanical or optical design problem. Some electrical design or fabrication problems discovered in I & T of an engineering model can be corrected with temporary fixes (“green wires”) and I & T can be continued with minimal delay. However, a serious mechanical or optical problem found in the late stages of testing, e.g. in a final system level vibration test, can take months to fix due to the time it takes to redesign and fabricate mechanical or optical parts and conduct the necessary regression testing. Sometimes constructing special test fixtures for early verification of the performance of mechanical, electro-mechanical or optical assemblies is good insurance against discovering design problems in the final stages of I & T.
The integration plan can be described with an integration flow chart or with a table listing the integration steps in order. An integration flow chart graphically illustrates the components that make up each assembly, the assemblies that make up each subsystem etc. Preparing the SITP is an activity that benefits from close cooperation among system engineers, software engineers, test engineers and manufacturing engineers. For example, system engineers typically define the top level integration flow for engineering models using guidelines listed above. Manufacturing engineers typically define the detailed integration flow to be used for manufacturing prototypes and production models. If the system engineers use the same type of documentation for defining the flow for the engineering model that manufacturing engineers use then it is likely that the same documentation can be edited and expanded by manufacturing engineers for their purposes.
It should be expected that problems will be identified during system I & T. Therefore processes for reporting and resolving failures should be part of an organizations standard processes and procedures. System I & T schedules should have contingency for resolving problems. Risk mitigation plans should be part of the SITP and be in place for I & T; such as having adequate supplies of spare parts or even spare subsystems for long lead time and high risk items.
System integration is complete when a defined subset of system level functional tests has been informally run and passed, all failure reports are closed out and all system and design baseline databases have been updated. The final products from system integration include the Test Reports, Failure Reports and the following updated documentation:
- Rebaselined System Definition
- Requirements documents and ICDs
- Test Architecture Definition
- Test Plans
- Test Procedures
- Rebaselined Design Documentation
- Hardware Design Drawings
- Fabrication Procedures
- Formal Release of Software including Build Procedures and a Version Description Document
- System Description Document
- TPM Metrics
It is good practice to gate integration closeout with a Test Readiness Review (TRR) to review the hardware/software integration results, ensure the system is ready to enter formal engineering or development model verification testing and that all test procedures are complete and in compliance with test plans. On large systems it is beneficial to hold a TRR for each subsystem or line replaceable unit (LRU) before holding the system level TRR.
9.2.3 Test Architecture Definition and Test Plans and Procedures – The SITP defines the tests that are to be conducted to verify performance at appropriate levels of the system hierarchy. Having defined the tests and test flow it is necessary to define the test equipment and the plans and procedures to be used to conduct the tests. Different organizations may have different names for the documentation defining test equipment and plans. Here the document defining the test fixtures, test equipment and test software is called the Test Architecture Definition. The test architecture definition should include the test requirements traceability database and test system and subsystems specifications.
Test Plans define the approach to be taken in each test; i.e. what tests are to be run, the order of the tests, the hardware and software equipment to be used and the data that is to be collected and analyzed. Test Plans should define the entry criteria to start tests, suspension criteria to be used during tests and accept/reject criteria for test results.
Test Procedures are the detailed step by step documentation to be followed in carrying out the tests and documenting the test results defined in the Test Plans. Other terminologies include a System Test Methodology Plan that describes how the system is to be tested and a System Test Plan that describes what is to be tested. Document terminology is not important; what is important is defining and documenting the verification process rigorously.
Designing, developing and validating the test equipment and test procedures for a complex system is nearly as complex as designing and developing the system and warrants a thorough systems engineering effort. Neglecting to put sufficient emphasis or resources on these tasks can result in delays of readiness of the test equipment or procedures and risks serious problems in testing due to inadequate test equipment or processes. Sound systems engineering practices treat test equipment and test procedure development as deserving the same disciplined effort and modern methods as used for the system under development.
The complexity of system test equipment and system testing drives the need for disciplined system engineering methods and is the reason for developing test related documentation in the layers of SITP, Test Architecture Definition, Test Plans and finally Test Procedures. The lower complexity top level layers are reviewed and validated before developing the more complex lower levels. This approach abstracts detail in the top levels making it feasible to conduct reviews and validate accuracy of work without getting lost in the details of the final documentation.
The principle of avoiding having to redo anything that has been done before also applies to developing the Test Architecture Definition, Test Plans and Test Procedures. This means designing the system to be able to be tested using existing test facilities and equipment where this does not compromise meeting system specifications. When existing equipment is inadequate then strive to find commercial off the shelf (COTS) hardware and software for the test equipment. If it is necessary to design new special purpose test equipment then consider whether future system tests are likely to require similar new special purpose designs. If so it may be wise to use pattern base systems engineering for the test equipment as well as the system.
Where possible use test methodologies and test procedures that have been validated through prior use. If changes are necessary developing Test Plans and Procedures by editing documentation from previous system test programs is likely to be faster, less costly and less prone to errors than writing new plans. Sometimes test standards are available from government agencies.
9.2.4 Test Data Analysis – Data collected during systems tests often requires considerable analysis in order to determine if performance is compliant with requirements. The quantity and types of data analysis needed should be identified in the test plans and the actions needed to accomplish this analysis are to be included in the test procedures. Often special software is needed to analyze test data. This software must be developed in parallel with other system software since it must be integrated with test equipment and validated by the time the system completes integration. Also some special test and data analysis software may be needed in subsystem tests during integration. Careful planning and scheduling is necessary to avoid project delays due to data analysis procedures and software not being complete and validated by the time it is needed for system tests.
Friday, July 8, 2011
9.2 Verifying that System Requirements are Met
The first phase of verifying system requirements is a formal engineering process that starts with requirements analysis and ends when the system is accepted by its customer. During the system integration and testing steps must be taken to verify that the system satisfies every “shall” statement in the requirements. These shall statement requirements are collected in a document called the Verification Matrix. The results of the integration and testing of these requirements are documented in a Compliance Matrix. The integration and testing is defined and planned in a System Integration and Test Plan. Related documentation includes the Test Architecture Definition, hardware and software Test Plans & Procedures and Test Data Analysis Plans.
The roles of systems engineers in verification include:
- Developing the optimum test strategy and methodology and incorporating it into the design as it is developed
- Developing the top level System Integration and Test Plan
- Developing the hierarchy of Integration and Test Plans from component level to system level
- Documenting all key system and subsystem level tests
- Defining system and subsystem level test equipment needed and developing test architecture designs
- Developing the Test Data Analysis Plans
- Analyzing test data
- Ensuring that all shall requirements are verified and documented in the Compliance Matrix.
Good systems engineering practice requires that requirements verification takes place in parallel with requirements definition. The decision to define a requirement with a “shall” or “may” or “should” statement involves deciding if the requirement must be verified and if so how the requirement will be verified. This means that the requirements verification matrix should be developed in parallel with the system requirements documentation and reviewed when the system requirements are reviewed, e.g. at peer reviews and at a formal System Requirements Review (SRR).
9.2.1 Verification Matrix – The verification matrix is documentation that defines for each requirement the verification method, the level and type of unit for which the verification is to be performed and any special conditions for the verification. Modern requirements management tools facilitate developing the verification matrix. If such tools are not used then the verification matrix can be developed using standard spreadsheet tools.
There are standard verification methods used by systems engineers. These methods are:
- Analysis -Verifies conformance to required performance by the use of analysis based on verified analytical tools, modeling or simulations that predict the performance of the design with calculated data or data from lower level component or subsystem testing. Used when physical hardware and/or software is not available or not cost effective.
- Inspection - Visually verifies form, fit and configuration of the hardware and of software. Often involves measurement tools for measuring dimensions, mass and physical characteristics.
- Demonstration - Verifies the required operability of hardware and software without the aid of test devices. If test devices should be required they are selected so as to not contribute to the results of the demonstration.
- Test - Verifies conformance to required performance, physical characteristics and design construction features by techniques using test equipment or test devices. Intended to be a detailed quantification of performance.
- Similarity - Verifies requirement satisfaction based on certified usage of similar components under identical or harsher operating conditions.
- Design – Used when compliance is obvious from the design, e.g. “The system shall have two modes, standby and operation”.
- Simulation – Compliance applies to a finished data product after calibration or processing with system algorithms. May be only way to demonstrate compliance.
The DoD SEF defines only the first four of the methods listed above. Many experienced systems engineers find these four too restrictive and also use the other three methods listed. To illustrate a verification matrix with an example consider the function Switch Power. This function might be decomposed as shown in Figure 9-1.
Figure 9-1 A function Switch Power might be decomposed into four sub functions.
An example verification matrix for the functions shown in Figure 9-1 is shown in Figure 9-2. In this example the switch power function is assumed to be implemented in a switch module and that both an engineering model and a manufacturing prototype are constructed and tested. In this example no verification of the switch module itself is specified for production models. Verification of the module performance for production modules is assumed to be included in other system level tests.
It’s not important whether the verification matrix is generated automatically from requirements management software or by copy and paste from a requirements spreadsheet. What is important is to not to have to reenter requirements from the requirements document to the verification matrix as this opens the door for simple typing mistakes.
Figure 9-2 An example verification matrix for a switch module.
Wednesday, July 6, 2011
9 Processes and Tools for Verifying Technical Performance
There are two approaches to verifying technical performance. One is using good engineering practices in all the systems engineering and design work to ensure that the defined requirements and the design meets customer expectations. The other is a formal verification process applied in two phases to hardware and software resulting from the design to verify that requirements are met. Both begin during requirements analysis and continue until a system is operational. The work is represented by the three arrows labeled verification in Figure 6-4 that constitute the backward part of the requirements loop, the design loop and the loop from design synthesis back to requirements analysis and includes both verifying completeness and accuracy of the design and verifying the technical performance of the system.
The first phase of formal system verification typically ends with delivery of a system to the customer but may include integration and testing with a higher level system of the customer or another supplier. The second phase of system verification is accomplished by testing the system in its intended environment and used by its intended users. This phase is typically called operational test and evaluation and is the responsibility of the customer for military systems but may involve the supplier for commercial systems and some NASA systems.
9.1 Verifying Design Completeness and Accuracy
Verifying the completeness and accuracy of the design is achieved by a collection of methods and practices rather than a single formal process. The methods and practices used by systems engineers include:
- System engineers checking their own work
- Checking small increments of work via peer reviews
- Conducting formal design reviews
- Using diagrams, graphs, tables and other models in place of text where feasible and augmenting necessary text with graphics to reduce ambiguity
- Using patterns vetted by senior systems engineers to help ensure completeness and accuracy of design documentation
- Developing and comparing the same design data in multiple formats, e.g. diagrams and matrices or diagrams and tables
- Verifying functional architecture by developing a full set of function and mode mapping matrices including:
- Customer defined functions to functions derived by development team (Some team derived functions are explicit in customer documentation and some are implicit.)
- Functions to functions for defining internal and external interfaces among functions
- Sub modes to functions for each of the system modes
- Mode and sub mode transition matrices defining allowable transitions between modes and between sub modes
- Using tools such as requirements management tools that facilitate verifying completeness and traceability of each requirement
- Using QFD and Kano diagrams to ensure completeness of requirements and identify relationships among requirements
- Using robust design techniques such as Taguchi Design of Experiments
- Iterating between requirements analysis and functional analysis and between design synthesis and functional analysis
- Employing models and simulations to both define requirements and verify that design approaches satisfy performance requirements
- Validating all models and simulations used in systems design before use (Note the DoD SEF describes verification, validation and accreditation of models and simulations. Here only verification and validation are discussed.)
- Employing sound guidelines in evaluating the maturity of technologies selected for system designs
- Maintaining a through risk management process throughout the development program
- Conducting failure modes and effects analysis (FMEA) and worse case analysis (WCA).
Engineers checking their own work are the first line of defense against human errors that can affect system design performance. One of the most important things that experienced engineers can teach young engineers is the importance of checking all work and the collection of methods for verifying their work that they have learned over their years in the engineering profession. Engineers are almost always working under time pressure and it takes disciple to take the time to check work at each step so that simple human mistakes don’t result in having to redo large portions of work. This is the same principle that is behind using peer reviews to catch mistakes early so that little rework is required rather than rely on catching mistakes at major design reviews where correcting mistakes often requires significant rework, with significant impact on schedule and budget.
A reason for presenting duplicate methods and tools for the same task in Chapter 6 was not just that different people prefer different methods but also to provide a means of checking the completeness and accuracy of work. The time it takes to develop and document a systems engineering product is a usually a small fraction of the program schedule so that taking time to generate a second version in a different format does not significantly impact schedule and is good insurance against incomplete or inaccurate work.
Pattern based systems engineering, QFD and Taguchi Design of Experiments (DOE) help ensure the completeness, accuracy and robustness of designs. Extensive experience has demonstrated the cost effectiveness of using these methods even though QFD and Taguchi DOE require users to have specialized training to be effective.
Studies have shown that selecting immature technologies result in large increases in costs (See http://www.dau.mil/conferences/presentations/2006_PEO_SYSCOM/tue/A2-Tues-Stuckey.pdf) Immature technologies often do not demonstrate expected performance in early use and can lead to shortfalls in the technical performance of designs. As a result both NASA and DoD include guidelines for selecting technologies in their acquisition regulations. Definitions of technology readiness levels used by NASA are widely used and are listed at http://esto.nasa.gov/files/TRL_definitions.pdf . Definitions are also provided in Supplement 2-A of the DoD SEF.
Failure analysis methods like FEMA and WCA are more often used by design engineers than systems engineers but systems engineers can include failure modes and worse case considerations when defining the criteria used in system design trade studies.
In summary, verifying technical performance during system development includes the disciplined use of good engineering practices as well as the formal performance verification process. This should not be surprising as these practices have evolved through experience specifically to ensure that system designs meet expected performance as well as other requirements.