
Wednesday, September 14, 2011

11 Introduction to Model Based Systems Engineering



11.0 Introduction
The advantages of using labeled graphical models, diagrams, tables of data and similar non-prose descriptions compared to natural language or prose descriptions have been discussed several times. Now we make a distinction between two types of models. One type is, as stated, a non-prose description of something. The second type is analysis models: either static models that predict performance or dynamic models referred to as simulations. Static analysis models may be strictly analytical or may be machine readable and executable. Modern simulations are typically machine readable and executable. This is an arbitrary distinction, as the DoD defines a model as a physical, mathematical, or otherwise logical representation of a system, entity, phenomenon, or process. (DoD 5000.59-M, 1998)
In Chapter 5 it was stated that PBSE is model based but includes prose documents as well. The models used in PBSE can be either the first type or the second type. Now we want to introduce a different approach to using models for systems engineering. This approach is called Model Based Systems Engineering (MBSE) and it strives to accomplish systems engineering with models that are machine readable, executable or operative. An INCOSE paper11-1 defines MBSE as an approach to engineering that uses models as an integral part of the technical baseline that includes the requirements, analysis, design, implementation, and verification of a capability, system, and/or product throughout the acquisition life cycle.
This chapter is an introduction to MBSE; no attempt is made to review or even summarize the extensive literature on MBSE. MBSE is rapidly evolving, facilitated both by development of commercial tools and by an INCOSE effort to extend the maturity and capability of MBSE over the decade from 2010 to 2020. While we describe how MBSE offers benefits compared to traditional prose-based systems engineering, it isn't claimed that pure MBSE is superior or inferior to methodologies that mix MBSE, PBSE and traditional methods. The intent is to provide an introduction that enables readers to assess how MBSE can be beneficial to their work and to point the way toward further study.
Traditional systems engineering is a mix of prose-based material, typically requirements and plans, and models such as functional diagrams, physical diagrams and mode diagrams. Eventually design documentation ends in drawings, which are models. MBSE can be thought of as replacing the prose documents that define or describe a system, such as requirements documents, with models. Plans are less of a concern here, although plans such as test plans are greatly improved by including many diagrams, photos and other models with a minimum of prose.
To some it may seem difficult to replace requirements documents with models. However, QFD can be a stand-alone systems engineering process, and QFD is a type of MBSE. Although it does not attempt to heavily employ machine-readable and executable models, QFD is an example of defining requirements in the form of models. Another way to think about requirements is that, mathematically, requirements form graphs and can therefore be represented by models. A third way to think about requirements as models is as tree structures. Each requirement may have parent requirements and daughter requirements, and just as no leaf of a tree can exist without connection to twigs, twigs to limbs, and limbs to the trunk, no requirement can stand alone. Trees can be represented by diagrams, so all of a system's requirements can be represented in a diagram.
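The tree view of requirements can be sketched with a minimal data structure. This is an illustrative sketch only; the class name, identifiers and requirement text are hypothetical, not taken from the book:

```python
# A minimal sketch of requirements as a tree: every requirement except the
# root traces to a parent, so the whole set can be drawn as one diagram.
class Requirement:
    def __init__(self, req_id, text, parent=None):
        self.req_id = req_id
        self.text = text
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def outline(self, indent=0):
        """Return the requirement tree as indented lines, showing parentage."""
        lines = [" " * indent + f"{self.req_id}: {self.text}"]
        for child in self.children:
            lines.extend(child.outline(indent + 2))
        return lines

# Hypothetical requirements for a simple switch:
root = Requirement("SYS-1", "The system shall switch loads up to 5 A")
r1 = Requirement("SYS-1.1", "Contact resistance shall be less than 50 milliohm", root)
r2 = Requirement("SYS-1.2", "Actuation force shall be less than 2 N", root)
print("\n".join(root.outline()))
```

Printing the outline makes the tree structure, and hence the diagram, explicit: every child line is indented under its parent.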
Throughout this book there is an emphasis on representing design information as models in order to reduce ambiguity and the likelihood of misinterpretation of text-based design information. There is also an emphasis on using analysis models and simulations as much as possible throughout the life cycle of a system development. The use of models and simulations improves functional analysis, design quality, system testing and system maintenance. Think of MBSE as combining these two principles; then it becomes clear why MBSE is desirable. Another way to contrast traditional systems engineering with MBSE: in traditional systems engineering, engineers write documents and then develop models from the documents; in MBSE, the approach is to model what is to be built from the beginning.
Model based design has been standard practice for many engineering specialties since the 1980s. Structural analysis, thermal analysis, electrical circuit analysis, optical design analysis and aerodynamics are a few examples of the use of Computer Aided Design (CAD) or model based design analysis. It is systems engineering that has been slow to transition from non-model based methods, with the exception of performance modeling and simulation. To achieve the benefits of MBSE systems engineers need to embrace requirements diagrams, Use Case analysis and other MBSE tools along with performance modeling and simulation.
11.1 Definitions of Models As Applied to MBSE
Models have been referred to throughout this material without providing a formal definition or defining the types of models typically used in systems engineering. Formally, a model is a representation of something, as described in the DoD definition given above. For our purposes a model is a representation of a design element of a system. Types of models of interest to MBSE include11-2:
Schematic Models: A chart or diagram showing relationships, structure or time sequencing of objects. For MBSE schematic models should have a machine-readable representation. Examples include FFBDs, interface diagrams and network diagrams.
Performance Model: An executable representation that provides outputs of design elements in response to inputs. If the outputs are dynamic then the model is called a simulation.
Design Model: A machine interpretable version of the detailed design of a design element. Design models are usually represented by CAD drawings, VHDL, C, etc.
Physical model: A physical representation that is used to experimentally provide outputs in response to inputs. A breadboard or brass board circuit is an example.
Achieving machine readable and executable models means that the models must be developed using software. Useful languages for such models, used by both software and systems engineers, are the Unified Modeling Language™ (UML®) and its derivative SysML™. A brief introduction to these languages is presented here along with references for further study.

Tuesday, August 30, 2011

Chapters Available for Viewing or Download

The first 10 chapters of the book Introduction to Pattern and Model Based Systems Engineering are available for viewing or download at https://sites.google.com/site/themanagersguide/system-engineering . These ten chapters include all of the material posted to date on this blog. The final two chapters are planned to be posted in September, first as blogs on this site and then as complete downloadable chapters on the site above. After the final chapters are posted a book will be published with all the chapters. This book is intended as a guide for training in modern methods that speed up systems engineering, reduce the cost of the systems engineering phase of product development and improve the quality of the systems engineering work.

Wednesday, August 3, 2011

Two More Risk Reduction Tools

10.3.2 Risk Register - The risk summary grid can be used as a tool in the development team’s risk management meetings, but a better tool is the risk register. The risk register ranks risks by the expected dollar value of each risk according to the operational definition of risk given earlier. Constructing the risk register on a spreadsheet allows risks to be sorted by dollar value so that the highest risks are always at the top of the list. The risk register also facilitates keeping all risks in the same database even though management actions may be active on only the top five or ten at any time. When a high risk is mitigated, the expected dollar value of the risk is reduced and it falls out of the top five or ten but is still on the list. This enables reviewing mitigated risks to ensure they remain mitigated, or readdressing a risk at a later time when all the higher risks have been mitigated to even lower values. An example of a simple risk register with three risks constructed on a spreadsheet is shown in Figure 10-8.
Figure 10-8 An example of a risk register constructed in columns on a spreadsheet.
The risk type and the impact if the risk occurs are usually described as “if”, “then” statements, as in Figure 10-8. This helps the management team remember specifically what each risk entails as they conduct reviews over the life of the activity. Expected values are expressed in dollars, which facilitates both ranking and decisions about how much in resources should be assigned to mitigation activities. This assumes, of course, that the development organization makes it a practice to hold some fraction of the budget in reserve to handle unforeseen events. Funds from this reserve budget are assigned to risk mitigation activities. Risk mitigation actions should be budgeted and scheduled as part of on-going work. A mistake many inexperienced managers make is handling risks outside of the mainline budget and schedule. This undisciplined approach often leads to risk management degenerating into an action item list and finally to a reactive approach to unexpected events rather than a proactive approach to reducing risks systematically.
A more complete risk register template than the example shown in Figure 10-8 might contain columns for the risk number, title, description (if), impact (then), types (three columns: cost, schedule, quality or technical), probability of occurrence, cost impact, schedule impact, mitigation plan and mitigation schedule. The form of the risk register template is not critical so the team managing the risks should construct a template that contains the information they feel they need to effectively manage risks.
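The spreadsheet mechanics described above can be sketched in a few lines. The risk descriptions, probabilities and dollar figures below are hypothetical illustrations, not values from the text:

```python
# A minimal risk register sketch: each risk carries a probability and a dollar
# consequence; the expected value p * $ ranks the register so the highest
# risks stay at the top, while mitigated risks remain in the list.
risks = [
    {"id": 1, "if": "Vendor part slips", "then": "3-month schedule slip",
     "p": 0.5, "cost": 200_000},
    {"id": 2, "if": "Algorithm misses spec", "then": "Redesign required",
     "p": 0.3, "cost": 500_000},
    {"id": 3, "if": "Yield below plan", "then": "Unit cost increase",
     "p": 0.1, "cost": 50_000},
]

for r in risks:
    r["expected"] = r["p"] * r["cost"]   # operational definition: R = p x $

# Sort descending by expected value, as a spreadsheet sort would.
register = sorted(risks, key=lambda r: r["expected"], reverse=True)
for r in register:
    print(f'{r["id"]}: IF {r["if"]} THEN {r["then"]} -> ${r["expected"]:,.0f}')
```

A real register would carry the additional columns listed above (mitigation plan, schedule impact, and so on); the ranking logic is the same.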
The risk register, if properly maintained and managed, is a sufficient tool for risk management on small and short duration projects. Setting aside an arbitrary management reserve budget to manage risks is ok for small projects. Portions of the reserve are allocated to mitigation of risks and the budgets and expenses for risk mitigation can be folded into the overall cost management system. Large, long duration projects or high value projects warrant a more focused approach to budgeting for risk management. These management actions do not usually involve systems engineering but systems engineers should be aware of the methods used for management of risk reduction budgets.
In summary, spending a small amount of money in proactively mitigating risks is far better than waiting until the undesirable event occurs and then having to spend a large amount of money fixing the consequences. Remember that risk management is proactive (problem prevention) and not reactive. Also risk management is NOT an action item list for current problems. Finally, risk management is an on-going activity. Do not prepare risk summary grids or risk registers and then put them in a file as though that completes the risk management process, a mistake inexperienced managers make too often.
10.4 Design Iterations Reduce Risk
Design iterations are planned “build, test and learn” activities for high risk parts of the system; parts that are small enough to build, test and assess rapidly. Examples best illustrate the concept of using design iterations to reduce risk:
  • Engineering builds and tests two types of breadboard circuits to get data needed for a subsystem specification, trade studies and subsequent detailed design
  • Manufacturing pilots a new production process during architecture definition and uncovers yield problems early
  • Analysts simulate three candidate signal processing algorithms during concept development and recommend the best
  • Software implements and tests high risk parts of three alternative approaches for system control software during requirements analysis
Design iteration is not a fire fighting technique; it is a methodical risk reduction methodology. Design iteration is not a “build the entire system, test it and fix it if it doesn’t work” approach; in fact it is intended to avoid falling into such an unproductive approach. Note that “spiral development”, described earlier, is a system-level risk reduction methodology. Design iterations can be thought of as a methodology that supports implementing progressive freeze.
Recall the message in Figure 6-32 that the cost of making design changes is low in the early stages of a system development when there are many degrees of freedom and becomes higher as the development progresses and there are fewer and fewer degrees of freedom. Thus design iterations are cost effective for many high risk items in the early stages of a development. It’s ok to have many short cycle iterations in parallel in early phases and it’s ok to “throw away” some results as the team learns and lowers risk.

Monday, July 25, 2011

Constructing a Risk Summary Grid

10.3 Tools for Risk Management
Standard tools for risk management include risk matrices, also called risk summary grids, and risk registers. There are also tables of definitions and guidelines that aid in using the matrices and registers. A methodology useful for reducing risk through proactive and planned build and test steps is called design iteration. These tools and design iteration are described in this chapter. Other tools aiding or supporting the identification of risks include fault trees, worst case analysis and failure modes analysis. Risk burn-down charts, which display how the total expected value of all identified risks is reduced over time as mitigation actions are completed, are useful in monitoring the overall progress of risk mitigation and the effectiveness of budgeting for risk management.10-1
10.3.1 Risk Summary Grid - The risk summary grid is a listing of the top ranked risks on a grid of probability vs. impact. The risk summary grid is excellent for showing all top risks on a single graphic and for grouping the risks as low, medium or high. Typical grids are 3 x 3 or 5 x 5. An example 5 x 5 template is shown in Figure 10-2.
 Figure 10-2 One example of a 5 x 5 risk summary grid template
The 5 x 5 risk summary grid enables risks to be classified as low, medium or high; typically color coded green, yellow and red respectively, and ranked in order of importance. Relative importance is the product of probability and impact. Note that the definitions for low and medium are not standard. The definition used in Figure 10-2 is conservative in limiting low risk to the five squares in the lower left of the grid with risk values of 0.5 or less. Medium risks have values of 0.7 to 3.5 and high risks have values from 4.5 to 8.1. Others, e.g. the Risk Management Guide for DOD Acquisition10-2 (an excellent tutorial on risk management), define the entire first column plus six other lower left squares as low risk.
Identified risks are assigned to a square according to the estimates of their probability of occurrence and impact to the overall activity. In Figure 10-2 there is one medium risk, shown by the x in the square with a probability of 0.5 and impact of 7, and therefore having a relative importance of 3.5. The numbers shown for impact are arbitrary and must be defined appropriate to the activity for which risk is being managed.
Some risk management processes described on the web use letters rather than numbers to rank risk probability in constructing risk summary grids. The objective is to assign either a probability number or a letter to each risk. To do this it is necessary to make a judgment of the likelihood that the risk occurs. The table shown in Figure 10-3 provides reasonable guidelines for such judgments. Thus, if the likelihood of an event occurring is judged to be remote, then assign the probability of 0.1 or the letter A. If it is highly likely, assign 0.7 or D. It may be argued that guidelines are needed for what counts as remote or likely. Unfortunately this wouldn’t help, as there is always some guesswork or judgment required. If several members of a team discuss the likelihood they can probably reach agreement, and this is adequate. It is important for the novice to understand that it isn’t essential that the probabilities be exact. The objective is to come close enough to compare the relative probabilities of several events so that the events can be prioritized in relation to their relative risk or relative probability of occurrence.
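Using the conservative thresholds described for Figure 10-2 (low at 0.5 or less, medium from 0.7 to 3.5, high from 4.5 up), the classification can be sketched as a one-line computation. The function name is illustrative:

```python
# Sketch of the Figure 10-2 classification: relative importance is the product
# of probability (0.1 to 0.9) and impact (1 to 9). Thresholds follow the
# conservative definition quoted in the text; products between 3.5 and 4.5
# do not occur with the discrete grid values, so the boundary choice there
# is immaterial.
def classify(probability, impact):
    value = probability * impact
    if value <= 0.5:
        return "low"
    elif value <= 3.5:
        return "medium"
    else:
        return "high"

# The single example risk shown in Figure 10-2:
print(classify(0.5, 7))  # relative importance 3.5 -> medium
```

A spreadsheet grid typically encodes the same rule as conditional formatting on the product of the two cell values.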


Figure 10-3 Guidelines for assigning probability numbers or letters to risk based on judgment criteria.

After assigning a probability to a risk it is necessary to make a judgment of the impact of the risk’s occurrence. A risk event can cause an unexpected cost or cost increase, a slip in the schedule for achieving some related event, or a reduction in the quality or technical performance of some design requirement. It is also possible for the risk to impact two or even all three of the cost, schedule and quality measures. The table shown in Figure 10-4 provides one set of guidelines for assigning impact numbers 1, 2, 3, 4 or 5 to a risk event.


Figure 10-4 Guidelines for assigning impact numbers to a risk event.
Costs can be defined as either percentage of budget, as shown in Figure 10-4, or in actual monetary units. Similarly schedule can be defined as percent slip, relative slip or actual time slip.
A risk summary grid template using the guidelines provided in Figures 10-3 and 10-4 is shown in Figure 10-5.


Figure 10-5 A less conservative risk summary grid template using the guidelines provided in Figures 10-3 and 10-4.
The process using a 3 x 3 risk summary grid typically assigns probability of risks as 0.1, 0.3 or 0.9 and impacts as 1, 3 or 9. There are three squares for each of the low, medium and high risk classifications with relative importance values ranging from 0.1 to 8.1 according to the products of probability and impact. An example of a 3 x 3 risk summary grid template is shown in Figure 10-6.


Figure 10-6 An example template for a 3 x 3 risk summary grid.
Specific process details or numerical values are not important. What is important is having a process that allows workers and managers to assess and rank risks and to communicate these risks to each other, and in some cases to customers. The simple risk summary grids are useful tools for accomplishing these objectives and are most useful in the early stages of the life cycle of an activity and for communicating an overall picture of risks.
The identified risks are collected in a list and the ten or so with the highest risk values are numbered or given letter identifications. The associated numbers or letters are then displayed in the appropriate square on the risk summary grid. In use, the risk values of each square are either not shown in the square or made small so there is room for several risk identifiers in a square. The risk summary grid then provides a quick visual measure of the number of high, medium and low risks. In the early stages of a project it should be expected that there are more risks in the high and medium categories than in the low category; as risk mitigation progresses, the number of high risks is reduced.

Having identified and ranked the risks, the team must decide what to do with risks that are assigned as Low, Medium or High. One set of guidelines is shown in the table provided in Figure 10-7.
Figure 10-7 Example guidelines for actions for each level of risk.
Again, the specific guidelines a team employs are not as important as having agreed-upon guidelines appropriate to the team’s work and organization, and following them.

10-1 The Manager’s Guide for Effective Leadership by Joe Jenney, AuthorHouse, 2009
10-2  Risk Management Guide for DOD Acquisition, Sixth Edition (Version 1.0), Department of Defense, August 2006 http://www.dau.mil/pubs/gdbks/risk_management.asp   


Tuesday, July 19, 2011

Introduction to Risk and Opportunity Management

Risk is always present; its presence is a fact of nature. Accepting that risk is always present is the first step toward managing risks to reduce their effects. Managing risk is the responsibility of the development program leaders, but the mechanics are often delegated to systems engineering. Even if systems engineers are not responsible for maintaining the processes and tools, it is essential that they understand the importance of risk management and the methods used for effective risk management. Inattention to risk management is the second highest cause of projects not meeting expectations. Just like other systems engineering processes, it takes experience and discipline to conduct effective risk management.
Development programs also have opportunities for improving cost, schedule or system performance. It is important to identify and manage opportunities as well as risks in order to have an effective program. This chapter defines risk, outlines a risk management process that can be used for risk and opportunity management and provides examples of templates and processes useful for risk and opportunity management.
10.1 Risk Definition
Risk is the consequence of things happening that negatively impact the performance of a system development project. Risks arise from events that occur inside and outside the development organization. The consequence of an event can impact the quality, cost or schedule of a system development project, or some combination of these effects. There is risk in any project but there are usually more risks associated with projects that are new to the development organization’s experience. Risks are always present in the development of new products or services or changes to the processes, people, materials or equipment used in the development of products or services. Risks to developing new products and services arise from unplanned changes to the internal environment or changes in the external environment, such as the economy, costs of materials, labor market, customer preferences or actions by a competitor, a regulating body or a government agency. An effective development team faces up to risks and manages risks so that the negative impacts are minimized.

There is an operational definition of risk that aids in managing risk. This definition is:
Risk R is the probability p of an undesirable event occurring, multiplied by the consequence of the event’s occurrence, measured in arbitrary units C or in dollars $: R = p x C or R = p x $.
This definition allows risks to be quantified and ranked in relative importance so that the development team knows which risks to address first, i.e. the risks with the highest values of R. If the event consequence is measured in dollars then it’s easier to evaluate how much budget is reasonable to assign to eliminate or reduce the consequence of the risk.
The second definition measures risk in units of dollars. Thus impacts to the quality of a product or service or to the schedule of delivering the product or service are converted to costs. Impacts to quality are converted to dollar costs via estimated warranty costs, cost of the anticipated loss of customers or loss of revenue due to anticipated levels of discounting prices. Schedule delays are converted to dollar costs by estimating the extra costs of labor during the delays and/or the loss of revenue due to lost sales caused by the schedule delays.
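As a worked example of the dollar form of the definition, consider a possible two-month schedule slip. All figures here are hypothetical illustrations:

```python
# Worked example of R = p x $ for a schedule risk converted to dollars,
# following the conversion described in the text: extra labor carried
# during the slip plus revenue lost to late delivery.
p = 0.3                        # judged probability of a 2-month slip
labor_cost_per_month = 80_000  # extra labor cost carried during the slip
lost_revenue = 120_000         # sales lost because of late delivery

consequence = 2 * labor_cost_per_month + lost_revenue  # $280,000
R = p * consequence
print(f"Expected risk value: ${R:,.0f}")
```

Expressing every risk this way puts cost, schedule and quality risks on a single dollar scale, which is what makes the ranking in a risk register meaningful.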
Opportunities can also be defined operationally by the product of the probability an opportunity for improvement can be realized and the consequence if the opportunity is realized, measured either in arbitrary units or dollars. In the rest of this chapter when risk is addressed the reader should remember that it can be viewed as “risk or opportunity”.
The key to good risk management is to address the highest risk first. There are three reasons for this. First, mitigating a high risk can result in changes to plans, designs, approaches or other major elements in a project. The earlier these changes are implemented, the lower the cost of the overall project, because money and people are not wasted on work that has to be redone later. The second reason is that some projects may fail due to the impossibility of mitigating an inherent risk. The earlier this is determined, the fewer resources are spent on the failed project, thus preserving resources for other activities. The third reason is that any project is continually competing for resources with other activities. A project that has mitigated its biggest risks has a better chance of competing for continued resource allocation than activities that still have high risks.
10.2 Managing Risk

Managing risk means carrying out a systematic process for identifying, measuring and mitigating risks. Managing risk is accomplished by taking actions before risks occur rather than reacting to occurrences of undesirable events. The DoD SEF defines four parts to risk management, and the NASA SE Handbook defines five top level parts and a seven block flow chart for risk management. It is helpful to decompose these into the following 11 steps of effective risk management:
1.      Listing the most important requirements that the project must meet to satisfy its customer(s). These are called Cardinal Requirements and are identified in requirements analysis or via Quality Function Deployment.
2.      Identifying every risk to a project that might occur that would have significant consequence to meeting each of the Cardinal Requirements
3.      Estimating the probability of occurrence of each risk and its consequences in terms of arbitrary units or dollars
4.      Ranking the risks by the magnitude of the product of the probability and consequence (i.e. by the definition of risk given above)
5.      Identifying proactive actions that can lower the probability of occurrence and/or the cost of occurrence of the top five or ten risks
6.      Selecting among the identified actions for those that are cost effective
7.      Assigning resources (funds and people) to the selected actions and integrating the mitigation plans into the project budget and schedule
8.      Managing the selected action until its associated risk is mitigated
9.      Identifying any new risks resulting from mitigation activities
10.  Replacing mitigated risks with lower ranking or new risks as each is mitigated
11.  Conducting regular (weekly or biweekly) risk management reviews to:
·         Review the status of risk mitigation actions
·         Brainstorm to identify new risks
·         Verify that mitigated risks remain mitigated
In identifying risks it is important to involve as many people related to the activity as possible. This means people from senior management, the development organization, other participating organizations and supporting organizations. Senior managers see risks that engineers do not, and engineers see risks that managers don’t recognize. It is helpful to use a list of potential sources of risk to guide people’s thinking toward being comprehensive. A list might look like that shown in Figure 10-1.

Figure 10-1 An example template for helping identify possible sources of risk to the customer’s cardinal requirements.
It also helps ensure a complete understanding of risks if each risk is classified as a technical, cost or schedule risk, or a combination of these categories.

Friday, July 15, 2011

Summarize Verification Results in a Compliance Matrix

9.2.5 Compliance Matrix – The data resulting from the actions summarized in the verification matrix for verifying that the system meets all requirements are collected in a compliance matrix. The compliance matrix shows performance for each requirement. It flows performance from the lowest levels of the system hierarchy up to top levels. It identifies the source of the performance data and shows if the design is meeting all requirements. The bottom up flow of performance provides early indication of non-compliant system performance and facilitates defining mitigation plans if problems are identified during verification actions. An example compliance matrix for the switch module is shown in Figure 9-3.



 Figure 9-3 An example Compliance Matrix for the simple switch function illustrated in Figure 9-1.
Note that the requirements half of the compliance matrix is identical to the requirements half of the verification matrix. The compliance matrix is easily generated by adding new columns to the verification matrix. Results that are non-compliant, such as the switching force, or marginally compliant, such as the on resistance, can be flagged by adding color to the value, margin or compliant columns or with notes in the comments column.
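The added columns of the compliance matrix amount to a simple computation per requirement. The sketch below is illustrative; the requirement limits and measured values are hypothetical, loosely modeled on the switch example, and both parameters are assumed to be "lower is better":

```python
# Sketch of the compliance-matrix columns: compare the measured value against
# the required limit, then compute a margin and a compliance flag.
# Requirements and data are illustrative, not from Figure 9-3.
requirements = [
    # (requirement, required max, measured, units)
    ("On resistance",   0.050, 0.048, "ohm"),  # marginally compliant
    ("Switching force", 2.0,   2.3,   "N"),    # non-compliant
]

for name, limit, measured, units in requirements:
    margin = (limit - measured) / limit        # fraction of the requirement
    compliant = measured <= limit
    flag = "compliant" if compliant else "NON-COMPLIANT"
    print(f"{name}: {measured} {units} vs limit {limit} {units} "
          f"(margin {margin:+.0%}) -> {flag}")
```

Flagging rows whose margin is negative or near zero reproduces the coloring described above, and sorting by margin surfaces the weakest requirements first.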
In summary, the arrows labeled verification in Figure 6-4, from functional analysis to requirements analysis, from design to functional analysis and from design to requirements analysis, relate to the iteration that systems engineers do to ensure the design is complete and accurate and that all “shall” requirements are verified in system integration and system test. This iteration is necessary so that a verification method is identified for each requirement, and any necessary test equipment, test software and data analysis software are defined in time to have validated test equipment, test procedures and test data analysis software ready when needed for system integration and test.
9.3 Systems Engineering Support to Integration, Test and Production
Manufacturing personnel and test personnel may have primary responsibility for integration, test and production; however, systems engineers must provide support to these tasks. Problem resolution typically involves both design and systems engineers, and perhaps other specialty engineers depending on the problem to be solved. Systems engineers are needed whenever circumstances require changes in parts or processes, to ensure system performance isn’t compromised.

Tuesday, July 12, 2011

Planning for System Integration and Testing

9.2.2 System Integration and Test Plan – The purpose of the System Integration and Test Plan (SITP) is to define the step by step process for combining components into assemblies, assemblies into subsystems and subsystems into the system. It is also necessary to define at what level software is integrated and the levels for conducting verification of software and hardware. Because of the intimate relationship of the verification matrix to the system integration and test it is recommended that the SITP be developed before the verification matrix is deemed complete.
The SITP defines the buildup of functionality and the best approach is usually to build from the lowest complexity to higher complexity. Thus, the first steps in integration are the lowest levels of functionality; e.g. backplanes, operating systems and electrical interfaces. Then add increasing functionality such as device drivers, functional interfaces, more complex functions and modes. Finally implement system threads such as major processing paths, error detection paths and end-to-end threads. Integration typically happens in two phases: hardware to hardware and software to hardware. This is because software configured item testing often needs operational hardware to be valid. Two general principles to follow are: test functionality and performance at the lowest level possible and, if it can be avoided, do not integrate any hardware or software whose functionality and performance has not been verified. It isn’t always possible to follow these principles, e.g. sometimes software must be integrated with hardware before either can be meaningfully tested.
One objective of the SITP is to define a plan that avoids as much as possible having to disassemble the system to implement fixes to problems identified in testing. A good approach is to integrate risk mitigation into the SITP. For example, there is often a vast difference between the impact of an electrical design problem and a mechanical or optical design problem. Some electrical design or fabrication problems discovered in I & T of an engineering model can be corrected with temporary fixes (“green wires”) and I & T can be continued with minimal delay. However, a serious mechanical or optical problem found in the late stages of testing, e.g. in a final system level vibration test, can take months to fix due to the time it takes to redesign and fabricate mechanical or optical parts and conduct the necessary regression testing. Sometimes constructing special test fixtures for early verification of the performance of mechanical, electro-mechanical or optical assemblies is good insurance against discovering design problems in the final stages of I & T.
The integration plan can be described with an integration flow chart or with a table listing the integration steps in order. An integration flow chart graphically illustrates the components that make up each assembly, the assemblies that make up each subsystem, etc. Preparing the SITP is an activity that benefits from close cooperation among system engineers, software engineers, test engineers and manufacturing engineers. For example, system engineers typically define the top level integration flow for engineering models using the guidelines listed above, while manufacturing engineers typically define the detailed integration flow to be used for manufacturing prototypes and production models. If the system engineers document the engineering model flow in the same format that manufacturing engineers use, then manufacturing engineers can likely edit and expand that documentation for their purposes.
It should be expected that problems will be identified during system I & T. Therefore processes for reporting and resolving failures should be part of an organization’s standard processes and procedures. System I & T schedules should have contingency for resolving problems. Risk mitigation plans should be part of the SITP and be in place for I & T, such as having adequate supplies of spare parts, or even spare subsystems, for long lead time and high risk items.
System integration is complete when a defined subset of system level functional tests has been informally run and passed, all failure reports are closed out and all system and design baseline databases have been updated. The final products from system integration include the Test Reports, Failure Reports and the following updated documentation:
  • Rebaselined System Definition
    • Requirements documents and ICDs
    • Test Architecture Definition
    • Test Plans
    • Test Procedures
  • Rebaselined Design Documentation
    • Hardware Design Drawings
    • Fabrication Procedures
    • Formal Release of Software including Build Procedures and a Version Description Document
    • System Description Document
    • TPM Metrics
It is good practice to gate integration closeout with a Test Readiness Review (TRR) to review the hardware/software integration results, to ensure the system is ready to enter formal engineering or development model verification testing, and to confirm that all test procedures are complete and in compliance with test plans. On large systems it is beneficial to hold a TRR for each subsystem or line replaceable unit (LRU) before holding the system level TRR.
9.2.3 Test Architecture Definition and Test Plans and Procedures – The SITP defines the tests that are to be conducted to verify performance at appropriate levels of the system hierarchy. Having defined the tests and test flow, it is necessary to define the test equipment and the plans and procedures to be used to conduct the tests. Different organizations may have different names for the documentation defining test equipment and plans. Here the document defining the test fixtures, test equipment and test software is called the Test Architecture Definition. The Test Architecture Definition should include the test requirements traceability database and the test system and subsystem specifications.
Test Plans define the approach to be taken in each test; i.e. what tests are to be run, the order of the tests, the hardware and software equipment to be used and the data that is to be collected and analyzed. Test Plans should define the entry criteria to start tests, suspension criteria to be used during tests and accept/reject criteria for test results.
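The criteria a Test Plan defines can be captured in structured form. The following is an illustrative sketch only; the field names, test names, equipment and limit values are invented for the example and are not drawn from any standard.

```python
from dataclasses import dataclass, field

@dataclass
class TestPlanEntry:
    """One test in a Test Plan, with the criteria the plan should define."""
    name: str
    order: int                                   # position in the test sequence
    equipment: list = field(default_factory=list)
    entry_criteria: list = field(default_factory=list)       # all must hold to start
    suspension_criteria: list = field(default_factory=list)  # any one suspends the test
    accept_limits: dict = field(default_factory=dict)        # measurement -> (lo, hi)

    def accept(self, measurements: dict) -> bool:
        """Accept only if every measured value falls within its limits."""
        return all(
            lo <= measurements[m] <= hi
            for m, (lo, hi) in self.accept_limits.items()
        )

# Hypothetical example entry.
vibration = TestPlanEntry(
    name="System vibration test",
    order=12,
    equipment=["shaker table", "accelerometer DAQ"],
    entry_criteria=["TRR held", "test procedure released"],
    suspension_criteria=["over-test alarm", "fixture resonance detected"],
    accept_limits={"peak_g": (0.0, 6.0), "rms_g": (0.0, 2.2)},
)

print(vibration.accept({"peak_g": 5.1, "rms_g": 1.8}))  # measurements within limits
```

Recording accept/reject limits as data rather than prose makes the criteria unambiguous and lets the same definition drive automated pass/fail evaluation of test results.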
Test Procedures are the detailed step-by-step documentation to be followed in carrying out the tests and documenting the test results defined in the Test Plans. Other terminologies include a System Test Methodology Plan that describes how the system is to be tested and a System Test Plan that describes what is to be tested. Document terminology is not important; what is important is defining and documenting the verification process rigorously.
Designing, developing and validating the test equipment and test procedures for a complex system is nearly as complex as designing and developing the system itself, and warrants a thorough systems engineering effort. Putting insufficient emphasis or resources on these tasks can delay the readiness of the test equipment or procedures, and risks serious problems in testing due to inadequate test equipment or processes. Sound systems engineering practice treats test equipment and test procedure development with the same disciplined effort and modern methods as used for the system under development.
The complexity of system test equipment and system testing drives the need for disciplined systems engineering methods and is the reason for developing test related documentation in layers: the SITP, the Test Architecture Definition, Test Plans and finally Test Procedures. The lower-complexity top-level layers are reviewed and validated before the more complex lower levels are developed. This approach abstracts detail in the top levels, making it feasible to conduct reviews and validate the accuracy of work without getting lost in the details of the final documentation.
The principle of avoiding having to redo anything that has been done before also applies to developing the Test Architecture Definition, Test Plans and Test Procedures. This means designing the system to be testable using existing test facilities and equipment where this does not compromise meeting system specifications. When existing equipment is inadequate, strive to find commercial off the shelf (COTS) hardware and software for the test equipment. If it is necessary to design new special purpose test equipment, consider whether future system tests are likely to require similar new special purpose designs. If so, it may be wise to use pattern based systems engineering for the test equipment as well as the system.
Where possible, use test methodologies and test procedures that have been validated through prior use. If changes are necessary, developing Test Plans and Procedures by editing documentation from previous system test programs is likely to be faster, less costly and less prone to errors than writing new plans. Sometimes test standards are available from government agencies.
9.2.4 Test Data Analysis – Data collected during system tests often requires considerable analysis in order to determine whether performance is compliant with requirements. The quantity and types of data analysis needed should be identified in the Test Plans, and the actions needed to accomplish this analysis are to be included in the Test Procedures. Often special software is needed to analyze test data. This software must be developed in parallel with other system software since it must be integrated with the test equipment and validated by the time the system completes integration. Also, some special test and data analysis software may be needed in subsystem tests during integration. Careful planning and scheduling are necessary to avoid project delays caused by data analysis procedures and software not being complete and validated by the time they are needed for system tests.
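As a minimal illustration of what such analysis software does, the sketch below reduces raw test data and compares it against a requirement limit. The sample values and the requirement threshold are hypothetical, invented for the example.

```python
import statistics

# Hypothetical raw test data, e.g. a measured performance figure per test run.
samples = [0.92, 0.95, 0.93, 0.97, 0.94]

# Illustrative requirement threshold (assumed, not from any real specification).
REQUIRED_MIN = 0.90

# Reduce the data and check compliance against the requirement.
mean = statistics.mean(samples)
worst = min(samples)
compliant = worst >= REQUIRED_MIN

print(f"mean={mean:.3f} worst={worst:.2f} compliant={compliant}")
```

Even a simple reduction like this should be written, integrated with the test equipment and validated before system tests begin, for the scheduling reasons noted above.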