Attribute Agreement Analysis For Results

However, a bug tracking system is not a continuous gauge. The assigned values are either correct or incorrect; there is no grey area. If codes, locations, and severity levels are defined effectively, there is exactly one correct attribute in each of these categories for a given defect. The audit should help identify the specific people and codes that are the main sources of problems, and the attribute agreement assessment should help determine the relative contributions of repeatability and reproducibility problems for those specific codes (and individuals). In addition, many bug databases have accuracy problems in the records that indicate where a defect was created, because what gets recorded is the place where the defect was detected, not where it originated. The detection location does little to help identify root causes, so the accuracy of location assignment should also be an element of the audit.

As with any measurement system, the accuracy and precision of the database must be understood before the information is used (or at least while it is being used) to make decisions. At first glance, an attribute agreement analysis (or attribute gauge R&R) would seem to be the obvious starting point, but that may not be such a good idea. If an attribute agreement assessment is applied at this stage, the detailed results of the audit should provide a good set of information for understanding how best to structure the assessment.
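To make the distinction between these components concrete, the sketch below shows one way to compute repeatability, accuracy, and reproducibility rates from a small audit sample in which each appraiser codes the same set of bugs twice and the audit has established a reference value for each bug. The bug IDs, appraiser names, severity codes, and the simple matching rules are illustrative assumptions, not data or definitions from this article.

    # Hypothetical audit sample. The reference value for each bug is the
    # "true" severity established by the audit; each appraiser codes every
    # bug twice. Names, IDs, and codes are illustrative only.
    reference = {"BUG-1": "S2", "BUG-2": "S1", "BUG-3": "S3", "BUG-4": "S2"}
    trials = {
        "appraiser_a": {"BUG-1": ["S2", "S2"], "BUG-2": ["S1", "S2"],
                        "BUG-3": ["S3", "S3"], "BUG-4": ["S2", "S2"]},
        "appraiser_b": {"BUG-1": ["S1", "S1"], "BUG-2": ["S1", "S1"],
                        "BUG-3": ["S3", "S3"], "BUG-4": ["S2", "S2"]},
    }

    def repeatability(name):
        # Share of bugs where the appraiser agrees with himself across trials.
        codes = trials[name]
        return sum(len(set(c)) == 1 for c in codes.values()) / len(codes)

    def accuracy(name):
        # Share of bugs where every trial matches the audited reference value.
        codes = trials[name]
        return sum(all(c == reference[b] for c in cs)
                   for b, cs in codes.items()) / len(codes)

    def reproducibility():
        # Share of bugs on which all appraisers assign the same code in every trial.
        agree = 0
        for bug in reference:
            assigned = {c for person in trials.values() for c in person[bug]}
            agree += (len(assigned) == 1)
        return agree / len(reference)

    for name in trials:
        print(f"{name}: repeatability {repeatability(name):.0%}, "
              f"accuracy {accuracy(name):.0%}")
    print(f"reproducibility across appraisers: {reproducibility():.0%}")

In a real assessment the audit sample would be far larger and the agreement rates would normally be reported with confidence intervals (or a kappa statistic), but the structure of the calculation is the same.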

In this example, a repeatability assessment is used to illustrate the idea; the same reasoning applies to reproducibility. The point is that a large number of samples is needed to detect differences in an attribute agreement analysis, and doubling the number of samples from 50 to 100 does not make the test much more sensitive. Of course, the difference that needs to be detected depends on the situation and on the level of risk the analyst is willing to accept in the decision, but the reality is that with 50 scenarios an analyst will be hard-pressed to conclude there is a statistical difference between two appraisers with match rates of 96% and 86%. With 100 scenarios, the analyst will still be hard-pressed to see a difference between 96% and 88% (the confidence-interval sketch below illustrates why).

Once the assessment is run, the results can be read as follows. If repeatability is the main problem, appraisers are confused or undecided about certain criteria. If reproducibility is the problem, appraisers have firm opinions about certain conditions, but those opinions differ. If the problems are exhibited by many appraisers, they are likely systemic or procedural; if they are confined to a few appraisers, they may simply require some individual attention. In either case, training or job aids can be targeted at specific individuals or at all appraisers, depending on how many of them are guilty of imprecise attribute assignment. . . .
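To see why doubling the sample buys so little sensitivity, the sketch below puts approximate 95% confidence intervals around the match rates quoted above. The interval method (a Wilson score interval) and the reading of overlapping intervals as "hard to distinguish" are standard choices assumed for illustration; only the sample sizes and match rates come from the text.

    from math import sqrt

    def wilson_interval(p_hat, n, z=1.96):
        # Approximate 95% Wilson score interval for an observed match rate.
        denom = 1 + z * z / n
        center = (p_hat + z * z / (2 * n)) / denom
        half = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
        return center - half, center + half

    # Match rates from the text, at 50 and then 100 scenarios.
    for n, rates in [(50, (0.96, 0.86)), (100, (0.96, 0.88))]:
        print(f"n = {n}:")
        for p in rates:
            lo, hi = wilson_interval(p, n)
            print(f"  observed {p:.0%} -> roughly [{lo:.1%}, {hi:.1%}]")

The intervals overlap in both cases (roughly 86-99% versus 74-93% at 50 scenarios, and 90-98% versus 80-93% at 100), which is consistent with the claim that the analyst would struggle to call the two appraisers different at either sample size.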