A Comparison of Case-Based Reasoning Approaches to Web Hypermedia Project Cost Estimation
Emilia Mendes
Computer Science
Department The University of
Auckland
Private Bag 92019
Auckland, New Zealand
phone: 0064 9 3737599 ext. 6137
Nile Mosley
MxM Technology
P.O. Box 3139, Shortland Street
Auckland, New Zealand
phone: 0064 9 3723211
Ian Watson
Computer Science
Department The University of
Auckland
Private Bag 92019
Auckland, New Zealand
phone: 0064 9 3737599 ext. 6137
Copyright is held by the author/owner(s).
WWW2002, May 7-11, 2002, Honolulu, Hawaii, USA.
ACM 1-58113-449-5/02/0005.
ABSTRACT
Over the years software engineering researchers have suggested numerous techniques for estimating development effort. These techniques have been classified mainly as algorithmic, machine learning and expert judgement. Several studies have compared the prediction accuracy of those techniques, with emphasis placed on linear regression, stepwise regression, and Case-based Reasoning (CBR). To date no converging results have been obtained and we believe they may be influenced by the use of the same CBR configuration.
The objective of this paper is twofold. First, to describe the application of case-based reasoning for estimating the effort for developing Web hypermedia applications. Second, comparing the prediction accuracy of different CBR configurations, using two Web hypermedia datasets.
Results show that for both
datasets the best estimations were obtained with weighted Euclidean distance,
using either one analogy (dataset 1) or 3 analogies (dataset 2).
We suggest therefore that
case-based reasoning is a candidate technique for effort estimation and, with
the aid of an automated environment, can be applied to Web hypermedia
development effort prediction.
Keywords
Web effort prediction, Web hypermedia, case-based reasoning, Web hypermedia metrics, prediction models.
Word count: 6,509
Software
practitioners recognise the importance of realistic estimates of effort to the
successful management of software projects, the Web being no exception. Having
realistic estimates at an early stage in a project's life cycle allow project
managers and development organisations to manage resources effectively.
Several techniques for cost and effort estimation have been proposed over the last 30 years, falling into three general categories [49]:
1)
Expert
judgement (EJ)
- a technique widely used, aims to derive estimates based on an expert's
previous experience on similar projects. The means of deriving an estimate are
not explicit and therefore not repeatable. However, although always difficult
to quantify, expert judgement can be an effective estimating tool on its own or
as an adjusting factor for algorithmic models [23].
2)
Algorithmic
models (AM)
- to date the most popular in the literature, attempt to represent the
relationship between effort and one or more of a project's characteristics. The
main “cost driver” in such a model is usually taken to be some notion of
software size (e.g. the number of lines of source code, number of pages, number
of links). Algorithmic models need calibration to be adjusted to local
circumstances. Examples are the COCOMO model [5], the SLIM model [40] and
function points [1].
3)
Machine
learning (ML)
- Machine learning techniques have in recent years been used as a complement or
alternative to the previous two techniques. Examples include fuzzy logic models
[31], regression trees [46], neural networks [51], and case-based reasoning
[53]. A useful summary of these techniques is presented in [22].
Recently
several comparisons have been made between the three categories of prediction
techniques aforementioned, based on their prediction accuracy
[2,8,9,19,21,22,24-27,29,33,35,36,39,43,47,48]. However no convergence has been
obtained to date. It is believed that the reasons for this are twofold [27].
First, the characteristics of the dataset used have an impact on the
effectiveness of any prediction technique. For example, CBR seems to be
favoured when datasets do not present strong linear relationships between
effort and the product size used, contrary to algorithmic models. Second, when
using CBR, there are several decisions that must be made (e.g., number of
analogies, similarity measures) which may as well influence the accuracy of
estimations.
To
date most literature in effort prediction generates and compares prediction
models using attributes (e.g. lines of code, function points) of conventional
software. This paper looks at effort prediction based on attributes of Web
hypermedia applications instead.
The
World Wide Web (Web) has become the best known example of a hypermedia system.
To date, numerous organisations across the Globe have developed thousands of
commercial and/or educational Web applications. The Web has been used as the
delivery platform for two types of applications: Web hypermedia applications
and Web software applications [10]. A Web hypermedia application is a
non-conventional application characterised by the structuring of information
using nodes (chunks of information), links (relations between nodes), anchors,
access structures (for navigation) and the delivery of this structure over the
Web. Whereas a Web software application represents any conventional software
application that depends on the Web, or uses the Web's infrastructure, for
execution. Typical applications include legacy information systems such as
databases, booking systems, knowledge bases etc. Many e-commerce applications
fall into the latter category.
Our
research focus is on proposing and comparing [17,33-36] development effort
prediction models for Web hypermedia applications. These applications have a
potential in areas such as software engineering [18], literature [52],
education [37], and training [41], to mention but a few. Readers interested in
effort estimation models for Web software applications are referred to [42,38].
This paper has two objectives. The first is to describe the application of case-based reasoning to estimating the effort for developing Web hypermedia applications. The second is to compare the prediction accuracy of several CBR configurations, based on two datasets of Web hypermedia projects.
Those objectives are reflected in the following research questions:
1) Will different CBR configurations generate significantly different prediction accuracy?
2) Which of the CBR configurations employed in this study gives the most accurate predictions for the datasets employed?
These issues are investigated using data gathered through two case studies where sets of suggested metrics for effort prediction were measured. These metrics reflect current industrial practices for developing multimedia and Web hypermedia applications [14,15].
Both
case studies gathered data on Web hypermedia projects developed by Computer
Science Honours or postgraduate students,
attending a Hypermedia and Multimedia Systems course at the University of
Auckland. The first case study took place during the first semester of 2000 and
the second, a replication of the first, took place during the first semester of
2001. The metrics collected were classified as size, effort and confounding
factors [16].
Both
case studies looked at effort prediction for design and authoring processes,
adopting the classification proposed by Lowe and Hall [32]. Using this
classification, authoring encompasses the management of activities for the actual
content and structure of the application and its presentation. Design covers
the methods used for generating the structure and functionality of the
application, and typically does not include aspects such as application
requirements elicitation, feasibility consideration and applications
maintenance.
In
this paper we chose to use CBR as a prediction technique for the following
reasons:
·
An early study comparing CBR to other prediction
techniques [33], CBR presented the best prediction accuracy when using a
dataset of Web hypermedia projects.
·
A second study, using a different dataset, also
showed good prediction accuracy using CBR [34].
·
CBR is an intuitive method and there is evidence
that experts apply analogic reasoning when making estimates [24].
·
CBR is simple and flexible, compared to
algorithmic models.
·
CBR can be used on qualitative and quantitative
data, reflecting closer types of datasets found in real life.
The
rationale for CBR is to characterise the project, for which the estimate is to
be made, relative to a number of attributes (e.g. application complexity, link
complexity etc). This description is then used to find other similar already
finished projects, and an estimate for the new project is made based on the
known effort values for those finished projects.
This
paper is organised as follows: Section 2 presents related work in development
effort prediction for Web hypermedia applications. Section 3 presents in more
depth the technique of case-based reasoning and in Section 4 we describe the
two case studies where the datasets were collected. Section 5 describes the
empirical results of applying CBR. Finally, Section 6 concludes with an
analysis of the strengths and weaknesses of using CBR and suggestions on future
work.
To our knowledge, there are relatively few examples in the literature of studies that investigate effort prediction models generated using data from Web hypermedia applications [17,33-36]. Most research in Web/hypermedia engineering has concentrated on the proposal of methods, methodologies and tools as a basis for process improvement and higher product quality [3,11,20,45].
Mendes
et al. [33] describe a case study involving the development of Web sites
structured according to the Cognitive Flexibility Theory (CFT) [50] principles
in which simple size metrics were collected. Several prediction models are
generated (linear regression, stepwise regression and case-based reasoning) and
compared. Results show that the best predictions were obtained using case-based
reasoning, confirming similar results using non-hypermedia applications
[47,49]. Their metrics can be applied to any hypermedia application, and
represent an initial step towards the proposal of development effort prediction
models for Web hypermedia applications. However, some of these metrics are
subjective, which may have influenced the validity of their results.
Mendes et al. [35] describes
a case study evaluation in which 37 Web hypermedia applications were used.
These were also structured according to the CFT principles and the Web
hypermedia metrics collected were organised into five categories: length size,
complexity size, reusability, effort and confounding factors. The size and
reusability metrics were used to generate top down and bottom up prediction
models using linear and stepwise regression techniques. These techniques were
then compared based on their predictive power and stepwise regression was not
shown to be consistently better than multiple linear regression. A limitation
of this study is that it only compared prediction models generated using
algorithmic techniques. The same dataset was also used as input to a case based
reasoning tool to predict effort [17], where results obtained were most
favourable.
Mendes
et al. [36] presents a case study where size attributes of Web hypermedia
applications were measured. Those attributes correspond to three size
categories, namely Length, Complexity and Functionality. For each size category
they generated prediction models using linear and stepwise regression. The
accuracy of the predictions for those six models was compared using boxplots of
the residuals, as suggested in [30]. Results suggested that all the models
offered similar prediction accuracy. The limitation of this study is also that
it only compared prediction models generated using algorithmic techniques.
Fewster
and Mendes [17] propose a prediction model for authoring and designing Web
applications and for project risk analysis using a General Linear Model. They
suggest two types of metrics: those applied to static Web hypermedia
applications and those applied to dynamic Web hypermedia applications. The
former metrics can also be used to measure other types of hypermedia and Web
applications. Their results were also an initial step towards the proposal of
effort prediction models for Web hypermedia development.
Case-based Reasoning involves [47]:
§ Characterising a project p for which an estimate is required, i.e., identifying project attributes (application size, link complexity etc) which can influence effort.
§ Use of this characterisation as a basis for finding similar (analogous) completed projects, for which effort is known.
§ Use of these effort values, possibly with adjustment, to generate a predicted value of effort for p.
When using CBR there are a number of parameters to decide upon [47,49]:
· Similarity Measure
· Scaling
· Number of analogies
· Analogy Adaptation
Each
parameter in turn can be split into more detail, and maybe incorporated for a
given CBR tool, allowing several CBR configurations.
Similarity Measure measures the level of similarity
between projects, i.e., cases. To our knowledge, the similarity measure most
frequently used in Software engineering and Web engineering literature is the unweighted
Euclidean distance. In the context of this investigation we have used three
measures of similarity, namely the unweighted Euclidean distance, the weighted
Euclidean distance and the Maximum measure, each of which are detailed below:
All formulas presented in this sub-section assume
that x and y represent size attributes for Web hypermedia applications, e.g., x might be page-count and y page-complexity. The pair (xi, yi) represents instances of x and y for project i.
Unweighted Euclidean
distance:
The
Euclidean distance d between the
points (x0,y0) and (x1,y1) is given
by (1) and illustrated in figure 1 by representing co-ordinates on a
two-dimensional space:
(1)
The number of
attributes employed will determine the number of dimensions used.
Weighted Euclidean distance:
It is common in CBR for the attributes, i.e., features vectors to be weighted to reflect the relative importance of each feature. The weighted Euclidean distance d between the points (x0,y0) and (x1,y1) is given by the formula:
(2)
where wx and wy are the weights of x and y respectively.

Figure 1 - Weighted Euclidean distance
using two size attributes
Maximum measure:
Using the maximum measure, the maximum feature similarity defines the case similarity. For two points (x0,y0) and (x1,y1), the maximum measure d is equivalent to the formula:
(3)
This
effectively reduces the similarity measure down to a single feature, although
the maximum feature may differ for each retrieval episode.
Standardisation represents the transformation of attribute values according to a defined rule such that all attributes are measured using the same unit. One possible solution is to assign one to the maximum observed value m [28] and then divide all values by m. This was the strategy chosen for the analysis carried out using CBR-Works (see Section 5).
The
number of analogies refers to the number of similar projects (cases) that will
be used to generate the estimation. In general, only the most similar cases are
selected. For Angelis and Stamelos [2] when small sets of data are used it is
reasonable to consider only a small number of analogies. In this study we have
used 1, 2 and 3 analogies, similarly to [8,9,25,26,2,43,33,34].
Once
the most similar case(s) has/have been selected the next step is to decide how
to generate the estimation. Choices of analogy adaptation techniques presented
in the Software engineering literature vary from the nearest neighbour [8,26],
the mean of the closest analogies [48], the median [2], inverse distance
weighted mean and inverse rank weighted mean [28], to illustrate just a few. In
the Web engineering literature, the adaptation used to date is the mean of the
closest analogies [34,36]. We opted for the mean, median and the inverse rank
weighted mean.
Mean: Represents the average of k analogies, when k>1.
Inverse rank weighted mean: Allows higher ranked analogies to have more influence than lower ones.
For example, using 3 analogies, the closest analogy (CA) would have a weight =
3, the second closest (SC) a weight = 2 and the last one (LA) a weight =1. The
estimation would then be calculated as:
(3*CA + 2*SC + LA)/6 (4)
Median: Represents the median of k analogies, when k>2.
All the estimations presented in this paper were generated using CBR-Works [44], a commercial CBR environment [44] the result of years of collaborative European research by the INRECA I & II projects [4]. It is available commercially from Empolis a knowledge management company (www.tecinno.com). The tool provides a variety of retrieval algorithms (Euclidean, weighted Euclidean, Maximum Similarity etc) as well as fine control over individual feature similarity metrics. In addition, it provides sophisticated support for symbolic features and taxonomies hierarchies as well as providing adaptation rules and formulae.
The analysis presented in this paper was based on two datasets containing information about Web hypermedia applications developed by Computer Science Honours or postgraduate students, attending a Hypermedia and Multimedia Systems course, at the University of Auckland.
The
first dataset (DS1, 34 applications) was obtained using a case study (CS1)
consisting of the design and authoring, by each student, of Web hypermedia
applications aimed at teaching a chosen topic, structured according to the
Cognitive Flexibility Theory (CFT) principles [50], using a minimum of 50
pages. Each Web hypermedia application provided 46 pieces of data [35], from
which we identified 8 attributes, shown in Table 1, to characterise a Web
hypermedia application and its development process.
The second dataset (DS2, 25 applications) was obtained using another case study (CS2) consisting of the design and authoring, by pairs of students, of Web hypermedia applications structured using an adaptation of the Unified Modelling Language [7], with a minimum of 25 pages. Each Web hypermedia application provided 42 pieces of data, from which we identified 6 attributes, shown in Table 2, to characterise a Web hypermedia application and its development process.
Tables
1 and 2 show the attributes that form a basis for our data analysis. Total
effort is our dependent/response variable; remaining attributes our
independent/predictor variables. All attributes were measured on a ratio scale.
The criteria used to select the attributes was [14]: i) practical relevance for Web hypermedia developers; ii) metrics which are easy to learn and cheap to collect; iii) counting rules which were simple and consistent.
Table 1 - Size and Complexity Metrics for DS1
|
Metric |
Description |
|
Page Count (PaC) |
Total number of html or shtml files |
|
Media Count (MeC) |
Total number of original media files |
|
Program Count (PRC) |
Total number of JavaScript files and Java applets |
|
Reused Media Count (RMC) |
Total number of reused/modified media files. |
|
Reused Program Count (RPC) |
Total number of reused/modified programs. |
|
Connectivity Density (COD) |
Average number of internal links[1] per page. |
|
Total Page Complexity (TPC) |
Average number of different types of media per page. |
|
Total Effort (TE) |
Effort in person hours to design and author the application |
On both case studies two questionnaires were used to collect data. The first[2] asked subjects to rate their Web hypermedia authoring experience using five scales, from no experience (zero) to very good experience (four). The second questionnaire[3] was used to measure characteristics of the Web hypermedia applications developed (suggested metrics) and the effort involved in designing and authoring those applications. On both questionnaires, we describe in depth each scale type, to avoid misunderstanding. Members of the research group checked both questionnaires for ambiguous questions, unusual tasks, and number of questions and definitions in the Appendix.
Table 2 - Size and Complexity Metrics for DS2
|
Metric |
Description |
|
Page Count (PaC) |
Total number of html files. |
|
Media Count (MeC) |
Total number of original media files. |
|
Program Length (PRL) |
Total number of statements used in either Javascript or Cascading Style Sheets. |
|
Connectivity Density (COD) |
Average number of links, internal or external, per page. |
|
Total Page Complexity (TPC) |
Average number of different types of media per page. |
|
Total Effort (TE) |
Effort in person hours to design and author the application |
To
reduce learning effects in both case studies, subjects were given a coursework
prior to designing and authoring the Web hypermedia applications, consisting of
creating a simple Web hypermedia application and loading the application onto a
Web server. In addition, in order to measure possible factors that could
influence the validity of the results, we also asked subjects about the main
structure (backbone) of their applications[4],
their authoring experience before and after developing the applications, and
the type of tool used to author/design the Web pages[5].
Finally,
CS1 subjects received training on the Cognitive Flexibility Theory authoring
principles (approximately 150 minutes) and CS2 subjects received training on
the UML's adapted version (approximately 120 minutes). The adapted version
consisted of Use Case Diagrams, Class Diagrams and Transition Diagrams.
Here
we present our comments on the validity of both case studies:
·
The metrics collected, except for effort,
experience, structure and tool, were all objective, quantifiable, and re-measured
by one of the authors using the applications developed, which had been saved on
a CD-ROM. The scales used to measure experience,
structure and tool were described in detail in both questionnaires.
· Subjects' authoring and design experiences were mostly scaled little or average, with a low difference between skill levels. Consequently the original datasets were left intact.
· To reduce maturation effects, i.e. learning effects caused by subjects learning as an evaluation proceeds, subjects had to develop a small Web hypermedia application prior to developing the application measured. They also received training in the CFT principles, for the first case study, and in the UML's adapted version, for the second case study.
· The majority of applications used a hierarchical structure.
·
Notepad and FirstPage were the two tools most
frequently used on CS1 and FirstPage was the tool most frequently used on CS2.
Notepad is a simple text editor while FirstPage is freeware offering
button-embedded HTML tags. Although they differ with respect to the
functionality offered, data analysis using DS1 revealed that the corresponding
effort was similar, suggesting that confounding effects from the tools were
controlled.
·
As the subjects who participated in the case studies were
either Computer Science Honours or postgraduate students, it is likely that
they present skill sets similar to Web hypermedia professionals at the start of
their careers. The use of students as subjects, while sometimes considered
unrealistic, is justified for two reasons: firstly, empirical evidence [6,13]
indicates that students are equal to professionals in many quantifiable
measures, including their approach to developing software; secondly, for
pragmatic considerations, having students as subjects was the only viable
option for both case studies.
The
most common approaches to assessing the accuracy of prediction models are:
·
The Mean Magnitude of Relative Error (MMRE) [49]
·
The Median Magnitude of Relative Error (MdMRE) [39]
·
The Prediction at level n (Pred(n)) [48]
The MMRE is defined
as:
(5)
Where i represents each observation for which effort is predicted.
The
mean takes into account the numerical value of every observation in the data
distribution, and is sensitive to individual predictions with large Magnitude
of Relative Error (MRE), where MRE is calculated as:
MREi =
(6)
where
i represents each observation for
which effort is predicted.
An
option to the mean is the median, which also represents a measure of central
tendency, however it is less sensitive to extreme values. The median of MRE
values for the number i of
observations is called the MdMRE.
Another indicator which is commonly used is the Prediction
at level l, also known as Pred(l). It measures the percentage of
estimates that are within l% of the actual values. Suggestions have been made
[12] that l should be set at 25% and
that a good prediction system should offer this accuracy level 75% of the time.
We have used all
three approaches to assess the accuracy of the different CBR configurations
presented in this paper.
As stated earlier,
during the process of applying case-based reasoning users need to configure the
CBR tool according to four different parameters (similarity measure, scaling,
number of analogies and analogy adaptation).
We
wanted to answer the following questions:
1) Will different CBR configurations generate significantly different prediction accuracy?
2)
Which of the CBR configurations employed in this study
gives the most accurate predictions for the datasets employed?
Therefore, we compared the
prediction accuracy of several estimations generated using different categories
for a given parameter (see Table 3).
We
calculated effort according to the jackknife method (also known as leave one
out cross-validation), as follows:
Step 1 CBR-works was configured to the type of distance to be used: unweighted Euclidean, weighted Euclidean or maximum.
Step 2 One datapoint was chosen and marked as "unconfirmed", in order to be considered as a new project for the purpose of this evaluation.
Step 3 The remaining 33/24 datapoints were kept in the dataset and used in the estimation process.
Step 4 For the "unconfirmed", its actual effort data was discarded, and only the known size metrics kept. This was done in an attempt to simulate a new project for use with CBR-Works to calculate estimated effort.
Step 5 Effort was estimated for 1 to 3 analogies and recorded on a spreadsheet to be used to calculate mean, median, inverse rank weighed mean, mmre, mdmre and pred(25).
Step 6 The datapoint that had been changed to "unconfirmed" was changed back to "confirmed".
Step 7 Steps 2 through 6 were repeated until all 34/25 projects had had their effort estimated. Then another distance would be chosen and the entire jackknife method would be repeated.
Table 3 - CBR approaches that were compared
|
Distance |
Analogies |
Adaptation |
Standardised variables? |
|
Unweighed Euclidean Distance |
1 |
Closest analogy |
Yes |
|
2 |
Mean |
Yes |
|
|
IRWM |
Yes |
||
|
3 |
Mean |
Yes |
|
|
IRWM |
Yes |
||
|
Median |
Yes |
||
|
Weighted Euclidean Distance |
1 |
Closest analogy |
Yes |
|
2 |
Mean |
Yes |
|
|
IRWM |
Yes |
||
|
3 |
Mean |
Yes |
|