Applying Neural Networks and Analogous Estimating to Determine the Project Budget

Accepted for publication

PMI Global Congress 2015 – North America
Orlando – Florida – USA – 2015

Abstract

This paper aims to discuss the use of the Artificial Neural Networks (ANN) to model aspects of the project budget where traditional algorithms and formulas are not available or not easy to apply. Neural networks use a process analogous to the human brain, where a training component takes place with existing data and subsequently, a trained neural network becomes an “expert” in the category of information it has been given to analyse. This “expert” can then be used to provide projections given new situations based on an adaptive learning (STERGIOU & CIGANOS, 1996).

The article also presents a fictitious example of the use of neural networks to determine the cost of project management activities based on the complexity, location, budget, duration and number of relevant stakeholders. The example is based on data from 500 projects and is used to predict the project management cost of a given project.

Artificial Neural Networks (ANN)

Some categories of problems and challenges faced in the project environment may depend on many subtle factors that a computer algorithm cannot be created to calculate the results (KRIESEL, 2005). Artificial Neural Networks (ANN) are a family of statistical learning models inspired by the way biological nervous systems, such as the brain, process information. They process records one at a time, and “learn” by comparing their classification of the record with the known actual classification of the record.

The errors from the initial classification of the first record are fed back into the network, and used to modify the networks algorithm the second time around, and so on for a large number of iterations in a learning process in order to predict reliable results from complicated or imprecise data (STERGIOU & CIGANOS, 1996) (Exhibit 01).

21418.pngExhibit 01 – Artificial Neural Networks Architecture (adapted from MCKIM, 1993 and STERGIOUS & CIGANOS, 1996)

Some typical applications of ANN are

  • handwriting recognition,
  • stock market prediction,
  • image compression,
  • risk management,
  • sales forecasting
  • industrial process control.

The mathematical process behind the calculation uses different neural network configurations to give the best fit to predictions. The most common network types are briefly described below.

Probabilistic Neural Networks (PNN) – Statistical algorithm where the operations are organized in multi-layered feedforward network with four layers (input, pattern, summation and output). It is fast to be trained but it has a slow execution and requires large memory. It is also not as general as the feedforward networks (CHEUNG & CANNONS, 2002).

Multi-Layer Feedforward Networks (MLF) – MLF neural networks, trained with a back-propagation learning algorithm (Exhibit 02). They are the most popular neural networks (SVOZIL, KVASNIČKA & POSPÍCHAL, 1997).

21409.pngExhibit 02 – Training data and generalization in a Multi-Layer Feedforward Network (SVOZIL, D , KVASNIČKA, V. & POSPÍCHAL, J. , 1997)

Generalized Regression Neural Networks (GRNN) – Closely related to PNN networks, it is a memory-based network that provides estimates of continuous variables. It is a one-pass learning algorithm with a highly parallel structure. The algorithmic form can be used for any regression problem in which an assumption of linearity is not justified (SPECHT, 2002).

Analogy Process and Data Set

One of the key factors of the Neural Networks is the data set used on the learning process. If the data set is not reliable, the results from the networks calculations will not be reliable. The use of Artificial Neural Networks can be considered one kind of analogy (BAILER-JONES & BAILER-JONES, 2002).

Analogy is a comparison between two or more elements, typically for the purpose of explanation or clarification (Exhibit 03). One of the most relevant uses of the analogy is to forecast future results based on similar results obtained in similar conditions (BARTHA, 2013). The challenge is to understand what a similar condition is. Projects in the past can be a reference for future projects if the underlining conditions where they were developed still exist in the project subject to the analysis.

21398.pngExhibit 03 – Simple analogy example “sock are to feet as gloves are to hands” (Adaptedfrom Spitzig, 2013)

One of the most relevant aspects of the analogy is related to the simple process of estimation based on similar events and facts. This process reduces the granularity of all calculations, where the final project costs can be determined by a set of fixed finite variables.

Data Set, Dependent and Independent Categories and Numeric Variables

The first step to develop an Artificial Neural Network is to prepare the basic data set that will be used as a reference for the “training process” of the neural network. It is important to highlight that usually the right dataset is expensive and time consuming to build (INGRASSIA & MORLINI, 2005). A dataset is composed by a set of variables filled with information that will be used as a reference. These references are called cases (Exhibit 04).

Variables

Independent variables

Dependent variable (output)

v1

v2

v3

vn

v’1

Exhibit 04 – Structure of a basic dataset

Cases

Case 1

Case 2

Case 3

Case n

The most common variables types are

Dependent Category – dependent or output variable whose possible values are taken from a set of possible categories; for example Yes or No, or Red, Green or Blue.

Dependent Numeric – dependent or output variable whose possible values are numeric.

Independent Category – an independent variable whose possible values are taken from a set of possible categories; for example Yes or No, or Red, Green or Blue.

Independent Numeric – an independent variable whose possible values are numeric.

In the project environment, several variables can be used to calculate the project budget. Some common examples are

Complexity – Level of complexity of the project (Low, Medium, High). Usually it is an independent category.

Location – Location where the project works will happen. Associated to the complexity of the works and logistics. Most of the time it is an independent category.

Budget – Planned budget of the project. It is a numeric variable that can be independent or dependent (output).

Actual Cost – Actual Expenditure of the project. It is most of the time an independent numeric variable.

Cost Variance – The difference between the budget and the actual cost. It is a numeric variable that can be independent or dependent (output)

Baseline Duration – Duration of the project. Independent numeric variable.

Actual Duration – Actual duration of the project. Usually an independent numeric variable.

Duration Variance – The difference between the baseline duration and the actual duration.

Type of Contract – Independent category variable that defines the type of the contract used for the works in the project (ie: Fixed Firm Price, Cost Plus, Unit Price, etc).

Number of Relevant Stakeholder Groups – Independent numeric variable that reflect the number of relevant stakeholder groups in the project.

Some examples of input variables are presented at the Exhibit 05, 06 and 07.

Input Variables

Description

Unit

Range

Exhibit 05 – Example of Variables in Road Construction (SODIKOV, 2005)

PWA

Predominant Work Activity

Category

New Construction Asphalt or Concrete

WD

Work Duration

month

14–30

PW

Pavement Width

m

7–14

SW

Shoulder Width

m

0–2

GRF

Ground Rise Fall

nillan

2–7

ACG

Average Site Clear/Grub

m2/kin

12605–30297

EWV

Earthwork Volume

m3/kin

13134–31941

SURFCLASS

Surface Class

Category

Asphalt or Concrete

BASEMATE

Base Material

Category

Crushed Stone or Cement Stab.

OUTPUT VARIABLE

USDPERKM

Unit Cost of New Construction Project

US Dollars (2000)

572.501.64-4.006.103.95

Description

Range

Exhibit 06 – Example of key variables for buildings (ARAFA & ALQEDRA, 2011)

Ground floor

100–3668 m2

Area of the typical floor

0–2597 m2

No. of storeys

1–8

No. of columns

10–156

Type of foundation

1 – isolated

2 – isolated and combined

3 – raft or piles

No. of elevators

0–3

No. of rooms

2–38

Cost of structural skeleton

6,282469,680 USD

Project Characteristics

Unit

Type of information

Descriptors

Exhibit 07 – Example of variables for a building construction ( AIBINU, DASSANAYAKE & THIEN, 2011)

Gross Floor Area (GFA)

m2

Quantitative

n.a

Principal structural material

No unit

Categorical

1 – steel

2 – concrete

Procurement route

No unit

Categorical

1 – traditional

2 – design and construct

Type of work

No unit

Categorical

1 – residential

2 – commercial

3 – office

Location

No unit

Categorical

1 – central business district

2 – metropolitan

3 – regional

Sector

No unit

Categorical

1 – private sector

2 – public sector

Estimating method

No unit

Categorical

1 – superficial method

2 – approximate quantities

Number of storey

No unit

Categorical

1 – one to two storey(s)

2 – three to seven storeys

3 – eight storeys and above

Estimated Sum

Cost/m2

Quantitative

n.a

Training Artificial Neural Networks

When the dataset is ready the network is ready to be trained. Two approaches can be used for the learning process: supervised or adaptive training.

In the supervised training, both inputs and outputs are provided and the network compares the results with the provided output. This allows the monitoring of how well an artificial neural network is converging on the ability to predict the right answer.

For the adaptive training, only the inputs are provided. Using self-organization mechanisms, the neural networks benefits from continuous learning in order to face new situations and environments. This kind of network is usually called self-organizing map (SOM) and was developed by Teuvo Kohonen (KOHONEN, 2014).

One of the biggest challenges of the training method is to decide on which network to use and the runtime process in the computer. Some networks can be trained in seconds but in some complex cases with several variables and cases, hours can be needed just for the training process.

The results of the training process are complex formulas that relate the input or independent variables with the outputs (dependable variables) like the graph presented in the Exhibit 2.

Most of the commercial software packages usually test the results of the training with some data points to evaluate the quality of the training. Around 10 to 20% of the sample is used for testing purposes (Exhibit 08).

training_results_palisade.pngExhibit 08 – Training results example to forecast the bloody pressure where some data is used for testing the network results (Palisade Neural Tools software example)

Prediction Results

After the training, the model is ready to predict future results. The most relevant information that should be a focus of investigation is the contribution of each individual variable to the predicted results (Exhibit 09) and the reliability of the model (Exhibit 10).

relative_variable_impacts.pngExhibit 09 – Example of Relative Variable impacts, demonstrating that the Salary variable is responsible for more than 50% of the impact in the dependent variable (Palisade Neural Tools software example)

histogram-probability_training.png Exhibit 10 – Example of histogram of Probability of Incorrect Categories showing a chance of 30% that 5% of the prediction can be wrong (Palisade Neural Tools software example)

It is important to highlight that one trained network that fails to get a reliable result in 30% of the cases is much more unreliable than another one that fails in only 1% of the cases.

Example of Cost Modeling using Artificial Neural Networks

In order to exemplify the process, a fictitious example was developed to predict the project management costs on historical data provided by 500 cases. The variables used are described in the Exhibit 11.

Name

Description

Variable Type

Exhibit 11 – Variables used on the example dataset

Project ID

ID Count of each project in the dataset

Location

Location where the project was developed (local or remote sites)

Independent Category

Complexity

Qualitative level of project complexity (Low, Medium and High)

Independent Category

Budget

Project Budget (between $500,000 and $2,000,000)

Independent Numeric

Duration

Project Duration (Between 12 and 36 months)

Independent Numeric

Relevant Stakeholder Groups

Number of relevant stakeholder groups for communication and monitoring (between 3 and 5)

Independent Category

PM Cost

Actual cost of the project management activities (planning, budgeting, controlling)

Dependent Numeric (Output)

The profiles of the cases used for the training are presented at the Exhibit 12, 13, 14, 15 and 16 and the full dataset is presented in the Appendix.

21235.pngExhibit 12 – Distribution of cases by Location

21225.pngExhibit 13 – Distribution of cases by Complexity

21214.pngExhibit 14 – Distribution of cases by Project Budget

21205.pngExhibit 15 – Distribution of cases by Project Duration

21194.pngExhibit 16 – Distribution of cases by Relevant Stakeholder Groups

The training and the tests were executed using the software Palisade Neural Tools. The test was executed in 20% of the sample and a GRNN Numeric Predictor. The summary of the training of the ANN is presented at the Exhibit 17.

21184.pngExhibit 17 – Palisade Neural Tools Summary Table

The relative impact of the five independent variables are described at the Exhibit 18, demonstrating that more than 50% of the impact in the Project Management cost is related to the project budget in this fictitious example.

21173.pngExhibit 18 – Relative Variable Impacts

The training and tests were used to predict the Project Management Cost of a fictitious project with the following variables

Name

Variable Type

Exhibit 19 – Basic information of a future project to be used to predict the Project Management costs

Location

Local Project

Complexity

High Complexity

Budget

$810,756

Duration

18 months

Relevant Stakeholder Groups

5 Stakeholder groups

Relevant Stakeholder Groups

Independent Category

PM Cost

Dependent Numeric (Output)

After running the simulation, the Project Management cost predictions based on the patterns in the known data is $24,344.75, approximately 3% of the project budget.

Conclusions

The use of Artificial Neural Networks can be a helpful tool to determine aspects of the project budget like the cost of project management, the estimated bid value of a supplier or the insurance cost of equipment. The Neural Networks allows some precise decision making process without an algorithm or a formula based process.

With the recent development of software tools, the calculation process becomes very simple and straightforward. However, the biggest challenge to produce reliable results lies in the quality of the known information. The whole process is based on actual results, and most of the time the most expensive and laborious part of the process is related to getting enough reliable data to train and test the process.

References

AIBINU, A. A., DASSANAYAKE, D. & THIEN, V. C. (2011). Use of Artificial Intelligence to Predict the Accuracy of Pretender Building Cost Estimate. Amsterdam: Management and Innovation for a Sustainable Built Environment.

ARAFA, M. & ALQEDRA, M. (2011). Early Stage Cost Estimation of Buildings Construction Projects using Artificial Neural Networks. Faisalabad: Journal of Artificial Intelligence.

BAILER-JONES, D & BAILER-JONES, C. (2002). Modeling data: Analogies in neural networks, simulated annealing and genetic algorithms. New York: Model-Based Reasoning: Science, Technology, Values/Kluwer Academic/Plenum Publishers.

BARTHA, P (2013). Analogy and Analogical Reasoning. Palo Alto: Stanford Center for the Study of Language and Information.

CHEUNG, V. & CANNONS, K. (2002). An Introduction to Neural Networks. Winnipeg, University of Manitoba.

INGRASSIA, S & MORLINI, I (2005). Neural Network Modeling for Small Datasets In Technometrics: Vol 47, n 3. Alexandria: American Statistical Association and the American Society for Quality

KOHONEN, T. (2014). MATLAB Implementations and Applications of the Self-Organizing Map. Helsinki: Aalto University, School of Science.

KRIESEL, D. (2005). A Brief Introduction to Neural Networks. Downloaded on 07/01/2015 at https://www.dkriesel.com/_media/science/neuronalenetze-en-zeta2-2col-dkrieselcom.pdf

MCKIM, R. A. (1993). Neural Network Applications for Project Management. Newtown Square: Project Management Journal.

SODIKOV, J. (2005). Cost Estimation of Highway Projects in Developing Countries: Artificial Neural Network Approach. Tokyo: Journal of the Eastern Asia Society for Transportation Studies, Vol. 6.

SPECHT, D. F. (2002). A General Regression Neural Network. New York, IEEE Transactions on Neural Networks, Vol 2, Issue 6.

SPITZIG, S. (2013). Analogy in Literature: Definition & Examples in SAT Prep: Help and Review. Link accessed on 06/30/2015: https://study.com/academy/lesson/analogy-in-literature-definition-examples-quiz.html

STERGIOUS, C & CIGANOS, D. (1996). Neural Networks in Surprise Journal Vol 4, n 11. London, Imperial College London.

SVOZIL, D, KVASNIČKA, V. & POSPÍCHAL, J. (1997). Introduction to multi-layer feed-forward neural networks In Chemometrics and Intelligent Laboratory Systems, Vol 39. Amsterdam, Elsevier Journals.

Budget , Cost , Neural Networks