Chapter 5
Design Decisions and Project Development
Introduction
This chapter discusses the decisions that were made during the course of the project about how the system would operate. These decisions were sometimes constrained by time, complexity, or other external factors, and were sometimes made to try to ensure that the system was easy to use or met its requirements. Each decision listed includes the alternative solutions that were considered, with the reasoning behind the final decision.
The chapter also shows how the system could have been developed differently had these constraints not existed, and describes the development method used. The chapter is divided into three sections:
5.1 Design Decisions
5.2 Constraints Upon Design Decisions
5.3 Development Method Employed
5.1. Design Decisions
During development a number of decisions had to be made about how to implement certain aspects of the design.
The main decisions that were made were:
- What programming language to use for the project
- How to create the user interface
- How to retrieve the web pages
- What to do with graphics
- How to handle tables of data
- How to present the formatted information to the user
- How to store the database of pages
- What to do with shockwave flash presentations and other active content
- How to most efficiently get the pages from their respective locations
- What links (if any) to follow on the pages, and how deep to follow links
- How to handle tables etc. on sub pages
- How to store users’ subject preferences
- How much customisation to allow for individual page downloads
- Whether or not to store profiles for multiple users
For each of these problems, the alternative solutions that were considered are listed below with their pros and cons, followed by the solution chosen and the reasons for choosing it:
- What programming language to use for the project
- Visual Basic – This is the easiest option; it would reduce the amount of time needed to create the user interface, but it is less flexible than some other languages.
- Perl – A good method of developing a server-based solution, but the project was not intended to be server based.
- C/C++ – This is a very powerful language, and is well suited to an application such as this.
C/C++ was chosen because it would be more challenging than VB, and it is what a professional organisation would use if it were to develop the system.
- How to create the user interface
- Using a text-based menu system – This is the simplest solution, but the resulting application may not be usable with a screen reader. It also does not look very professional to a sighted user.
- Using Visual Basic linked to C/C++ code compiled into DLLs – This simplifies construction of the interface, but does not allow quite as much control over individual aspects of the interface.
- Using MFC – A good way of developing the interface quickly, but the author has no experience in it, and there would be insufficient time to learn it.
- Using a Windows application programmed in C – A good compromise; experience gained in lectures can be used to aid development.
A Windows application programmed in C was chosen, because this allows a high level of control over the operation of the interface, and allows a very professional-looking application to be produced.
- How to retrieve the web pages
- Windows Socket calls – Easy to use and powerful enough for the project’s needs.
- COM OLE Object – This is a good method, but the author has no experience in using COM objects, and there would be insufficient time to learn about them.
COM OLE objects would have been a good method, but insufficient documentation was found to make this feasible, and the timescale was too short to spend a lot of time researching it. Windows Sockets were chosen instead.
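As a rough illustration of what the Windows Sockets route involves, the sketch below covers only the protocol text: the request that would be passed to send() once WSAStartup(), socket() and connect() have succeeded, and the split of the reply accumulated from recv() into headers and body. The use of HTTP/1.0 and the helper names build_get_request and response_body are assumptions for illustration, not the project's actual code.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Build an HTTP/1.0 GET request for `path` on `host` into buf.
 * Returns the request length, or -1 if it does not fit in `size` bytes.
 * "Connection: close" asks the server to close the socket after replying,
 * so the caller can read until recv() returns 0. */
int build_get_request(char *buf, size_t size, const char *host, const char *path)
{
    int n = snprintf(buf, size,
                     "GET %s HTTP/1.0\r\n"
                     "Host: %s\r\n"
                     "Connection: close\r\n"
                     "\r\n",
                     path, host);
    if (n < 0 || (size_t)n >= size)
        return -1;                 /* truncated: buffer too small */
    return n;
}

/* Given a complete, NUL-terminated HTTP response, return a pointer to the
 * body (the text after the blank line that ends the headers), or NULL if
 * the header block is incomplete. */
const char *response_body(const char *response)
{
    const char *end = strstr(response, "\r\n\r\n");
    return end != NULL ? end + 4 : NULL;
}
```

Because the server closes the connection after replying, a caller can keep calling recv() until it returns 0 and then pass the whole buffer to response_body().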
- What to do with graphics
- Ignore them – This is the easiest option.
- Try to interpret them – Unfeasible.
- Use the ‘alt’ attribute to provide a textual description – This is the best solution where alt text is provided, but on many pages it is not.
The option of using the alt attribute, where present, was chosen because it attempts to convey some of the meaning of the image to the user.
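A minimal sketch of the alt-text approach, assuming the parser has already isolated the text of a single img tag; the function name img_alt_text is hypothetical. It handles double-quoted, single-quoted and unquoted values but does not decode character entities.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Copy the value of the alt attribute from the text of one <img ...> tag
 * into out (at most size - 1 characters, size >= 1). Returns 1 if an alt
 * attribute was found, 0 otherwise. Character entities are not decoded. */
int img_alt_text(const char *tag, char *out, size_t size)
{
    const char *p = tag;
    out[0] = '\0';
    while ((p = strchr(p, '=')) != NULL) {
        /* Find the attribute name just before the '='. */
        const char *name_end = p;
        while (name_end > tag && isspace((unsigned char)name_end[-1]))
            name_end--;
        const char *name = name_end;
        while (name > tag && isalpha((unsigned char)name[-1]))
            name--;
        int is_alt = name_end - name == 3
                     && name > tag && isspace((unsigned char)name[-1])
                     && tolower((unsigned char)name[0]) == 'a'
                     && tolower((unsigned char)name[1]) == 'l'
                     && tolower((unsigned char)name[2]) == 't';

        /* Skip over the value so that an '=' inside it is never mistaken
         * for the start of another attribute. */
        p++;
        while (isspace((unsigned char)*p))
            p++;
        char quote = (*p == '"' || *p == '\'') ? *p++ : '\0';
        const char *value = p;
        while (*p != '\0' && (quote ? *p != quote
                                    : !isspace((unsigned char)*p) && *p != '>'))
            p++;

        if (is_alt) {
            size_t len = (size_t)(p - value);
            if (len >= size)
                len = size - 1;
            memcpy(out, value, len);
            out[len] = '\0';
            return 1;
        }
    }
    return 0;
}
```

When the function returns 0, the caller can simply drop the image, which falls back to the "ignore them" option for images that have no alt text.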
- How to handle tables of data
- Ignore them – This is the easiest option, but it is impractical because too much information would be lost.
- Automate the extraction of data using some rules – The rules could be complicated and are unlikely to work well for all cases.
- Allow the person who maintains the page database to specify how the data should be extracted – Probably the best solution, but time consuming, and very susceptible to changes in page layout.
- Automate extraction of data, but based on some settings customised to the page by the person who maintains the page database, with extra rules to cater for changes to the page layout – This is a good compromise.
The “partial-automation” option was chosen as a compromise between ease of setting up and accuracy of information. For some tables, a slightly altered version of the guidelines for “Table rendering by non-visual user agents” in section 11.4 of the W3C HTML 4.0 specification [HTML] could be used.
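The core idea behind those guidelines, reading each data cell together with its header, can be sketched as follows. This assumes a simple table with one header row and no spanning cells, already parsed into arrays of strings; the function names are hypothetical and the sketch is not the partial-automation scheme itself.

```c
#include <stddef.h>
#include <string.h>

/* Append s to out, which has room for size bytes and currently holds
 * *used characters plus a terminating NUL. Returns -1 if s does not fit. */
static int append(char *out, size_t size, size_t *used, const char *s)
{
    size_t len = strlen(s);
    if (*used + len >= size)
        return -1;
    memcpy(out + *used, s, len + 1);      /* copies the NUL as well */
    *used += len;
    return 0;
}

/* Render a table with a single header row and no spanning cells as text in
 * which each non-empty data cell is preceded by its column header, e.g.
 * "Points: 40", with a blank line after each row. cells holds rows * cols
 * strings in row order. Returns 0, or -1 if out (size >= 1) is too small. */
int linearise_table(char *out, size_t size,
                    const char *const *headers, size_t cols,
                    const char *const *cells, size_t rows)
{
    size_t used = 0;
    out[0] = '\0';
    for (size_t r = 0; r < rows; r++) {
        for (size_t c = 0; c < cols; c++) {
            const char *cell = cells[r * cols + c];
            if (cell[0] == '\0')
                continue;                  /* nothing to read out */
            if (append(out, size, &used, headers[c]) != 0 ||
                append(out, size, &used, ": ") != 0 ||
                append(out, size, &used, cell) != 0 ||
                append(out, size, &used, "\n") != 0)
                return -1;
        }
        if (append(out, size, &used, "\n") != 0)
            return -1;
    }
    return 0;
}
```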
- How to present the formatted information to the user
- Using text controls in a Windows application – This would mean that no other software was needed, but it is overly complicated and may not be easily usable with a screen reader.
- Using HTML pages – This allows hierarchical structures to be built up more easily and browsed interactively by the user, but it does require a web browser to be installed.
- Using a text file(s) – The simplest solution, but limited capacity for hierarchical documents – the user cannot easily jump to the next section.
- Using Word document(s) – The link between the application and Word could be difficult to set up, and this option suffers from the same problems as text files.
Presenting the information using HTML pages was chosen because they allow easy creation of hierarchical documents and the use of talking web browsers.
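Writing text extracted from downloaded pages into generated HTML pages means escaping the characters that HTML treats specially; otherwise a stray '<' or '&' in the source text could break the output page. A minimal escaping helper might look like this; the name html_escape is an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Copy plain text into out, replacing the characters that are special in
 * HTML (&, <, >, ") with character entity references, so that text taken
 * from a downloaded page can be written safely into a generated page.
 * Returns 0 on success or -1 if out (size bytes, size >= 1) is too small. */
int html_escape(char *out, size_t size, const char *text)
{
    size_t used = 0;
    for (; *text; text++) {
        const char *rep;
        char one[2] = { *text, '\0' };
        switch (*text) {
        case '&': rep = "&amp;";  break;
        case '<': rep = "&lt;";   break;
        case '>': rep = "&gt;";   break;
        case '"': rep = "&quot;"; break;
        default:  rep = one;      break;
        }
        size_t len = strlen(rep);
        if (used + len >= size)
            return -1;
        memcpy(out + used, rep, len);
        used += len;
    }
    out[used] = '\0';
    return 0;
}
```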
- How to store the database of pages
- Some form of SQL database, possibly accessed through an ODBC link – This would be slow, meaning longer download times, and it would also be difficult to implement.
- Comma delimited text file – This is simple, small, fast to download, easy to program, and easy to maintain. Its disadvantage is that, when setting up and maintaining pages, it may be harder to understand than a database.
The comma delimited text file was chosen to store sites and categories, because it can hold the same information as a database without the added complications during implementation.
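To show how little code the comma delimited format needs, here is a sketch of a parser for one record. The field layout (ID, category, title, URL, link-following depth) and the struct site type are hypothetical, since the actual file layout is not given here, and fields are assumed never to contain commas.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout for one site record:
 *   id,category,title,url,follow_depth */
struct site {
    int id;
    char category[32];
    char title[64];
    char url[256];
    int follow_depth;
};

/* Copy at most size - 1 characters of src[0..len) into dst and terminate it. */
static void copy_field(char *dst, size_t size, const char *src, size_t len)
{
    if (len >= size)
        len = size - 1;
    memcpy(dst, src, len);
    dst[len] = '\0';
}

/* Parse one line of the sites file. Returns 0 on success, or -1 if the
 * line does not have exactly five comma-separated fields. */
int parse_site(const char *line, struct site *s)
{
    const char *field[5];
    size_t len[5];
    const char *p = line;
    for (int i = 0; i < 5; i++) {
        const char *end = strchr(p, ',');
        if (end == NULL) {
            if (i != 4)
                return -1;                   /* too few fields */
            end = p + strcspn(p, "\r\n");    /* last field runs to end of line */
        } else if (i == 4) {
            return -1;                       /* too many fields */
        }
        field[i] = p;
        len[i] = (size_t)(end - p);
        p = end + 1;
    }
    s->id = atoi(field[0]);
    copy_field(s->category, sizeof s->category, field[1], len[1]);
    copy_field(s->title, sizeof s->title, field[2], len[2]);
    copy_field(s->url, sizeof s->url, field[3], len[3]);
    s->follow_depth = atoi(field[4]);
    return 0;
}
```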
- What to do with shockwave flash presentations and other active content
- Ignore it – This is the only realistic option.
- How to most efficiently get the pages from their respective locations
- Download pages sequentially – This is easier to program, but may not use the full bandwidth of the modem, meaning slower download times.
- Download pages concurrently – This would be quicker for larger bandwidth connections, but is more complex to implement.
Downloading the pages concurrently was chosen as the best solution, because it minimises the amount of time spent online, and therefore the cost to the user.
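One way to download pages concurrently is a small pool of worker threads that take URLs from a shared list. The sketch below uses POSIX threads for portability; on Windows, CreateThread or _beginthreadex and a critical section would play the same roles. The threading approach, MAX_WORKERS and the stand-in fake_fetch function are all assumptions, not a description of how the project implemented it.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define MAX_WORKERS 4               /* assumed number of simultaneous downloads */

/* Shared list of pages to fetch; workers claim entries by index. */
struct job_list {
    const char *const *urls;
    size_t count;
    size_t next;                    /* index of the next unclaimed URL */
    int *results;                   /* one result slot per URL */
    int (*fetch)(const char *url);  /* download routine (hypothetical) */
    pthread_mutex_t lock;           /* protects `next` only */
};

static void *worker(void *arg)
{
    struct job_list *jobs = arg;
    for (;;) {
        pthread_mutex_lock(&jobs->lock);
        size_t i = jobs->next;
        if (i < jobs->count)
            jobs->next++;
        pthread_mutex_unlock(&jobs->lock);
        if (i >= jobs->count)
            return NULL;            /* every page has been claimed */
        /* The slow network work happens outside the lock, so several
         * downloads are in progress at the same time. */
        jobs->results[i] = jobs->fetch(jobs->urls[i]);
    }
}

/* Fetch every URL using up to MAX_WORKERS threads, storing each fetch()
 * result in results[]. Returns 0, or -1 if no thread could be started. */
int download_all(const char *const *urls, size_t count, int *results,
                 int (*fetch)(const char *url))
{
    struct job_list jobs = { urls, count, 0, results, fetch };
    pthread_t threads[MAX_WORKERS];
    size_t started = 0;

    if (pthread_mutex_init(&jobs.lock, NULL) != 0)
        return -1;
    while (started < MAX_WORKERS &&
           pthread_create(&threads[started], NULL, worker, &jobs) == 0)
        started++;
    for (size_t t = 0; t < started; t++)
        pthread_join(threads[t], NULL);
    pthread_mutex_destroy(&jobs.lock);
    return started > 0 ? 0 : -1;
}

/* Stand-in for a real download so the sketch runs offline: it just
 * returns the length of the URL. */
static int fake_fetch(const char *url)
{
    return (int)strlen(url);
}
```

The mutex protects only the index of the next URL, so the time-consuming downloads themselves run in parallel, which is where the reduction in time online would come from.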
- What links (if any) to follow on the pages, and how deep to follow links
- Do not follow any links – This is the simplest solution, but has potential for missing out important information.
- The same policy for all pages – This is inflexible, as pages are likely to need treating individually.
- Dependent on the page, set when the page database is set up – This is a simple solution, but it limits the user’s control over download times for specific pages.
- Set by the user for each page – This is the most complex solution. It would allow complete control, so the user could choose to download extra information on a subject of particular interest, but it would add an extra level of complexity to the system.
Setting up link-following policies for individual pages at the time of setting up the page database was chosen, as this provides enough flexibility for most people. Link following is limited to a depth of one, because this greatly simplifies the code and reduces download time without a great loss of content.
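The chosen policy can be expressed as a small decision function. The per-page flag and URL prefix in struct link_policy are hypothetical stand-ins for whatever the page database records; the depth limit of one is taken from the text above.

```c
#include <string.h>

#define MAX_LINK_DEPTH 1   /* links are followed one level deep only */

/* Hypothetical per-page policy stored in the page database. */
struct link_policy {
    int follow_links;          /* 0 = never follow links from this page */
    const char *url_prefix;    /* only follow links that start with this */
};

/* Decide whether a link found on a page at `depth` (0 = page listed in the
 * database, 1 = page reached through one of its links) should be fetched. */
int should_follow(const struct link_policy *policy, int depth, const char *link)
{
    if (!policy->follow_links || depth >= MAX_LINK_DEPTH)
        return 0;
    return strncmp(link, policy->url_prefix, strlen(policy->url_prefix)) == 0;
}
```

Because a page reached through a link has depth 1, its own links are never followed, so no recursion is needed.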
- How to handle tables etc. on sub pages
- Use the same settings as for the main page – This method does not work in practice, because sub-pages are rarely laid out the same way as the main page.
- Have a separate setting to apply to all the sub-pages – A better solution, but it is unlikely that all the sub-pages will have the same layout.
- Have different settings for different groups of sub-pages, using some sort of filter to decide which settings to use – A complex solution, but the only one likely to work in most cases.
The more complex solution was chosen, with settings applied to groups of sub-pages selected by filters of some sort. Choosing a simpler system would restrict the pages that could be used with the system too much for it to be practical.
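Selecting settings for a sub-page by filter could work along these lines, assuming the filters are plain substrings of the URL, that the first matching group wins, and that a fallback is used for sub-pages matching no filter. The struct and its table_index field are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical extraction settings for one group of sub-pages. */
struct extract_settings {
    const char *filter;   /* sub-pages whose URL contains this text */
    int table_index;      /* which table on the page holds the content */
};

/* Return the settings of the first group whose filter occurs in url,
 * or `fallback` when no group matches. */
const struct extract_settings *
settings_for_subpage(const struct extract_settings *groups, size_t n,
                     const char *url, const struct extract_settings *fallback)
{
    for (size_t i = 0; i < n; i++)
        if (strstr(url, groups[i].filter) != NULL)
            return &groups[i];
    return fallback;
}
```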
- How to store users’ subject preferences
- Using registry keys – This is the preferred method for Windows 9x applications, but it makes it more difficult to copy preferences to another computer.
- Using a standard Windows INI file, storing ID numbers of pages chosen for download – This is a good all-round solution, which adheres to standards set by other Windows applications.
- Using a text file stored in the application directory – This is the simplest solution, but it does not adhere to Windows standards.
- Using some form of SQL database – This is unnecessarily complex.
- Maintaining a remote profile on the UMIST server – This would allow the user to move between computers, but it would be complex, and transmitting data to the server could create security problems.
The INI file method was chosen, with the INI file holding the preferences stored in the application directory. It is a simple solution that meets the needs of the system.
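On Windows, the INI value would be read and written with the Win32 functions GetPrivateProfileString and WritePrivateProfileString. These look in the Windows directory unless given a full path, so storing the file in the application directory means building that path first. The sketch below shows only the portable part: converting a hypothetical "Pages=3,17,42" value to and from an array of page IDs. The key name and helper names are assumptions.

```c
#include <limits.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parse a comma-separated list of page IDs, such as the value of a
 * hypothetical "Pages" key, into ids[]. Entries that are not positive
 * numbers are skipped. Returns the number of IDs stored (at most max). */
size_t parse_page_ids(const char *value, int *ids, size_t max)
{
    size_t n = 0;
    const char *p = value;
    while (n < max) {
        char *end;
        long id = strtol(p, &end, 10);
        if (end != p && id > 0 && id <= INT_MAX)
            ids[n++] = (int)id;
        p = strchr(end, ',');       /* move on to the next entry, if any */
        if (p == NULL)
            break;
        p++;
    }
    return n;
}

/* Format ids[] as "3,17,42" for writing back to the INI file.
 * Returns 0, or -1 if out (size bytes) is too small. */
int format_page_ids(char *out, size_t size, const int *ids, size_t n)
{
    size_t used = 0;
    if (size == 0)
        return -1;
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        int w = snprintf(out + used, size - used, "%s%d",
                         i > 0 ? "," : "", ids[i]);
        if (w < 0 || (size_t)w >= size - used)
            return -1;
        used += (size_t)w;
    }
    return 0;
}
```

For example, a value read with GetPrivateProfileString("Preferences", "Pages", "", buf, sizeof buf, path) could be passed straight to parse_page_ids(); the section and key names here are illustrative only.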
- How much customisation to allow for individual page downloads
- None – This is the simplest solution.
- Allow the user to specify links to follow – This would allow greater control over which sections of sites were downloaded, but would add a whole extra level of complexity.
- Allow the user to specify a whole range of customisations for individual pages – This would allow the most control, and would let the user limit the amount of time spent downloading information, but it suffers from the same problem as the previous option.
It was decided not to allow the user any control over individual downloaded pages, because the extra complexity is not justified by the benefits it would bring.
- Whether or not to store profiles for multiple users
- Do not store multiple profiles – This is the simplest solution.
- Store multiple profiles in the INI file – This would allow many different users to set up their own individual page preferences, and would be useful where a computer is being shared between more than one user. The extra complexity in terms of implementation would be quite substantial.
It was decided not to allow multiple profiles because in most cases this feature would not be necessary, and it would add a great deal of complexity to the system.
5.2. Constraints Upon Design Decisions
Some of the design decisions listed above were constrained by external factors, such as time, and the author’s level of expertise in the area.
This section describes which decisions would have been taken differently in a perfect world where development was not limited by these constraints.
Ideally, the system could have used an Internet Explorer COM object to connect to the Internet and download pages. This would have provided a more flexible base from which to build the rest of the application. It would have reduced the amount of code that relied on specific versions of various Windows components, making the code more future-proof. It would also have made the application easier to adapt to new versions of HTML, since the built-in functions of the Internet Explorer object would interpret new features appropriately. The built-in functions of the Internet Explorer object model could also have been used to convert web pages to plain text, although they do not do as good a job as the parser outlined in this report.
It may also have been a good idea to use MFC to create the Windows interface, as MFC simplifies the construction of Windows applications and makes links with COM objects easier to implement. It is also supported by a number of standard template libraries that aid in the development of systems such as this one.
Unfortunately, insufficient information was found about how to use COM objects in C++, so this approach could not be taken. There was not enough time to learn how to program MFC applications, so this was also impractical.
The project would have benefited from a much more reliable way of making a dial-up connection than the one used. Unfortunately, there was not sufficient time to code a more complex system. Getting access to a machine running Windows NT that had a modem also proved problematic, which precluded implementing any sort of automated dial-up networking for that operating system.
5.3. Development Method Employed
The development of this project followed a spiral model, using incremental prototypes: a new version was released for testing every time a major new function was added. This made it possible to confirm that each new function worked fully before any more were added. This simplified debugging, as any new bugs were usually confined to the new functions or to problems integrating them with the existing ones.
Each time a new version was released for testing it was accompanied by a document containing “release notes” that listed new features that had been added, and known problems with the system. A copy of the most recent set of release notes is included in Appendix D. These release notes allowed anyone who tested the project to concentrate on the new features to ensure that they worked properly.