
Thursday, August 23, 2007

The Rise of Real-time Linux

By Sven-Thorsten Dietrich

New real-time Linux enhancements open a whole new world of possibilities for Linux, ranging across the latest 3G technologies and reaching as near as the mobile handset in your pocket. The purpose of modifying the Linux kernel with real-time functionality is to dramatically reduce interrupt and task-preemption latency, thus enabling the 2.6 kernel for use in high-performance multimedia applications and in applications requiring extremely fast, reliable task-level control functions. Real-time Linux has come a long way. Where is it now, and where is it heading?

Linux has survived many stages of its evolution as a disruptive technology. Today, it is alive, dynamic, and thriving on broad acceptance in a vast, rapidly growing array of application domains.


Reactions to the emergence of Linux have ranged widely, from scepticism to outright hostility. Linux has proliferated within the open source community, and risen over time to become a standard by which many of its predecessors are now measured.


Linux represents a well-tested, and reliable, open operating environment, which can be leveraged to rapidly create technology ranging from simple, battery-powered gadgets to complex, autonomous, AI-integrated motion control systems, like humanoid robots.


Supply-and-demand economics today validates Linux as the operating system of choice for developers and integrators alike. Developers seek operating systems that allow customisation, provide high reliability, and support application longevity by design. Linux continues to meet and exceed these requirements in a growing multitude of applications.


The Rise of Linux
Before Linux became generally accepted as a full-fledged operating system, other factors contributed to its evolution. Open source collaboration is the root of Linux’s versatility, reliability and performance. The latter two are the easily quantifiable metrics that ultimately led to Linux’s proliferation throughout the technology industry.


Many original Linux installations were Windows PCs, retired in favour of newer, faster models. The retired machines remained useful running Linux, thanks to the efficiency and reliability of the fledgling Linux OS. Linux offered advanced networking capabilities, and the applications to use them, long before Windows even supported TCP/IP networking. Linux outperformed the Windows OS it displaced in uptime, and it became the ideal teaching and learning platform.


A majority of the basic design decisions that governed Linux development have favoured fairness, progress and resource sharing. This trinity of design characteristics ensured that even heavily overloaded systems continued to make progress and did not drop network connections or starve applications.






Fig. 1: Dynamics of Real-time Enablement for RTOS, GPOS and Linux


The fairness and resource-sharing design objectives have helped to make Linux competitive and popular in the internet infrastructure, enterprise server, and application development segments. Companies such as Red Hat, Novell and SUSE have risen with Linux, offering Linux solutions that compete head-to-head with the offerings of industry giants like Microsoft, HP and Sun Microsystems.


Fairness, progress and resource-sharing efficiency are thus essential to the evolution of Linux, and are endemic to the UNIX ecosystem.


Linux's evolution has not ended with commercial viability as an alternative server or desktop operating system. Beyond that, Linux is being swiftly and widely adopted as an embedded platform for innovative internet-enabled applications.


In embedded systems design, product feature requirements are broad, system complexity is high, and testing is crucial. Furthermore, overlooked defects in hardware or software can quickly be terminal to a product line.


Embedded systems design pits hardware material cost, power consumption, and performance against one another, in an effort to provide maximum functionality in products competing in high-volume, low-margin markets.


Consequently, embedded devices are typically designed with minimal CPU and memory to reduce cost and power consumption. Embedded devices are highly integrated for compactness. Packed with software functionality, they require a versatile, multi-functional, low-maintenance, real-time operating system (RTOS) for reliable operation.


The OEMs and software vendors who are using Linux to develop embedded systems today are finding that their products are more reliable, versatile, and come to market faster, and at lower cost than competing products. Historically, Linux has fallen short in the time-critical response, or real-time performance category.


Linux and its UNIX legacy are inherently not real-time operating environments. Quite the contrary: fairness and resource sharing stand in stark contrast to the requirements of embedded or time-critical applications.


Rise of Preemption and Bounded Scheduling
The demands of embedded system applications on the Linux environment have spurred improvements and design work throughout the Linux community.


Memory constraints and complex software requirements in embedded systems dictate heavy application reliance upon the vast functionality of the Linux kernel and its drivers.


To meet the need for robust time-critical performance in the Linux kernel, MontaVista introduced the O(1) scheduler, along with the “preemptable kernel” concepts in Linux 2.4. Derivatives of these original implementations are found in the community’s 2.6 kernel today.


The new O(1) scheduler in the Linux 2.6 kernel effectively imitates the behaviour of earlier Linux schedulers, while providing bounded-time task scheduling, which is essential for any time-critical applications.
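
To illustrate the idea behind bounded-time scheduling, the following Java sketch shows the core data structure behind an O(1) scheduler: one FIFO queue per priority level, plus a bitmap of non-empty levels, so that picking the next task is a constant-time find-first-set operation rather than a scan over all runnable tasks. This is purely conceptual; the kernel's implementation is in C and far more involved.

import java.util.ArrayDeque;
import java.util.Deque;

// Conceptual sketch of an O(1) run queue: an array of FIFO queues,
// one per priority level, plus a bitmap of non-empty levels. This is
// not the kernel code, just an illustration of why picking the next
// task takes constant time regardless of how many tasks are runnable.
class O1RunQueue {
    private static final int LEVELS = 64;   // priority levels 0 (highest) to 63
    private final Deque<Runnable>[] queues;
    private long bitmap;                    // bit i set means queues[i] is non-empty

    @SuppressWarnings("unchecked")
    O1RunQueue() {
        queues = new Deque[LEVELS];
        for (int i = 0; i < LEVELS; i++) queues[i] = new ArrayDeque<>();
    }

    void enqueue(Runnable task, int prio) {
        queues[prio].addLast(task);
        bitmap |= 1L << prio;               // mark this level as runnable
    }

    // Constant-time pick: find-first-set on the bitmap, then dequeue.
    Runnable pickNext() {
        if (bitmap == 0) return null;       // nothing runnable
        int prio = Long.numberOfTrailingZeros(bitmap);
        Runnable task = queues[prio].pollFirst();
        if (queues[prio].isEmpty()) bitmap &= ~(1L << prio);
        return task;
    }
}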


The advent of the 2.6 kernel series brought to mainstream many improvements and features evolved from embedded system requirements, along with the new scheduler and preemption option.


Expectations for the real-time performance of the Linux 2.6 kernel are running high, based on the scheduler and preemption enhancements. Though the scheduler performs well, response time in the early 2.6 kernel continues to be impacted by delays encountered when running tasks have locked critical resources within the kernel.


Early Linux 2.6 kernels did not generally achieve reliable soft real-time performance, and fell far short of deterministic hard real-time processing. For audio processing, the 2.4 kernel series even outperformed the early 2.6 kernels.


Additional improvements were necessary and they were possible.


The broadened versatility of Linux in the embedded sector, combined with the constantly increasing demand for time-critical functionality, required re-examination of the original Linux design principles. Real-time requirements amount to VIP treatment, and VIP treatment is not generally compatible with fairness and resource sharing in the sense of general admission.


But even today, the emergence and handling of time-critical response characteristics in software design is poorly understood.


Real-time Processing in the Mobile Handset
The mobile handset is representative of the challenges faced by embedded system designers, who are under pressure to squeeze every “bit” of performance out of minimal hardware in the least development time.


The design objectives for mobile handsets are typically governed by market pressure to offer the broadest possible feature set, provide a diverse array of add-ons and user customisation options, and provision the increasingly complex user interface with an easy-to-use, responsive feel.


The phone user directly experiences the trade-off between a short battery life and the physical burden of carrying a heavy handset. Batteries are a major contributor to the weight of a handset, and have to be minimised in size and weight. The mobile handset must operate with only the power in its necessarily small battery, and survive the longest possible interval between recharging cycles.


All devices in the mobile handset, including the CPU, minimise the use of power. The CPU is a costly part, materially, but also with respect to power consumption. In accordance with the battery constraints, the smallest CPU parts available must be chosen for the design to maximise battery life.


The mobile handset designer thus arrives at a conundrum. Choose too large a part, and the cost and power consumption will go up, consequently the profit and battery life will decrease. Choose too small a part, and the feature set will be limited, or the responsiveness of the GUI will suffer, audio will skip, or worse yet, the device might drop calls.


In order to achieve a marketable compromise amongst competing objectives, the CPU must operate as efficiently as possible, and thus at a very high rate of utilisation. High utilisation implies that a large number of tasks could be competing for attention by the microprocessor simultaneously.


High CPU utilisation and numerous tasks, in the presence of time-critical functionality, necessitate the establishment of real-time response specifications. These software timing requirements are commonly referred to as real-time requirements.


In the example of the mobile handset, the operating system must guarantee precedence for the essential, time-critical tasks, which include:


• Radio communications
• Audio (video) capture and reproduction
• Display output
• GUI responsiveness


The priorities are obvious in the handset application: not dropping a call, not clipping audio in either direction, and not missing updates of text, image or video data on the display. Finally, the user must not wait too long for an expected response after a key is pressed.


Everything else is negotiable. Display of text messages, voicemail notifications, reminders, or an email can all be delayed by as much as a few seconds, without any perceptible performance degradation by the user.






Fig. 2: Native Linux vs. other real-time architectures


The Real-time Linux Kernel
A real-time operating system must recognise assigned task priorities, and respond to time-critical events by switching to the corresponding tasks within a prescribed amount of time. The RTOS will allow all other system tasks to run whenever there are no high-priority events pending, but its performance should not degrade with numerous active tasks.
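
A simple way to visualise these requirements is to measure how late a periodic task actually wakes up relative to its requested period. The user-space Java sketch below is illustrative only; on a general-purpose kernel under load the worst-case overshoot can be large, whereas a real-time kernel bounds it.

// User-space sketch (illustrative only): measure how late a periodic
// task wakes up relative to its requested 10 ms period. A real-time
// kernel bounds this wake-up latency; a stock kernel under load does not.
public class LatencyProbe {
    public static void main(String[] args) throws InterruptedException {
        final long periodNs = 10_000_000L;  // ask to wake every 10 ms
        long worstNs = 0;
        long next = System.nanoTime() + periodNs;
        for (int i = 0; i < 1000; i++) {
            long sleepNs = next - System.nanoTime();
            if (sleepNs > 0) Thread.sleep(sleepNs / 1_000_000, (int) (sleepNs % 1_000_000));
            long lateNs = System.nanoTime() - next;   // wake-up overshoot
            if (lateNs > worstNs) worstNs = lateNs;
            next += periodNs;
        }
        System.out.printf("worst-case wake-up latency: %.3f ms%n", worstNs / 1e6);
    }
}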


Preemption
When a time-critical event occurs in a real-time system, the task responsible for processing the event must be scheduled after the currently running task has been suspended. Suspending the currently running task is known as preemption.


Preemption can be delayed if the currently running task is accessing shared system data. Shared data must be in a stable state before tasks are switched, since another task (or another CPU) may need to access the same shared data. For this reason, shared data is protected to guarantee that only one task will operate on it at a time. The protection establishes a “critical section”. In the Linux 2.6 kernel, preemption is disabled while a shared-data operation is underway in a critical section. When preemption is disabled, the scheduler is not allowed to switch tasks, and therefore the running task is guaranteed to have no contention while in the critical section. To further compound the problem, if data is shared with an interrupt handler, interrupts also have to be disabled along with preemption, rendering the system entirely unresponsive while executing in that critical section.


Efforts had been underway for years to reduce the number and duration of critical sections in the Linux kernel. Instrumentation was developed by MontaVista to identify the longest critical sections in the 2.4 kernel series, and it was later applied to the 2.6 kernel.


There were over 10,000 critical sections in the early 2.6 kernel. Modifying them all to conform to a maximum execution-time bound was a daunting, if not impossible, task.


However, there was another way. The critical sections could be protected by an alternative mechanism, known as a mutex. The mutex would permit the kernel to allow preemption while executing in most critical sections. Since the kernel was already preemptable when tasks executed outside the critical sections, Linux would become fully preemptable: real-time Linux.
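
The following user-space Java sketch is an analogy (not kernel code) for such a preemptible critical section: the mutex serialises access to the shared data, but a task that finds the lock taken simply sleeps, leaving the scheduler free to run any other task in the meantime.

import java.util.concurrent.locks.ReentrantLock;

// User-space analogy for a preemptible critical section: the lock
// serialises access to shared data, but a thread that finds the lock
// taken simply sleeps, so the scheduler remains free to run any other
// task. (In the kernel prototype, spinlock-protected sections were
// converted to such sleeping mutexes.)
class SharedCounter {
    private final ReentrantLock lock = new ReentrantLock();
    private long value;

    void increment() {
        lock.lock();             // blocks (sleeps) if another thread holds the lock
        try {
            value++;             // critical section: shared data kept in a stable state
        } finally {
            lock.unlock();
        }
    }

    long get() {
        lock.lock();
        try { return value; } finally { lock.unlock(); }
    }
}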


MontaVista developed a prototype, and the performance was impressive. The complexity of the problem was great, and those following the action realised that several others outside MontaVista were working on the same problem. One constraint of the new design was that preemption could not be disabled before a mutex was locked. However, preemption had to be disabled at certain places, and mutexes are nested in the kernel. The combination of nesting and disabled-preemption regions turned out to be something like a Möbius strip. It was clear that some existing code had to be rewritten. The decision was made to go public with the prototype and ask for feedback from the community. In response to the announcement, Linux developers already working on voluntary preemption spent a week on an epic programming effort. Combined input from those who had formerly been working on the problem independently completed the conceptual real-time Linux kernel only six days after MontaVista announced the prototype.


Development has continued at a frantic pace, and substrate definitions required by the real-time patch began to appear in the Linux release candidates almost immediately. The real-time concept radically changes the operation of the Linux kernel internals. Like the O(1) scheduler, it effectively imitates the behaviour of earlier Linux kernels, and does not alter the kernel's behaviour towards user applications. Most drivers work flawlessly under the new model, and those that fail tend to deviate from implementation conventions in unusual ways.


Overall, the real-time effort is helping to expose existing problem areas in the kernel, stimulating discussions about efficiency and optimisation, in addition to guiding development efforts towards a continually higher standard of implementation and performance for the evolving Linux ecosystem.


The new real-time enhancements in Linux open a world of new possibilities, ranging as far as Mars rovers and roaming as near as the mobile handsets in our pockets. Real-time requirements will never be alleviated by improvements in hardware performance, since the rapidly growing demands on software applications virtually promise that hardware resources will always be utilised to the limits of their capacities.


Ever more demanding real-time requirements are guaranteed as a consequence of increased audio and video streaming to and from mobile and internet-connected devices.


Real-time Linux also creates development opportunities for the migration of carrier-grade functionality from plain old telephone switches to cellular base stations, and spans the gaps between domestic set-top box networks and internet routers. Media systems and data-switching carriers using Linux today already have real-time requirements for reliably processing video and audio streams. Similar streaming and bandwidth-guarantee requirements are increasingly imposed on internet routing, cellular and telephony equipment, as the various technologies are integrated into a seamless, media-streaming network system. Real-time functionality in the host operating systems and carrier equipment assures seamless integration of the near-endless variety of multi-functional clients, routers and transmission links requiring quality-of-service guarantees.

J2EE and .NET: Presentation Tier Interoperability

By Binildas C.A.

In this article, Binildas C.A. sheds light on this important, yet sparsely documented, area of interoperability considerations. He also discusses some of the solutions available to work hassle-free with .NET and J2EE. The article converges, and rightly so, on proposing web services as a technology for attaining client-side interoperability, and in that attempt addresses two important approaches: top-down and bottom-up. As proof of the proposed technology, and as a practical demonstration of the two approaches and the successful implementation of a solution, the writer showcases a sample architecture based on BEA WebLogic Server and a .NET C# client.

Java 2 Platform, Enterprise Edition (J2EE) is a mature middle-tier technology that has become the norm for building production-quality enterprise applications. Scalability, reliability, extensibility and security are a few of the main operating qualities which give J2EE its edge over competing platforms. Though it is comparatively easy to develop, deploy and maintain applications in the J2EE stack, the J2EE components often need to interoperate with other technologies.


Recent trends in development have witnessed Java/J2EE's rising popularity in enterprise-class server-side application development, beyond its historical stronghold in client-side applets and applications. In assessments against solutions like C and C++, and other object-based programming languages like Delphi, Java/J2EE has been found the most stable and proven. Microsoft's Integrated Development Suite (IDS), comprising its IDE (Visual Studio) along with tightly integrated programming languages like Visual Basic and Visual C++, has made that development environment a favourite for fast-paced application development; ATL and MFC add to this suite. For native applications that require speed and richness, these technologies are still preferred. Microsoft's .NET offerings, which work across a wide range of Windows versions, offer enhanced power to the developer desiring to implement an interoperable solution.


In this article, we will combine the richness and performance of .NET with the stability and maturity of J2EE, shedding special focus on client side interoperability with examples deployed in BEA WebLogic Server 8.1 and .NET platform respectively.


Requirements to Deploy and Run Examples
Here is a list of requirements for deploying and running the examples in this article:


• Windows XP
• BEA WebLogic Platform 8.1 SP3
• Microsoft Visual Studio .NET Enterprise Architect


Interoperability in the Presentation Tier
Irrespective of advancements in the server-side platform, an end user is interested in consuming a service from the point of usability. A service can be accessed by a partner's system, by a browser, or even by a thick client built using another technology. It is when this service becomes open that a series of aspects needs to be considered.


Interoperability Considerations
In order for two parties on different technologies and platforms to coordinate, both need to agree on a set of protocol aspects. This is because different platform stacks have their own representations of services and data. We need to normalise or mediate between these heterogeneous representations to enable the disparate parties to communicate. These are the major aspects to consider:


Data formats: When evaluating a data format, take note of the differences across platforms. For example, an int in Java always occupies 32 bits, whereas a C++ int may occupy only 16 bits on some platforms. Unicode representation is a blessing when it comes to representing characters, since it is possible to represent all common character sets using its 2-byte representation of char. If the same value is represented differently by different platforms, how can we integrate applications built in different technologies?
Transport channels: Transport channels can differ when systems try to interoperate. Where HTTP is the accepted transport mechanism for one application, another may well send messages through an SMTP channel. What is the normalisation mechanism between these disparate channels?
Network paths: Interoperating across companies and trading partners is not easy, since corporate firewalls block binary protocols. This is perhaps one primary reason for the lack of popularity of solutions that use Java RMI or Microsoft DCOM.
Hardware platforms: Though hardware layers are well shielded by the abstractions above them, it is worth paying attention to differences in word size between different hardware. Most 32-bit client-side machines today are being replaced by 64-bit processors, which is good news: a long in Java no longer needs to be externally synchronised, because on a 32-bit processor a read or write of a 64-bit Java long takes place as two separate operations (see the sketch below).
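
The following Java sketch illustrates the word-tearing hazard mentioned in the hardware item above. It is a minimal, illustrative demo: on a 32-bit JVM the unsynchronised reader can observe a half-written value, while declaring the field volatile forces atomic 64-bit access.

// Demo of the word-tearing hazard described above. On a 32-bit JVM,
// writes to a non-volatile long happen as two 32-bit halves, so a
// concurrent reader may observe a value that was never written. The
// volatile keyword forces atomic 64-bit access. (On a 64-bit JVM the
// torn value will usually never appear.)
public class LongTearingDemo {
    static long plain;            // may tear on a 32-bit JVM
    static volatile long safe;    // volatile guarantees atomic reads and writes

    public static void main(String[] args) {
        Thread writer = new Thread(() -> {
            for (;;) {            // alternate between all-zeros and all-ones
                plain = 0L;
                plain = -1L;      // 0xFFFFFFFFFFFFFFFF
            }
        });
        writer.setDaemon(true);   // do not keep the JVM alive after main exits
        writer.start();

        for (long i = 0; i < 100_000_000L; i++) {
            long v = plain;       // unsynchronised read of a 64-bit value
            if (v != 0L && v != -1L) {
                System.out.printf("torn read observed: 0x%016X%n", v);
                return;
            }
        }
        System.out.println("no torn read observed (likely a 64-bit JVM)");
    }
}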


Apart from the aspects mentioned, there is a further level of considerations:


• Performance and standards compliance
• Manageability and reliability


Interoperability Solutions
Interoperability is closely linked with distribution. If there is no distribution, there will be no problem of interoperability. Although not an axiom, this reflects a normal case. Let us briefly discuss the available solutions:


COM/DCOM: Microsoft Transaction Server (MTS) with its COM/DCOM protocols is a solution for distribution and interoperability. However, MTS is proprietary and rests on inelegant Microsoft extensions to C++ and other languages with COM/DCOM bindings.
CORBA: CORBA was one of the earliest solutions and is still acceptable, even though it is no longer popular. Despite its power, CORBA's complexity counts against it, and it often takes the effort of many experts to resolve the intricacies between ORBs.
RMI/IIOP: While RMI cannot be considered an interoperability protocol, RMI/IIOP is interoperable, even though deployments are few. Moreover, IIOP is not a Java-specific protocol.
.NET solutions: .NET aids interoperability with its support for remoting. The advantage here is that the channel used for remoting is easily configurable.
Proprietary solutions: Apart from the solutions mentioned above, there are many proprietary ones for interoperability. Since these are not open, we will not consider them true interoperability solutions.
Partly open, and ad-hoc: The best method of implementing a partly open solution is to leverage already-available open transport channels and send data in an open format. Since HTTP is already available as an open channel, and XML is all about open data, combining the two provides an ad-hoc solution.
Web services: Web services promise true interoperability. This is achieved by standardising and abstracting most aspects of a service interface in the form of the Web Services Description Language (WSDL).


Using Web Service for Presentation Tier Interoperability
Web services are the best interoperability solution available today, unless performance or other key considerations warrant a binary solution. As this article focuses on interoperability between a J2EE server and a .NET client, a discussion of the range of variations available within the web services programming model is beyond our scope.


Steps for Interoperability
To ensure interoperability, we need to concentrate on the following aspects, amongst other items:


• XML Schema for describing data
• SOAP for formatting the message
• HTTP for transport protocol
• WSDL for description
• UDDI for discovery


The primary specifications upon which web services are based are too open and broad. That opens up the scope for ambiguities once they are implemented in a web service solution. To reduce these ambiguities and achieve interoperability, web service vendors formed a consortium called the Web Services Interoperability Organisation (WS-I). This standards body drafted its first specification, called the Basic Profile 1.0 (BP 1.0). This profile aims to bring about a set of conformance rules that clear up ambiguities in the above standards. Even though no one is required to adhere to BP, doing so will make interoperability less painful.


There are primarily two approaches to integrating a .NET client and a J2EE server – top-down and bottom-up. Let us briefly explain the difference between the two:


• Top-down Approach
Although not usable across all cases, the top-down approach is the most conventional and straightforward. Experience has shown that there are cases where we cannot extend this approach because what works for one web service stack may not work for another. This is precisely the motivation behind the bottom-up approach.


In the top-down approach, similar to CORBA where we start from an IDL, we start from the WSDL in web services. The service description is fully described in the WSDL. Any client can use this WSDL file and follow any of the different binding options available. By binding options, we mean either statically generating client side stubs and binding at compile time, or generating dynamic proxies at run-time by introspecting the WSDL file on the fly.
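
As an illustration of the second binding option, the sketch below uses the JAX-RPC Dynamic Invocation Interface to bind to the service at run-time by introspecting the WSDL. The WSDL URL, namespace and service/port names here are placeholders, not the actual values from the WebLogic deployment:

import java.net.URL;
import javax.xml.namespace.QName;
import javax.xml.rpc.Call;
import javax.xml.rpc.Service;
import javax.xml.rpc.ServiceFactory;

// Sketch of run-time (dynamic) binding with the JAX-RPC Dynamic
// Invocation Interface. The WSDL URL, namespace and names below are
// assumptions; substitute those of the deployed Trader service.
public class DynamicTraderClient {
    public static void main(String[] args) throws Exception {
        URL wsdl = new URL("http://localhost:7001/trader/TraderService?WSDL"); // assumed URL
        QName serviceName = new QName("http://www.example.com/trader", "TraderService");
        QName portName    = new QName("http://www.example.com/trader", "TraderServicePort");

        Service service = ServiceFactory.newInstance().createService(wsdl, serviceName);
        Call call = service.createCall(portName, "getAllProducts");

        // The call is bound by introspecting the WSDL at run time;
        // no stub classes were generated at compile time.
        String productsXml = (String) call.invoke(new Object[0]);
        System.out.println(productsXml);
    }
}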


The main problem with the top-down approach is data compatibility. Standards evolve, and it takes some time before the latest standard is reflected in implementations; even then, it does not happen at exactly the same point in time for every vendor. This gives rise to the notion of a common denominator: the set of data types supported by all the implementations in consideration (their intersection, not their union). It is highly important that we stick to this common set whenever we design interoperability solutions. In practice, this means exchanging only primitive data types, such as strings, integers and so on. All web service stacks support these types, providing the greatest levels of flexibility and client access.


In this approach, the service definition exposes what it mandates, and it is up to the client run-time to agree to the service format.


• Bottom-up Approach
In the bottom-up approach, the client still has to agree and bind to the service definition, but in defining the service bindings we take a roundabout route.


One scenario is when we need to send a very complex data structure, which we suspect will not be supported by all client run-times. In such a case, we package the data as a string. The best solution here is to populate a string with an XML representation of the data. This XML encoding should follow an agreed schema. This again implies that we start from an “agreed schema”, set up the service implementation, and then expose the service description. The client may have some or full idea of the “service types” involved. Most of the time, such an approach results in a kind of document-oriented web service. Vendor tools can again help in automating this process. The most relevant steps to follow include:


• Create a schema that represents the XML form of the data
• Create technology-specific data classes based on the above schema
• Create a technology-specific serialisation/de-serialisation class that carries out the above translations


Client applications that consume the web service need to understand the formats and valid values so as to process the service return types.


Architecture for Presentation Tier Interoperability
In this sample architecture, we will design for both scenarios above. We will use the same application available in the examples folder of the WebLogic installation, and extend it for both scenarios.


We will have three methods in our service interface, as shown below:


public interface Trader extends EJBObject {
    TradeResult buy(String stockSymbol, int shares) throws RemoteException;
    TradeResult sell(String stockSymbol, int shares) throws RemoteException;
    String getAllProducts() throws RemoteException;
}


The notable point in this interface is that TradeResult is a complex type. When such a complex type is visible in the service interface, it means that the client needs to understand this complex type when it binds to consume the service. Web service standards are mature to the extent that this level of interoperability is possible in many cases.


The third method does not have any visible complex type. In fact, its return type, a simple string, is going to be a flattened representation of a more complex type.


The former two methods (buy and sell) are examples for the top-down approach, and we will build the third method (getAllProducts) in the bottom-up approach. The service will be implemented as a stateless session bean in the WebLogic J2EE platform. The client will be built in .NET using the C# programming language.






Fig. 1: The Sample Architecture


While the first two methods are simple, let us explore the rationale behind the third method. In fact, the third method is intended to return a collection of the Product complex type. This type is perhaps not so complex that we could not return it directly, in the same way TradeResult is returned by the other two methods. To demonstrate the concept, however, let us assume that the collection of the Product complex type is not simple, and that we have taken an architectural decision to flatten this collection into an XML string. For this, let us quickly list the steps to perform:


• Agree on a schema for the complex collection type
• Create the XSD-based data types on server and client ends
• On the server, convert Java data types based on XSD to XML representation and serve
• On the client, access service as an XML string, and based on XSD, convert them to .NET data types and continue working with them


Implementing the Demonstration
Deployment of the server component and exposing it as a web service is easy. Follow the WebLogic Server Web Services programming examples referenced in Links & Literature for more details. One point which needs to be mentioned here is the conversion to XML based on the XSD.


As usual, the data for the Product types might be streamed from a database. Once the data and the XSD are available, we normally use whatever automated methods are available to convert the data to XML format.


Many Java XML products can map a field in a class to an element or attribute in the XML Schema. Sometimes, existing classes map to the correct schema without requiring an intermediary. If this is the case, we can create mapping files for existing classes, as this removes the step of copying data from existing Java classes into schema-based Java classes. Since this article intends to show the architectural building blocks for attaining interoperability, and not to demonstrate various implementation mechanisms, this part is intentionally omitted by hard-coding the XML structure, with data, in the server-side implementation.
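
For illustration, a minimal sketch of what that hard-coded server-side flattening might look like is shown below. The element names and sample values are assumptions chosen for the example, not the actual WebLogic trader data:

// Sketch of the intentionally simplified server-side flattening: the
// getAllProducts() implementation returns the Product collection as an
// XML string conforming to the agreed schema. In a real implementation
// the data would be streamed from the database and serialised by a
// mapping tool rather than hard-coded.
public class TraderBeanSketch {
    public String getAllProducts() {
        StringBuffer xml = new StringBuffer();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
        xml.append("<Products>");
        appendProduct(xml, "BEAS", "BEA Systems", "8.25");   // placeholder data
        appendProduct(xml, "MSFT", "Microsoft", "27.50");    // placeholder data
        xml.append("</Products>");
        return xml.toString();
    }

    private void appendProduct(StringBuffer xml, String symbol, String name, String price) {
        xml.append("<Product>")
           .append("<Symbol>").append(symbol).append("</Symbol>")
           .append("<Name>").append(name).append("</Name>")
           .append("<Price>").append(price).append("</Price>")
           .append("</Product>");
    }
}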


On the client side, we have two components – EJBClient and DataClient – both implemented in .NET. The EJBClient will consume the buy and sell services, while DataClient will access the getAllProducts service. To build these clients, we first need to generate client-side stubs for the web service. The commands used are displayed in Fig. 2.






Fig. 2: Generating client side stubs


These commands will generate TraderService.cs (C#.NET files). Now, coding the EJBClient is simple. By executing EjbClient.exe, we can invoke Buy and Sell.






Fig. 3: EJBClient in .NET


The DataClient is designed to show the XML data retrieved from the server in a DataGrid control. This is done with only two lines of code:


products1.ReadXml(memStream);
dataGrid.SetDataBinding(products1, "Product");


where products1 is a typed DataSet in .NET, and memStream is a MemoryStream, an in-memory representation of the XML data. Even without generating a typed DataSet, it is possible to bind the XML data using these two lines of code and display it in the format shown in Fig. 4.






Fig. 4: DataClient in .NET


A typed DataSet is easily generated from the XSD using the command in Fig. 5; this helps especially when working with the DataSet, since it enables type-safe coding.






Fig. 5: Generating Typed DataSet


Before building the server, do make the necessary changes in the build.xml file.



Summary
The article described the integration of .NET technology on the client side to access web services implemented on the BEA WebLogic platform. It walks readers through architecting a solution for client-side interoperability, following either the top-down or the bottom-up approach. The code examples demonstrate both the J2EE-based server implementation and the .NET-based client-side bindings. The tail piece helps choose best-of-breed solutions across technologies for different problems at hand. It also prompts readers to follow the most appropriate architectural steps for attaining the required level of interoperability.

Tuesday, August 21, 2007

Software Product Documentation

The ideal software would be free of errors and so easy to use that everybody would be familiar with it the minute they start the application. However, this is not the case in real life.

Besides the quality of the software product, there’s something else that makes or breaks the deal: technical support. The better the support software publishers and shareware authors provide, the more users are likely to buy the product.

Technical support for software products can be provided in several ways:

  • online product documentation
  • e-mail assistance
  • access to support forums maintained by software publishers
  • knowledge bases.

Good documentation may exclude in many cases the need for further forms of technical support. It is, however, not easy to write. One of the reasons why this happens is that it is difficult for shareware authors or other software developers to put themselves into the users’ shoes, since they are already thoroughly familiar with the application.

Read Me file

The first thing every software product should have is a text format "Read Me" file that includes the following:

  • Product Name and Version
  • Ship date
  • Company and Author Name
  • Description (like "photo organizer")
  • A What’s New list (this should be a list of fixes and new features)
  • System requirements (hardware like CPU, RAM, disk space, operating systems supported)
  • Price, payment options and instructions
  • Copyright and distribution information (rules for people who want to distribute your product)
  • Contact details (email, phone, fax, website and postal address)

The "Read Me" file is important because everybody that might be interested in your product is expecting it, including reviewers, users, or people who want to put your product on a CD for distribution with a magazine or on their website. The idea here is to minimize frustration associated with the information being too difficult to find, therefore you should put all in one place.

The Manual

The other thing shareware authors and software publishers need to provide with their product is the manual. The first thing you should think about when starting to write it is how your users are going to use it. Few are those that will bother reading it all before attempting to use your product. They are more likely to turn to it later, when they try to do something and cannot figure out how to do it, or when they find something they do not understand.

To help them, it is best to organize your documentation by tasks. "How to…" sections are more useful than merely documenting every command in order. Explanations are easier to understand if they are backed by pictures and diagrams, wherever possible. There should also be a chapter labeled "Troubleshooting", which provides answers to the most common problems. At first you will have to guess where those problems may occur, but a couple of product upgrades later, feedback from the people who tried the product will tell you what the most common problems are. The manual should be broken down into chapters, the first of which tells what the other chapters contain, so that people can readily find what they need.

The interface

Another point worth mentioning is the interface. User-friendly is not enough: the interface needs to be navigable even if the user does not have an overall understanding of the system, so screens need to be self-describing.

Online help or online documentation

There is also online help, or online documentation. The documentation put on the website for users to read has to show up in search results. It has to be organized in such a way that users, who are sure to be asking questions like "How can I…" or "Why can't I…", can find the answers quickly. Lots of examples showing how to do various tasks with the software product are also needed. A good idea is to include links to a glossary of terms that might be more difficult to understand.

Always bear in mind that anyone who has had a bad experience with a product tends to remember it for a long time and software is no different. That is why you should strive to make using your product as smooth an experience as possible, something proper support and documentation can help you achieve.

Best Practices for Risk Free Deployment

Overview

The cost impact to a company of a failed project can be severe indeed. The impact on the reputation of the project manager can be disastrous.

Software project management is not easy, and it requires considerable skill to successfully manage the many different risks that conspire to de-rail a project.

Numerous methodologies are available for mitigating these risks – PRINCE2, RUP, DSDM, eXtreme programming – and these have helped to some extent.

This document introduces the 3D™ methodology – a set of best practices and quality tools developed by BuildMonkey, which can be summarised as:

De-risk. Deploy. Deliver.

In any software project, deployment is a milestone on the project plan that is usually associated with payment – or staged payment. Through the course of development, any number of problems can arise to derail efforts to reach this milestone.

The 3D™ methodology and supporting toolset is based on many years of experience at the sharp end of projects, understanding what has worked and what has not, and the lessons learned from each.

Competent practitioners, and experienced project staff, will find resonance with many of the contents of this document and may find themselves saying “this is just common sense”. This is certainly true, but the main problem with common sense is that it is not as common as people think it is.

This document, and the 3D™ methodology, is an attempt to bring that common sense together in a single location, as a coherent set of best practices supported by proven tools to help you to release on-time, on-budget, with no defects.

The Problem To Be Solved

No methodology has yet focused on the component that all development projects share – the build.

One of the reasons for this is that the term “build” is interpreted differently by different people:

  • The development team sees it as compilation and assembly;
  • The integration team see it as the bringing together of all of the components in the application in a format suitable for release;
  • The deployment team see it as something which produces the artifacts that they have to install and configure;
  • The testing team see it as something which produces the artifacts that they have to test;
  • The Project Manager sees it as an opaque step that nobody is entirely responsible for;
  • The end customer should not see it at all;

The BuildMonkey view is that the build is the combination of processes and technology that takes software from design to deployment – where the return on investment starts to be seen.

It is clear that a methodology is required to de-risk development projects and to standardise use of the term “Build Management”.

Best Practice: “Build Management” encompasses everything from compilation, all the way through to release to the customer.

No Release, No Revenue

Any Finance Director knows that development is just an activity that needs to be tolerated in order to produce something that will deliver a return on investment.

It may sound strange, but a large number of software developers do not appreciate and embrace this basic truth. This is in part due to their closeness to the application being constructed.

A common problem faced by development projects is therefore that it is the software developers who manage the build. This creates a situation where the build is focused on the needs of development, and is not geared towards releasing the output of coding such that business value can be realised.

Build Management should therefore focus on the end result of development – a return on investment – and ensure that all of the inputs to the process are incorporated in pursuit of this goal:

Best Practice: Focus on the end, and accommodate the inputs

Software Tools Are Only Part of the Answer

Software projects are a complex set of interdependent people and teams and can be likened to a convoy of ships. A convoy has to move at the rate of the slowest ship. Increasing the speed of a single ship in the convoy will not increase the speed of the convoy – it will simply increase the amount of wasted capacity in the speeded-up ship.

Speeding up the slowest ship will, however, have a positive effect since the whole convoy can now move faster.

Many Project Managers try to improve productivity by implementing some degree of automation in development projects – particularly in the area of the build – and often purchase “magic bullet” build software that provides this.

Simply using automated build software does not improve productivity any more than the example above improves convoy speed - as it only increases the speed of a single ship in the convoy.

There is no point in speeding up development, if the target production infrastructure cannot keep pace – this just increases the inefficiency. A lot of organisations make this mistake – highly agile development processes trying to feed into considerably less agile deployment processes. The result is inefficiency, waste and over-run.

Before considering using an automated build tool it is essential to ensure that the inputs to, and outputs from, the build can cope with the improved build speed. It is imperative to ensure that the processes and technology employed are geared towards taking the project to a successful conclusion – on-time and on-budget.

Best Practice: Don’t rely on software tools alone, they may solve symptoms whilst creating problems elsewhere

Configuration Management Best Practices

Software Configuration Management (SCM) is a relatively mature discipline with much written about methodologies and techniques, and these will not be recreated here.

We will focus instead on leveraging the SCM repository, and the facilities that it offers, to further the goals of the project rather than to consider SCM in its own right.

It All Starts Here

The SCM repository – how it is managed and used – is the keystone of good build management and successful delivery of projects.

The SCM repository is the slave of the project, not the other way round. It should be solid and reliable, yet flexible enough to accommodate the needs of new projects. Project Managers should not have to retrofit their planning to accommodate an inflexible SCM setup.

If used correctly, the SCM repository will enhance productivity, and minimize risk, through being able to provide exactly what the project – and project management – require. If used incorrectly, it can cause delay and slippage through having to do things inefficiently further down the chain.

Best Practice: The SCM repository is the slave of the project, not the other way round.

Configuration Management is not Version Control

Most software developers regard the SCM repository as a massive storage area where they simply check versions in and out – a common cause of problems.

Simply checking things into an SCM repository is not Configuration Management any more than karaoke is opera.

A well-maintained SCM repository is so much more than version control, and should provide:

  • The ability to recreate any identified baseline, at any time;
  • Meaningful statistics on what is changing, when and by whom;
  • Management information, such as “how many new defects were introduced by the refactoring of component ‘X’?”

In order to be truly effective in a project, the SCM repository should store all of the artifacts that form part of a baseline or a release.

Source Code

Most development projects simply store the code that is being developed and their use of the SCM repository is no more sophisticated than this.

Data

Most applications nowadays are not just source code. Take the example of a modern computer game – the vast majority of the code base is made up of artifacts other than code such as audio clips, pictures and movie clips.

Database Structure and Contents

Where an application relies on a database, that database will have a schema and structure that may change from release to release – this schema must be captured.

There will normally also be seed data for the database which should be considered as part of the baseline.

Application Configuration

In a large distributed application, the software being developed will sit atop a number of pieces of software (e.g. application servers, web servers and message queues).

The configuration of these underlying applications has an effect on the quality – or otherwise – of the software being developed and should, therefore, be considered part of the baseline for a release.

Environment Configuration

The underlying environment and infrastructure is a key component of the project, particularly in the case of distributed applications.

Such banal considerations as DNS zone files, user account information and system parameters have to be considered as some of the moving parts which affect the application and therefore be placed under configuration control.

This is of particular importance when there is more than one environment involved in the project (e.g. a development environment and a separate test environment) since the question of “are these environments the same?” crops up again and again.

Best Practice: Everything that can be changed, and affect the behaviour of the application, is a candidate for configuration control

The Point of Labelling

It is a common misconception that simply applying labels, or tags, to the SCM repository creates a baseline. This is only partly true: a label is of little value without corresponding records of:

  • What label has been applied;
  • When that label has been applied;
  • Why it has been applied (i.e. what milestone, or other external event, the label is associated with);

The use of a naming convention for labels can deceive even further. For example, a project that uses a date-based labeling convention (dd_mm_yyyy) will have several labels of the form (03_05_2004, or 09_06_2004) and will reasonably assume that they have some kind of record of the baseline on those dates.

But what was happening in the project on those dates? Was the 03_05_2004 label applied immediately before a release to test, or immediately after?

Best Practice: Labels should be used to identify and inform about events in the project

Don’t Label If There Is No Point

This may seem like stating the obvious, but there should be a reason for a label being applied – the whole purpose of labeling is to identify some event in the development cycle that may need to be re-visited.

To this end, labels can be divided into two categories:

  • Point-in-time
    Associates particular versions with a particular calendar date, or other event that is fixed in time (e.g. MONTH_END_JAN_2004, or RELEASE_1_0_1);
  • Point-in-process
    Associates particular versions with events in the project that may recur at a later stage (e.g. LATEST_RELEASE, or CURRENTLY_IN_UAT);

Best Practice: Every label should have a point, whether point-in-time or point-in-process

Management Information

The job of the Project Manager, ultimately, is to bring the project to a successful conclusion. If this were an easy task that happened by default, then there would be no need for a Project Manager.

In order to be able to do this job well, a Project Manager needs information. He needs to know what is going on in the project – who is doing what, who is working on which components, and a wealth of information can be obtained from a well-managed SCM repository:

  • What is changing in the environment – what has changed since a given point in time or how often particular elements are changing;
  • Who is changing things in the environment;
  • Why things are changing in the environment;

Of course, the final item in the list requires that committers use descriptive comments to indicate why they are making a particular change. A well-managed SCM repository should enforce this.

Best Practice: The SCM repository should provide meaningful, and accessible, management information

Build Best Practices

As explained at the beginning of this document, the term “build” means different things to different people. The most common interpretation is the one used by developers, where the term “build” describes the compilation and assembly step of their development activities, but this narrow interpretation is a common cause of problems and over-runs on development projects.

Building is not Compiling

At the outset of the project, the Project Manager will ask the question “how long to set up the build?” and a developer – thinking of compilation and assembly – will answer something like “1 day” – a task and duration which is then duly marked on the project plan and development begins.

Later in the project, when it is time to start deploying and testing the application, this “build” needs to be refactored to accommodate the deployment and test tasks. In doing so, it often turns out that the way the application is being assembled is not conducive to it being deployed or tested correctly – so the compilation and assembly stages need to be refactored as well.

In the meantime, the development team sits on its hands whilst the “build” is refactored to accommodate the needs of the project – valuable time is lost whilst the deadline continues to advance.

Best Practice: Know what will be required of the build before starting to write the scripts

Don’t Throw Out The Baby With The Bathwater

From a build perspective, projects with similar architecture (both in terms of the team and the application) will have similar attributes. There will obviously be some changes required, but these will tend to follow the 80/20 rule to a large degree.

For example, a web application that is being developed by an in-house team and that will be deployed to a Tomcat servlet container and Oracle database will follow largely the same steps and require largely the same deployable artifacts.

A good SCM repository will enable the latest versions of boiler-plate build scripts for such an application to be found. These can be used almost off-the-shelf – meaning that development can start apace, without having to wait for the build to be constructed anew for a similar application.

Best Practice: Well-crafted builds are re-usable and should be re-used

The Architecture Drives The Build

Following on from the previous section, it should be clear that the architecture of what is being developed – and the structure of the team(s) developing it – will dictate how the build should look.

There is little value to be gained in trying to retrofit build scripts for a computer game (developed by 12 people all in the same room) into a project to produce a large J2EE application with development occurring at six different sites around the world.

Best Practice: Well-crafted builds are flexible, but a “one-size-fits-all” approach can be costly

Management Information

There are a number of people who need information that the build can provide:

  • The Project Manager needs to track overall progress against defined milestones – number of outstanding defects, whether a committed release date will be met etc;
  • Development leads need to be sure that the code is of the quality that they require – test reports, bug analysis patterns, code metrics, document and UML generation etc;
  • The deployment team need to know that the artifacts will work in their target environment, and that the environment is as the application expects it to be. They also need to know whether two (or more) environments are “the same”;
  • The test team need to have confidence that they are testing against a known baseline, and whether defects that they see have been rectified in development (or whether they are re-appearing after already being fixed);
  • Everybody needs to be able to communicate effectively using the same language, and have a common terminology for release versions – particularly if there are multiple threads of development;

A good build infrastructure will provide all of the above information, and more besides.

Best Practice: The build should tell all project participants what they need to know

Deployment Best Practices

Considering that it is generally an important milestone on a project plan, normally resulting in payment or a staged payment, deployment is one of the most overlooked areas of software development.

The normal course of events is:

  1. Release artifacts are created;
  2. Some installation and release notes are cobbled together in the form of a README;
  3. The deployment team work frantically to install and configure the application – the testing team (or, worse still, the customer) are idle and unproductive in the meantime;
  4. Some symptoms are found which are suspected to be application defects;
  5. The development team blame the environment;
  6. The deployment team blame the application;
  7. Repeat (5) and (6) ad nauseam.

When a documentation team is also considered – responsible for creating the documentation that the end user will need to install, configure and use the application – the situation becomes even more difficult.

This situation can be avoided by planning for deployment from the beginning. Deployment is an inevitable part of software development, yet it always seems to take people by surprise.

Best Practice: Know that deployment is inevitable, and incorporate it into the automated processes

Deployment Is Not Installation

As part of normal development activities, artifacts are installed into sandbox environments – and test environments – many times. But this is not deployment, this is installation.

In order to get an application into its production environment, be that an internal environment or hosted infrastructure, a number of hurdles must be overcome:

  • The application must pass UAT;
  • The application must be installed and configured correctly;
  • All pre-requisites for the application must be satisfied;
  • The end customer must accept the handover;

Deployment is that point in the life of an application where it starts to produce a return on investment. “Launch”, “Go-live”, “Release”, “First Customer Shipment” are all phrases which describe the same event.

Best Practice: Deployment is the point where an application starts to provide a return on the development investment.

The Environment Is a Refactorable Component

This point cannot be stressed enough, particularly in large distributed applications.

Every application, large or small, has a runtime environment in which it operates. In a simple desktop application, this is a standalone machine (such as a PC). In larger applications, this will be a combination of machines (e.g. an application server and a database) operating together to provide the runtime environment.

In either case, the application expects certain facilities to be available from the runtime environment and will function incorrectly – or cease to function – if these are not present.

The environment itself, whether standalone or a network, contains many moving parts that can be independently configured. IP addresses, or hostnames, can be changed. User privileges can be modified or revoked. Files and directories can be removed. Each of these can have an effect on the way that the application behaves.

In an environment that is owned and managed internally this can be bad enough. In an environment that is owned and managed by an external third party, and where project success is contingent upon successful UAT in that environment, this can be disastrous.

Best Practice: Be able to identify whether the deployment environment is as prescribed, and “fit for deployment”

Environment Verification Testing

One of the most common questions that arises in development projects containing more than one environment is, simply, “are these environments the same?” and its answer can be elusive.

It is essential to be able to answer that question – quickly and accurately – so that any perceived defects in the application can be categorised as “defect” or “environmental”.

This ability becomes particularly pertinent where one or more of the environments are owned by different teams, or organisations.

Best Practice: Be able to prescribe what the deployment environment should look like, and have a capability to test it quickly.
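
As an illustration, here is a minimal sketch of a parse-time environment check, written here as a GNU Make fragment (any scripting language would serve equally well). The host name dbserver, account appuser and directory /var/log/myapp are placeholders invented for the example, and the standard getent and id utilities are assumed to be available:

# Sketch only: dbserver, appuser and /var/log/myapp are placeholder prescriptions
ifeq ($(shell getent hosts dbserver > /dev/null 2>&1 && echo ok),)
$(error environment check failed: dbserver does not resolve)
endif
ifeq ($(shell id appuser > /dev/null 2>&1 && echo ok),)
$(error environment check failed: user appuser does not exist)
endif
ifeq ($(wildcard /var/log/myapp),)
$(error environment check failed: directory /var/log/myapp is missing)
endif

Run against a candidate environment, a fragment like this either passes silently or fails immediately, naming the first prescription that the environment violates.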

Regression Testing

The environment, as explained earlier, is a refactorable component. It can be changed, and parts can be moved or deleted. However, unlike application code, changes may need to be made to the environment in response to external events (e.g. hardware failure, or security policies).

Applications, particularly complex ones, use regression tests to ensure that observed behaviour after a change is exactly as it was before the change was made. The same should be true of the environment.

Best Practice: Maintain automated regression tests for the environment that compare observed behaviour before and after changes are made.

For example, suppose that a number of operating system patches or service packs are applied to an environment where the application has been, or will be, deployed. How are these tested? Do you wait for users, or testers, to start calling to say that there are problems?

Or do you make sure that you know what problems have been introduced before your users do?

Configuration Management

As stated earlier, the SCM repository should be used to store any artifact that can be changed and that may have an effect on the environment.

It may not seem obvious, but some of the most obscure environmental changes can cause an application to fail:

  • Hostname resolution;
  • Non-existent user or group accounts;
  • IP and network connectivity;
  • Existence, or otherwise, of files and directories;
  • Application or operating system configuration files;

It is essential that these environmental variables be placed under configuration control and identifiable as part of a baseline.

Best Practice: Environmental artifacts that are not part of the application should be part of the baseline

Automate, Automate, Automate

Every single task that is performed as part of a development project – throughout the entire lifecycle – can be placed into one of two categories:

  1. Tasks which require some form of human judgment;
  2. Tasks which do not;

Tasks which fall into the first category can use some degree of automation, but should stop and wait for human intervention wherever judgment is required.

Tasks in the second category should be automated. There is no value in having expensive resources employed to do mechanical or repetitive tasks that a computer could do more quickly, accurately and consistently.

Best Practice: Automate anything that does not require human judgment

A note of caution – it may be tempting to think that automation will increase productivity on its own, but this is not necessarily the case. Automating an inefficient process will simply magnify its inefficiency – as explained in the section on Software Tools are Only Part of the Answer.

This, and it is worth repeating, is a common error – to assume that automated build software alone will improve productivity.

Best Practice: Do not automate inefficient processes, or you will only magnify the inefficiency

About BuildMonkey

BuildMonkey is a division of Nemean Technology Limited - a leading technical innovator specialising in Agile Deployment and build management.

We have over a decade of experience in helping clients increase productivity in their build, integration and test cycles. We have a track record of massively reducing costs and defects through process automation and proven methodologies.

Formed in 1999, and profitable every quarter, we invented the discipline of BuildMaster - we are the original and the best.

We provide specialist Build Management products and services - when all you do is Build Management, you get pretty good at Build Management. Our world-class Professional Services staff are the best in their field, complemented by original software products of the finest quality.

We aim to be the leading provider of Build Management products and services in the UK by providing excellent service to our customers, and by empowering Software Project Managers to aim for release on-time, on-budget, with no defects.

About the Author

John Birtley has been a specialist in Configuration and Build Management since 1994 with a focus on deployment - particularly the packaging and installation of developed applications. He has worked for a number of blue-chip organisations including Sun Microsystems, Hewlett Packard, Sony and Ericsson in the transition from development to deployment. BuildMonkey was launched in January 2004, to package and commoditise this cumulative experience under the title of "Agile Deployment". He lives in Southampton, where dogs and children conspire to take what little spare time he has.

Sorting and Searching

In this month's post-summer holiday column I cover two important topics for Makefile builders, with a round-up of techniques for sorting and searching inside Makefiles and on the file system.

Sorting a list


Sorting in a Makefile is relatively simple: GNU Make provides a $(sort) function to sort a list. But there are a few gotchas:

1. $(sort) both sorts the list into order and removes duplicates. There's no option to just sort while retaining duplicates.

For example, sorting a b a a c with $(sort) returns a list with three elements (and just one a):

$(warning $(sort a b a a c))

which outputs:

Makefile:1: a b c

More on this in the next section.

2. The actual sort order is defined by the return value from the strcmp C function, which in turn will depend on your operating system and locale collation order. This means that you can't rely on the sort order being identical if the Makefile is used on different operating systems (or even different instances of the same operating system).

3. The sort order is always lexical and there are no options to ignore case, or define another ordering (such as numeric).

If GNU Make's built-in $(sort) function isn't up to the list sorting task that you need then the solution is to use the shell's sort program and call it through $(shell). To use the shell's sort program the list first needs to be broken up into a sequence of lines, then it is passed to sort with the appropriate options, and finally the list is reconstructed. Happily $(shell) will automatically do the list reconstruction for us because the output of sort will be flattened by replacing the end of every line with a space.

To split the list into a sequence of lines you can construct a single echo statement with all the spaces in the list replaced by the newline escape sequence.

The function my-sort below performs a sort using the shell sort program:

sp :=
sp += # add space

my-sort = $(shell echo -e $(subst $(sp),'\n',$2) | sort $1 --key=1,1 -)

my-sort has two arguments: the first is any options to pass to the shell's sort program; the second is the list to be sorted.

my-sort works by splitting the list into lines: each space is converted to a newline using GNU Make's $(subst) function. The spaces come from the $(sp) variable because it's hard to pass a literal space as a function argument in GNU Make; the best way is to create a variable containing just a space and use it instead of a space literal. Note that the newline escape is single-quoted and that echo has the -e option: the quotes stop the shell from stripping the backslash, and -e makes echo expand the \n escape sequence into a literal newline.

Naturally, there are some problems with this: using the shell is probably slower than using a built-in function because of the need to spawn the shell, and you could hit a command-line length limit if the list to be sorted was long.

Since my-sort uses the shell to sort the list there are a number of options that can come in handy (and are not available with the built-in $(sort)):

-n Perform a numeric rather than alphabetic sort
-M Perform a 'month' sort
-r Reverse the sort order
-f Ignore case

So, for example, you can sort the list 1 12 4 6 8 3 4 5 9 10 alphabetically like this (here I use GNU Make's $(warning) function to print out the result of calling my-sort):

NUMBERS := 1 12 4 6 8 3 4 5 9 10
$(warning $(call my-sort,,$(NUMBERS)))

which will output (the Makefile:2: part was generated by $(warning); the rest is the actual sorted list):

Makefile:2: 1 10 12 3 4 4 5 6 8 9

Or you could sort numerically with the -n option:

NUMBERS := 1 12 4 6 8 3 4 5 9 10
$(warning $(call my-sort,-n,$(NUMBERS)))

which will output

Makefile:2: 1 3 4 4 5 6 8 9 10 12

Or even numerically, but in reverse order:

NUMBERS := 1 12 4 6 8 3 4 5 9 10
$(warning $(call my-sort,-n -r,$(NUMBERS)))

which will output:

Makefile:2: 12 10 9 8 6 5 4 4 3 1


Dealing with duplicates

Since $(sort) always removes duplicates it's unsuitable for use if you need to sort a list but preserve duplicated entries. However, my-sort above works just fine for that purpose. If no options are specified in the first argument of my-sort then duplicates are not removed.

For example, you can sort the list a b a a c with my-sort like this:

DUPS := a b a a c
$(warning $(call my-sort,,$(DUPS)))

and the output is:

Makefile:2: a a a b c

In fact, the only way my-sort will remove duplicates is if the -u option is specified:

DUPS := a b a a c
$(warning $(call my-sort,-u,$(DUPS)))

and the output is:

Makefile:2: a b c

Another way to handle list deduplication is with the GNU Make Standard Library. It includes a function called uniq which removes duplicates from a list without sorting it. The first occurrence of each duplicated element is preserved.

For example, with the GNU Make Standard Library included via include gmsl (see http://gmsl.sf.net/ for more details), the list b a c a a d can be deduplicated while preserving order:

include gmsl

DUPS := b a c a a d
$(warning $(call uniq,$(DUPS)))

which will output:

Makefile:4: b a c d


Searching for a file

Enough sorting, let's look at searching. First, searching for a file. As with sorting there are two options: use a built-in function, or spawn a shell and use a shell program.

The built-in file searching function is called $(wildcard). Its argument is a pattern to search for on the file system and it returns a list of files and directories that match the pattern. It uses the same wild card patterns as the Bourne shell, so it's possible to use *, ? and character ranges and classes.

For example, to return a list of all files ending .c in the current directory you would write $(wildcard *.c). To search all subdirectories of the current directory for .c files you can write $(wildcard */*.c) and so on.

$(wildcard) can also be used to find directories. If the pattern ends with / then the function only returns directories that match the pattern, and each directory will be suffixed with /. For example, to get a list of all the directories in the current directory do $(wildcard */).
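
As a quick illustration, suppose the current directory contains foo.c, bar.c and a subdirectory called src (file names invented for the example):

# foo.c, bar.c and a src/ subdirectory are assumed to exist for this example
$(warning $(wildcard *.c))
$(warning $(wildcard */))

which would output something like:

Makefile:2: bar.c foo.c
Makefile:3: src/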

There are a few $(wildcard) gotchas though:

1. $(wildcard) does not provide any way to recursively search a tree of directories.

2. $(wildcard) interacts in a surprising way with GNU Make's built-in directory cache mechanism which can mean that $(wildcard) doesn't reflect the actual state of the file system (for more on this see my article The Trouble with $(wildcard): http://www.cmcrossroads.com/content/view/6487/120/).

3. $(wildcard) doesn't work correctly if the pattern has a space in it. That's because all GNU Make functions assume lists are split on spaces and have no special escaping or handling of spaces in paths. The workaround for this situation is to replace all spaces in the pattern with the ? globbing character and then call $(wildcard):

sp :=
sp += # add space

space-wildcard = $(wildcard $(subst $(sp),?,$1))

space-wildcard takes a single argument in exactly the same format as $(wildcard)'s, and replaces any spaces with ? before calling the native $(wildcard). Because of this you cannot call space-wildcard with a list of patterns to search.
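
For example, assuming a directory named My Files containing main.c (names invented for the example), you could write:

# 'My Files' is an assumed directory name containing a space
$(warning $(call space-wildcard,My Files/*.c))

space-wildcard turns the pattern into My?Files/*.c before calling $(wildcard), so this would output something like:

Makefile:6: My Files/main.c

Note that the returned path itself still contains a space, so the result needs the same careful handling as any other space-containing list.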

Another way to search is to use the shell's find command (in a similar manner to the use of sort above). Since find has many more search options than $(wildcard) it's possible to perform much more flexible searches (including recursive descents).

Here's the definition of my-wildcard:

my-wildcard = $(shell find $1 $3 -name '$2' -print)

It takes three arguments: the first is the directory to start searching in, the second is the pattern to search for, and the last, optional, argument is any arguments to pass directly to find (for example, if you don't want find to recurse into subdirectories, the third argument would be -maxdepth 1).

A simple example shows how my-wildcard can be used to recursively search from the current directory for all .c files.

$(warning $(call my-wildcard,.,*.c))

If you don't need all the power of find then a simpler replacement function for $(wildcard) can be created just using echo:

simple-wildcard = $(shell echo $1)

It only takes one argument: the pattern to search for, but it overcomes the directory cache limitation mentioned above and can be used as a direct replacement for $(wildcard).

Here's how simple-wildcard can be used to find all the .c files in the parent directory with exactly the same syntax as $(wildcard):

$(call simple-wildcard,../*.c)


Recursively searching for a file

Although you can use my-wildcard above (which used the shell's find command to perform recursive searches) it would be nice to perform them using only built-in GNU Make functions to avoid the cost of starting a shell with $(shell).

It's possible to perform a recursive $(wildcard) using the following definition that only uses built-in GNU Make functions and never calls the shell:

search = $(foreach d,$(wildcard $1/*),$(call search,$d,$2)$(filter $(subst *,%,$2),$d))

This search function has two arguments: the first is the directory to start the recursive descent in; the second is the pattern to search for. The pattern is not as flexible as $(wildcard) and only supports a single * wild card.

It works by first calling $(wildcard) to get a list of all files and directories (the $(wildcard $1/*)), and then recursing into each one by calling itself. After the recursion returns, the list of files and directories is matched against the pattern using $(filter).

It's here that the 'single * wild card' restriction is created because $(filter) can only accept a single % wild card. Any * present in the second argument to search is converted to % for $(filter) by the $(subst *,%,$2).
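
For example, to find every .c file at any depth below a directory called src (assuming a tree containing src/main.c and src/lib/util.c) you could write:

# src/main.c and src/lib/util.c are assumed to exist for this example
$(warning $(call search,src,*.c))

which would output something like:

Makefile:3: src/lib/util.c src/main.c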

Searching a list of directories for a file

Another common type of file searching is 'in which of these directories can I find file X?' And the most common form of that is searching the path for an executable.

It's fairly easy to define a function, let's call it which-dirs, that takes two arguments: a list of directories to search and the name of a file to look for. which-dirs will search the list of directories and return the full path each time it finds the requested file. Each found file (and path) will be returned in the same order as the paths in the first argument.

which-dirs = $(wildcard $(addsuffix /$2,$1))

It works by adding the name of the file to search for to each element of the path using $(addsuffix). $(wildcard) works even if there are no wild card characters present (in which case it acts as a 'file exists' function, returning the name of the file if it is present, or an empty string if not), and it accepts a list of patterns, searching for each one in turn.

A simple use of which-dirs would be to search the PATH for a particular executable. Here's what happens on my machine looking for emacs on the path:

$(warning $(call which-dirs,$(subst :, ,$(PATH)),emacs))

and it outputs:

Makefile:2: /usr/bin/emacs /usr/bin/X11/emacs
Notice that since PATH is a :-separated list I first turned it into a space-separated list for which-dirs by writing $(subst :, ,$(PATH)).

If I wanted to know the first place that emacs is found on the path I'd just take the first element of the list returned by which-dirs using the GNU Make built-in function $(firstword):

$(warning $(firstword $(call which-dirs,$(subst :, ,$(PATH)),emacs)))


Searching for a list member

That's enough file searching. How about searching to see if a list contains a specific member? GNU Make provides two functions that can be used for that purpose: $(filter) and $(filter-out).

To see if a list contains a specific string the $(filter) function is used. If the string is present in the list then $(filter) returns the string found; if it is not present, the function returns an empty string.

For example, to search the list a b dog c for the word dog and the word cat you can do:

LIST := a b dog c

$(warning $(filter dog,$(LIST)))
$(warning $(filter cat,$(LIST)))

and the output will look like:

Makefile:3: dog
Makefile:4:

because there is a dog, but no cat.

$(filter) also supports wild cards; well, one wild card: %. So to find whether the same list contains any word starting with d you use the pattern d%:

LIST := a b dog c

$(warning $(filter d%,$(LIST)))

and you'll get the following output:

Makefile:3: dog

$(filter-out) performs the opposite of $(filter): it returns every word of the list that does not match the pattern. That means it returns an empty string only if every list member matches, and a non-empty string otherwise. For example, to see that the same list does not contain the word cat you use $(filter-out):

LIST := a b dog c

$(warning $(filter-out cat,$(LIST)))

and you'll get the following output:

Makefile:3: a b dog c

Notice that the output here is actually the entire list (because no element matches cat, $(filter-out) removes nothing). From GNU Make's perspective any non-empty string is 'true' and the empty string is 'false', so the output of both $(filter) and $(filter-out) can be used as a truth-value: either directly inside $(if), or assigned to a variable and tested with ifdef or ifndef.
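
For example, here is a minimal sketch of using the result of $(filter) as a condition inside $(if):

LIST := a b dog c

# $(filter) returns 'dog' (a non-empty, true value); an empty result would be false
$(warning $(if $(filter dog,$(LIST)),list contains dog,no dog here))

which will output:

Makefile:4: list contains dog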

Searching for a sub-string

Lastly, if you are dealing with a string and not a list, you may need to search for a substring. GNU Make has a built-in function $(findstring) for that purpose.

For example, to see if the substring 'og c' appears in the string 'a b dog c' do the following:

STRING := a b dog c

$(warning $(findstring og c,$(STRING)))

which will output

Makefile:3: og c

If the string had not been present the function would have returned an empty string. Notice how $(findstring) is treating the same text as above as a single string, and not a space-separated list.

Conclusion

That's enough sorting and searching for this month. I'll be back in October with more ways to make GNU Make do the things that aren't mentioned in the manual.

Debugging 101

An interactive debugger is an outstanding example of what is not needed - it encourages trial-and-error hacking rather than systematic design, and also hides marginal people barely qualified for precision programming.
Harlan Mills

Recently, a colleague and I were working together to resolve a bug in a piece of code she had just written. The bug resulted in an exception being thrown and looking at the stack trace, we were both puzzled about what the root cause might be. Worse yet, the exception originated from within an open source library we were using. As is typical of open source products, the documentation was sparse, and wasn't providing us with very much help in diagnosing the problem before us. It was beginning to look like we might have to download the source code for this library and start going through it - a prospect that appealed to neither of us.

As a last resort before downloading this source code, I suggested that we try doing a web search on the text of the exception itself, by copying the last few lines of the stack trace into the search field for a web search engine. I hoped the search results might include pages from online forums where someone else had posted a message like "I'm seeing the following exception, can anyone tell me what it means?", followed by all or part of the stack trace itself. If the original poster had received a helpful response to their query, then perhaps that response would be helpful to us too.

My colleague, who is reasonably new to software development, was surprised by the idea and commented that it was something she would never have thought to try. Her response got me to thinking about debugging techniques in general, and the way we acquire our knowledge of them.

Reflecting on my formal education in computer science, I cannot recall a single tutorial or lecture that discussed how I should go about debugging the code that I wrote. Mind you, I cannot remember much of anything about those lectures, so perhaps it really was addressed and I've simply forgotten. Even so, it seems that the topic of debugging is much neglected in both academic and trade discussions. Why is this?

It seems particularly strange when you consider what portion of their time the average programmer spends debugging their own code. I've not measured it for myself, but I wouldn't be surprised if one third or more of my day was spent trying to figure out why my code doesn't behave the way I expected. It seems strange that I never learnt in any structured way how to debug a program. Everything I know about debugging has been acquired through experience, trial and error, and from watching others. Unless my experience is unique, it seems that debugging techniques should be a topic of vital interest to every developer. Yet some developers seem almost embarrassed to discuss it.

I suspect the main reason for this is hubris. The ostensible ability to write bug-free code is a point of pride for many programmers. Displaying a knowledge of debugging techniques is tantamount to admitting imperfection, acknowledging weakness, and that really sticks in the craw of those developers who like to think of themselves as "l337 h4x0r5". But by avoiding the topic, we lose a major opportunity to learn methods for combating our inevitable human weaknesses, and thereby improving the quality of the work we do.

So I've taken it upon myself to list the main debugging techniques that I am aware of. For many programmers, these techniques will be old hat and quite basic. But even for veteran debuggers there may be value in bringing back to mind some of these tried and true techniques. For others, there might be one or two methods that you hadn't thought of before. I hope they save you a few hours of frustrating fault-finding.

General Principles

Regardless of the specific debugging techniques you use, there are a few general principles and guidelines to keep in mind as your debugging effort proceeds.

Reproduce

The first task in any debugging effort is to learn how to consistently reproduce the bug. If it takes more than a few steps to manually trigger the buggy behavior, consider writing a small driver program to trigger it programmatically. Your debugging effort will proceed much more quickly as a result.

Progressively Narrow Scope

There are two basic ways to find the origin of a bug - brute force and analysis. Analysis is the thoughtful consideration of a bug's likely point of origin, based on detailed knowledge of the code base. A brute force approach is a largely mechanical search along the execution path until the fault is eventually found.

In practice, you will probably use a combination of both methods. A preliminary analysis will tell you the area of the code most likely to contain the bug, then a brute force search within that area will locate it precisely.

Purists may consider any application of the brute force approach to be tantamount to hacking. It may be so, but it is also the most expedient method in many circumstances. The quickest way to search the path of execution by brute force is to use a binary search, which progressively divides the search space in half at each iteration.

Avoid Debuggers

In general, I recommend you avoid symbolic debuggers of the type that have become standard in many IDEs. Debuggers tend to produce a very fragile debugging process. How often does it happen that you spend an extended period of time carefully stepping through a piece of code, statement by statement, only to find at the critical moment that you accidentally "step over" rather than "step into" some method call, and miss the point where a significant change in program state occurs? In contrast, when you progressively add trace statements to the code, you are building up a picture of the code in execution that cannot be suddenly lost or corrupted. This repeatability is highly valuable - you're monotonically progressing towards your goal.

I've noticed that habitual use of symbolic debuggers also tends to discourage serious reflection on the problem. It becomes a knee-jerk response to fire up the debugger the instant a bug is encountered and start stepping through code, waiting for the debugger to reveal where the fault is.

That said, there are a small number of situations where a debugger may be the best, or perhaps only, method available to you. If the fault is occurring inside compiled code that you don't have the source code for, then stepping through the just-in-time decompiled version of the executable may be the only way of subjecting the faulty code to scrutiny. Another instance where a debugger can be useful is in the case of memory overwrites and corruption, as can occur when using languages that permit direct memory manipulation, such as C and C++. The ability most debuggers provide to "watch" particular memory segments for changes can be helpful in highlighting unintentional memory modifications.

Change Only One Thing at a Time

Debugging is an iterative process whereby you make a change to the code, test to see if you've fixed the bug, make another change, test again, and so on until the bug is fixed. Each time you change the code, it's important to change only one aspect of it at a time. That way, when the bug is eventually fixed, you will know exactly what caused it - namely, the very last thing you touched. If you try changing several things at once, you risk including unnecessary changes in your bug fix (which may themselves cause bugs in future), and diluting your understanding of the bug's origin.

Technical Methods

Debugging is a manually intensive activity, more like solving logic problems or brain teasers than programming. You will find little use for elaborate tools, instead relying on a handful of simple techniques intelligently applied.

Insert Trace Statements

This is the principal debugging method I use. A trace statement is a human readable console or log message that is inserted into a piece of code suspected of containing a bug, then generally removed once the bug has been found. Trace statements not only trace the path of execution through code, but also the changing state of program variables as execution progresses. If you have used Design By Contract (see below) diligently, you will already know what portion of the code to instrument with trace statements. Often it takes only half a dozen or so well chosen trace statements to pinpoint the cause of your bug. Once you have found the bug, you may find it helpful to leave a few of the trace statements in the code, perhaps converting console messages into file-based logging messages, to assist in future debugging efforts in that part of the code.

Consult the Log Files of Third Party Products

If you're using a third party application server, servlet engine, database engine or other active component then you'll find a whole heap of useful information about recently experienced errors in that component's own log files. You may have to configure the component to log the sort of information you're interested in. In general, if your bug seems to involve the internals of some third party product that you don't have the source code for (and so can't instrument with trace statements), see if the vendor has supplied some way to provide you with a window into the product's internal operation. For example, an ORM library might produce no console output at all by default, but provide a command line switch or configuration file property that makes it output all SQL statements that it issues to the database.

Search the Web for the Stack Trace

Cut the text from the end of a stack trace and use it as a search string in the web search engine of your choice. Hopefully this will pick up questions posted to discussion forums, where the poster has included the stack trace that they are seeing. If someone posted a useful response, then it might relate to your bug. You might also search on the text of an error message, or on an error number. Given that search engines might not discover dynamically generated web pages in discussion forums, you might also find it profitable to identify web sites likely to host discussions pertaining to your bug, and use the site's own search facilities in the manner just described.

Introduce Design by Contract

In my opinion, DBC is one of the best tools available to assist you in writing quality code. I have found rigorous use of it to be invaluable in tracking down bugs. If you're not familiar with DBC, think of it as littering your code with assertions about what the state of the program should be at that point, if everything is going as you expect it to. These assertions are checked programmatically, and an exception thrown when they fail. DBC tends to make the point of program failure very close to the point of logical error in your code. This avoids those frustrating searches where a program fails in function C, but the actual error was further up the call chain in function A, which passed on faulty values to function B, which in turn passed the values to function C, which ultimately failed. It's best to use DBC as a means of bug prevention, but you can also use it as a means of preventing bug recurrence. Whenever you find a bug, litter the surrounding code with assertions, so that if that code should ever go wrong again, a nearby assertion will fail.

Wipe the Slate Clean

Sometimes, after you've been hunting a bug for long enough, you begin to despair of ever finding it. There may be an overwhelming number of possible sources yet to explore, or the behavior you're observing is just plain bizarre. On such occasions it can be useful to wipe the slate clean and start again. Create an entirely new mini-application whose sole function is to demonstrate the presence of your bug. If you can write such a demo program, then you're well on your way to tracking down the cause of the bug. Now that you have the bug isolated in your demo program, start removing potentially faulty components one by one. For example, if your demo program uses some database connection pooling library, cut it out and run the program again. If the bug persists, then you've just identified one component that doesn't contribute to the buggy behavior. Proceed in that manner, stripping out as many possible fault sources as you can, one at a time. When you remove a component that makes the bug disappear, then you know that the problem is related to the last component you removed.

Intermittent Bugs

Bugs that occur intermittently and can't be consistently reproduced are the programmer's bane. They are often the result of asynchronous competition for shared resources, as might occur when multiple threads vie for shared memory or race for access to a local variable. They can also result from other applications competing for memory and I/O resources on the same machine.

First, try modifying your code so as to serialize any operations occurring in parallel. For example, don't spawn N threads to handle N calculations, but perform all N calculations in sequence. If your bug disappears, then you've got a synchronization problem between the blocks of code performing the calculations. For help in correctly synchronizing your threads, look first to any support for threading that is included in your programming language. Failing that, look for a third party library that supports development of multi-threaded code.

If your programming language doesn't provide guaranteed initialization of variables, then uninitialized variables can also be a source of intermittent bugs. 99% of the time, the variable gets initialized to zero or null and behaves as you expected, but the other 1% of the time it is initialized to some random value and fails. A class of tools called "System Perturbers" can assist you in tracking down such problems. Such tools typically include a facility for zero-filling memory locations, or filling memory with random data, as a way of teasing out initialization bugs.

Exploit Locality

Research shows that bugs tend to cluster together. So when you encounter a new bug, think of those parts of the code in which you have previously found bugs, and whether nearby code could be involved with the present bug.

Read the Documentation

If all else fails, read the instructions. It's remarkable how often this simple step is foregone. In their rush to start programming with some class library or utility some developers will adopt a trial-and-error approach to using a new API. If there is little or no API documentation then this may be an appropriate approach. But if the API has some decent programmer-level documentation with it, then take the time to read it. It's possible that your bug results from misuse of the API and the underlying code is failing to check that you have obeyed all the necessary preconditions for its use.

Introduce Dummy Implementations and Subclasses

Software designers are sometimes advised to "write to interfaces". In other words, rather than calling a method on a class directly, call a method on an interface that the class implements. This means that you are free to substitute in a different class that implements the same interface, without needing to change the calling code. While dogmatic application of this guideline can result in a proliferation of interfaces that are only implemented once, it does point to a useful debugging technique. If the outcome of the collaboration between several objects is buggy, look to the interfaces that the participating objects implement. Where an object is invoked only via interfaces, consider replacing the object with a simple, custom object of your own that is hard-wired to perform correctly under very specific circumstances. As long as you limit your testing to the circumstances that you know your custom object handles correctly, you know that any buggy behavior you subsequently observe must be the fault of one of the other objects involved. That is, you've eliminated one potential source of the bug. You can achieve a similar effect by substituting a custom subclass of a participant class, rather than a custom implementation of an interface.

Recompile / Relink

A particularly nasty type of bug arises from having an executable image that is a composite of several different compile and/or relink operations. The failure behavior can be quite bizarre and it can appear that internal program state is being corrupted "between statements". It's like gremlins have crept into your code and started screwing around with memory.

Most recently, I have encountered this bug in Java code when I change the value of string constants. It seems the compiler optimizes references to string constants by inserting them literally at the point of reference. So the constant value is copied to multiple class files. If you don't regenerate all those class files after changing the string constant, those class files not regenerated will still contain the old value of that constant. Performing a complete recompilation prevents this from occurring. Finally, set the compiler to include debugging information in the generated code, and set the compiler warning level to the maximum.

Probe Boundary Conditions and Special Cases

Experienced programmers know that it's the limits of an algorithmic space that tend to get forgotten or mishandled, thereby leading to bugs. For example, the procedure for deleting records 1 to N might be slightly different from the procedure for deleting record 0. The algorithm for determining if a given year is a leap year is slightly different if the year is divisible by 400. Breaking a string into a list of space-separated words requires consideration of the cases where the string contains only one word, or is empty. The tendency to code only the general case and forget the special cases is a very common source of error.

Check Version Dependencies

One of the most obscure sources of bugs is the use of incompatible versions of third party libraries. It is also one of the last things to check when you've exhausted other debugging strategies. If version 1.0.2 of some library has a dependency on version 2.4 of another library, but you supply version 2.5 instead, the results may be subtle failures that are difficult or impossible to diagnose. Look particularly to any libraries that you have upgraded just prior to the appearance of the bug.

Check Code that Has Changed Recently

When a bug suddenly appears in functionality that has been working for some time, you should immediately wonder what has recently changed in the code base that might have caused this regression. This is where your version control system comes into its own, providing you with a way of looking at the change history of the code, or recreating successively older versions of the code base until you get one in which the regression disappears.

Don't Trust the Error Message

Normally you scrutinize the error messages you get very carefully, hoping for a clue as to where to start your debugging efforts. But if you're not having any luck with that approach, remember that error messages can sometimes be misleading. Sometimes programmers don't put as much thought into the handling and reporting of error conditions as one would like, so it may be wise to avoid interpreting the error message too literally, and to consider possibilities other than the ones it specifically identifies.

Graphics Bugs

There are a few techniques that are particularly relevant when working on GUIs or other graphics-related bugs. Check if the graphics pipeline you are using includes a debugging mode - a mode which slows down graphics operations to a speed where you can observe individual drawing operations occurring. This mode can be very useful for determining why a sequence of graphic operations doesn't combine to give the effect you expected.

When debugging problems with layout managers, I like to set the background colors of panels and components to solid, contrasting colors. This enables you to see exactly where the edges of the components are, which highlights the layout decisions made by the layout managers involved.

Psychological Methods

I think it's fair to say that the vast majority of bugs we encounter are a result of our own cognitive limitations. We might fail to fully comprehend the effects of a particular API call, forget to free memory we've reserved, or simply fail to translate our intent correctly into code. Indeed, one might consider debugging to be the process of finding the difference between what you instructed the machine to do, and what you thought you instructed the machine to do. So given their basis in faulty thinking, it makes sense to consider what mental techniques we can employ to think more effectively when hunting bugs.

Wooden Indian

When you're really stuck on a bug, it can be helpful to grab a colleague and explain the bug to them, together with the efforts you've made so far to hunt down its source. It may be that your colleague can offer some helpful advice, but this is not what the technique is really about. The role of your colleague is mainly just to listen to your description in a passive way. It sometimes happens that in the course of explaining the problem to another, you gain an insight into the bug that you didn't have before. This may be because explaining the bug's origin from scratch forces you to go back over mental territory that you haven't critically examined, and challenge fundamental assumptions that you have made. Also, by verbalizing you are engaging different sensory modalities which seems to make the problem "fresh" and revitalizes your examination of it.

Don't Speculate

Arthur C. Clarke once wrote "Any sufficiently advanced technology is indistinguishable from magic." And so it is for any sufficiently mysterious bug. One of the greatest traps you can fall into when debugging is to resort to superstitious speculation about its cause, rather than engaging in reasoned enquiry. Such speculation yields a trial-and-error debugging effort that might eventually be successful, but is likely to be highly inefficient and time consuming. If you find yourself making random tweaks without having some overall strategy or approach in mind, stop straight away and search for a more rational method.

Don't be too Quick to Blame the Tools

Perhaps you've had the embarrassing experience of announcing "it must be a compiler bug" before finding the bug in your own code. Once you've done it, you don't rush to judgement so quickly in the future. Part of rational debugging is realistically assessing the probability that there is a bug in one of the development tools you are using. If you are using a flaky development tool that is only up to its beta release, you would be quite justified in suspecting it of error. But if you're using a compiler that has been out for several years and has proven itself reliable in the field over all that time, then you should be very careful you've excluded every other possibility before concluding that the compiler is producing a faulty executable.

Understand Both Problem and Solution

It's not uncommon to hear programmers declare "That bug disappeared" or "It must've been fixed as a side-effect of some other work". Such statements indicate that the programmer isn't seeking a thorough understanding of the cause of a bug, or its solution, before dismissing it as no longer in need of consideration. Bugs don't just magically disappear. If a bug seems to be suddenly fixed without someone having deliberately attended to it, then there's a good chance that the fault is still somewhere in the code, but subsequent changes have changed the way it manifests. Never accept that a bug has disappeared or fixed itself. Similarly, if you find that some changes you've made appear to have fixed a bug, but you're not quite sure how, don't kid yourself that the fix is a genuine one. Again, you may simply have changed the character of the bug, rather than truly fixing its cause.

Take a Break

In both bug hunting and general problem solving I've experienced the following series of events more times than I can remember. After struggling with a problem for several hours and growing increasingly frustrated with it, I reach a point where I'm too tired to continue wrestling with it, so I go home. Several choice expletives are muttered. Doors are slammed. The next morning, I sit down to continue tackling the problem and the solution just falls out in the first half hour.

Many have noted that solutions come much easier after a period of intense concentration on the problem, followed by a period of rest. Whatever the underlying mechanism might be, if you have similar experiences it's worth remembering them when you're faced with a decision between bashing your head against a problem for another hour, or having a rest from it.

Another way to get a fresh look at a piece of code you've been staring at for too long is to print it out and review it off the paper. We read faster off paper than off the screen, so this may be why it's slightly easier to spot an error in printed code than displayed code.

Consider Multiple Causes

There is a strong human tendency to oversimplify the diagnoses of problems, attributing what may be multi-causal problems to a single cause. The simplicity of such a diagnosis is appealing, and certainly easier to address. The habit is encouraged by the fact that many bugs really are the result of a single error, but that is by no means universally the case.

Bug Prevention Methods

"Prevention is better than cure," goes the maxim; as true of sicknesses in code as of sicknesses in the body. Given the inevitability and cost of debugging during your development effort, it's wise to prepare for it in advance and minimize it's eventual impact.

Monitor Your Own Fault Injection Habits

Over time you may notice that you are prone to writing particular kinds of bugs. If you can identify a consistent weakness like this, then you can take preventative steps. If you have a code review checklist, augment the checklist to include a check specifically for the type of bug you favor. Simply maintaining an awareness of your "favorite" defects can help reduce your tendency to inject them.

Introduce Debugging Aids Early

Unless you've somehow attained perfection prior to starting work on your current project, you can be confident that you have numerous debugging efforts in store before you finish. You may as well make some preparation for them now. This means inserting logging statements as you proceed, so that you can selectively enable them later, before augmenting them with bug-specific trace statements. Also think about the prime places in your design to put interfaces. Often these will be at the perimeters of significant subsystems. For example, when implementing client-server applications, I like to hide all client contact with the server behind interfaces, so that a dummy implementation of the server can be used in place of the server, throughout client development. It's not only a convenient point of interception for debugging efforts, but a development expedient, as the test-debug cycle can be significantly faster without the time cost of real server deployment and communication.

Loose Coupling and Information Hiding

Application of these two principles is well known to increase the extensibility and maintainability of code, as well as easing its comprehension. Bear in mind that they also help to prevent bugs. An error in well modularized code is less likely to produce unintended side-effects in other modules – side-effects that would obfuscate the origin of the bug and impede the debugging effort.

Write a Regression Test to Prevent Reoccurrence

Once you've fixed the bug it's a good idea to write a regression test that exercises the previously buggy code and checks for correct operation. If you wish, you can write that regression test before having fixed the bug, so that a successful bug fix is indicated by the successful run of the test.

Conclusion

If you spend enough time debugging, you start to become a bit blasé about it. It's easy to slip into a rut and just keep following the same patterns of behavior you always have, which means that you never get any better or smarter at debugging. That's a definite disadvantage, given how much of the average programmer's working life is consumed by it. There's also a tendency not to examine debugging techniques closely or seriously, as debugging is something of a taboo topic in the programming community.

It's wise to acknowledge your own limitations up front, the resultant inevitability of debugging, and to make allowances for it right from the beginning of application development. It's also worth beginning each debugging effort with a few moments of deliberate reflection, to try and deduce the smartest and quickest way to find the bug.
