To 'C' or Not to 'C'

Is Recoding a Legacy Application from PL/I to C The Right Thing to Do?

By Richard Perkinson
Director of Product Management
Liant Software Corporation
959 Concord Street
Framingham, MA 01701-4613
Phone: (508) 872-8700
Fax: (508) 626-2221
email: openpl1@lpi.liant.com
Version 2

Overview

The purpose of this paper is to discuss the issues involved in recoding PL/I applications to C. Moving legacy applications to open systems is of vital concern to many Information Systems organizations today. In fact, recent surveys of IBM mainframe sites by The Gartner Group and International Data Corporation (IDC) indicate that approximately 70% of these sites will be moving applications to open systems over the next few years.

There are three alternatives for moving an application from proprietary to open systems: recompiling, recoding, and re-engineering.

Re-engineering is ultimately the right option for certain applications. However, it is a long, expensive, and complicated endeavor. Recoding or recompiling are often looked to as quicker, less risky, and more cost-effective solutions.

It is the premise of this paper that even for systems that need to be re-engineered, recompiling is the right decision for the short term because it gives organizations the ability to re-engineer their applications gradually and with less risk. For systems that require little or no re-engineering, the paper argues that recompilation is less risky and more cost-effective than recoding. It also explains in detail why recoding any major application from PL/I to C or C++ is seriously problematic and is no longer required.

PL/I, The Language

PL/I is a language designed by IBM to include the best features of FORTRAN and COBOL in a modern "structured programming" language. Conrad Weisert, of Information Disciplines Inc., said the following in a recent issue of ACM SIGPLAN Notices: "Imagine a general-purpose programming language that offers:

where "better" means some combination of easier to use, easier to learn, more complete, more reliable, and more fully integrated with the rest of the language. Wouldn't that language be worth looking into as a candidate vehicle for your next big software development effort? Well, that language exists, and it's not some vendor's proprietary "4th generation" wonder, but an established language supported by ANSI and international standards. It's PL/I."

Perhaps some will regard this as a biased point of view. The fact is, however, that PL/I is a very powerful and very well-designed language. PL/I supports a wide spectrum of data types and storage classes, and it has extensive support for arrays and record structures, a full arsenal of logic control mechanisms, extensive exception handling, and multiple modes of I/O. The language prefers regularity of constructs and avoids quirky tricks. These are some of the reasons PL/I is often preferable to other languages.

The Viability of PL/I

If your application is written in PL/I, there are many benefits to continuing to use PL/I. Moving PL/I applications to open systems by recompiling is now a viable option because PL/I development tools are available on many open system platforms. Both Liant Software Corporation and IBM Corporation support good PL/I tools on various open system platforms.

The other options for migrating legacy PL/I applications to open systems environments are recoding the applications in another language and re-engineering them. These options are discussed in the following sections.

Recoding: The General Concerns

There are many things to consider when changing languages, particularly from PL/I to C. What are the costs? What are the risks? How long will it take? What is to be gained or lost? We will consider two levels of concern for those who are thinking of recoding a PL/I application to C: the particular language translation issues, and the more general issues that apply regardless of the language specifics.

Code Changes

First and foremost is the obvious fact that the more you have to change, the costlier it's going to be. When recompiling, you will very likely encounter some language variations from one compiler to the next. This could result in changing a few lines in the legacy application to accommodate the new environment. The actual number of lines affected will be a function of the size of the application being compiled. Our experience has been that approximately 1% to 2% of the total number of lines of code need to be changed.

However, modifying 1% or 2% of your code to accommodate the variations in language implementations from compiler to compiler is much less drastic than changing every line of code, as you must when recoding. While it is true that translation tools have been developed to help in converting PL/I to C, none is reported to convert more than 70% of the code acceptably. This means that at least 30% of your code will require programmer attention.

Automatic Code Translators

The best PL/I to C automatic code translators claim up to a 70% successful conversion rate. Their cost, in the $8,000 to $10,000 range, is modest considering the effort they would save if they were successful.

My preliminary testing of converters revealed that they are not as satisfying as they might be. Data type conversion and the poor quality of the resulting code were the biggest problems encountered.

Data Type Conversion Problems

All fixed decimal values, regardless of length or precision, were declared as double-precision floating point by the translator. As a result, every one of them occupies 8 bytes of storage rather than the 3, 5, 7, or 9 bytes that the files containing them expect. This means wholesale data structure conversion must take place before the translated program will run. The original record lengths just don't match the converted structures.
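To make the size mismatch concrete, here is a minimal sketch in C. It assumes the IBM packed decimal layout, in which each byte holds two digits and the last half-byte holds the sign. The function names and the three-field record are hypothetical and are not taken from any translator.

```c
#include <stddef.h>

/* Bytes occupied by a packed decimal field of `digits` decimal digits,
   assuming two digits per byte plus one half-byte for the sign. */
size_t packed_decimal_bytes(unsigned digits)
{
    return digits / 2 + 1;
}

/* Hypothetical file record holding three fixed decimal fields of
   5, 9, and 13 digits, stored in packed form. */
size_t record_bytes_packed(void)
{
    return packed_decimal_bytes(5)
         + packed_decimal_bytes(9)
         + packed_decimal_bytes(13);
}

/* The same three fields after each one has been declared as a double. */
size_t record_bytes_as_doubles(void)
{
    return 3 * sizeof(double);
}
```

Under these assumptions the record on file is 15 bytes long, while the translated structure needs 24 bytes on a platform where a double occupies 8 bytes, so the two cannot be read or written interchangeably.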

On top of this unexpected effort, the loss of precision leads to incorrect results in the runtime routines. Inspection of the runtime processes showed that any time an arithmetic operation is attempted on fields of different types, lengths, or precisions, the results are suspect.
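The kind of precision loss involved can be illustrated with a short C sketch. This is an illustration under stated assumptions, not output from any translator: the function names are hypothetical, and the example assumes the usual IEEE 754 representation of double. It contrasts repeated addition in binary floating point with the same addition done on scaled integers, which is one way a hand conversion might preserve decimal behavior.

```c
/* Adds `amount` to a running total `times` times in binary floating point.
   A value such as 0.10 has no exact binary representation, so the total
   can drift away from the exact decimal result. */
double sum_as_double(double amount, int times)
{
    double total = 0.0;
    for (int i = 0; i < times; i++) {
        total += amount;
    }
    return total;
}

/* The same addition on a value held as a whole number of hundredths.
   Integer arithmetic keeps every decimal digit exact. */
long long sum_as_hundredths(long long hundredths, int times)
{
    long long total = 0;
    for (int i = 0; i < times; i++) {
        total += hundredths;
    }
    return total;
}
```

Calling sum_as_double(0.10, 10) on a typical IEEE 754 system yields a value that does not compare equal to 1.0, whereas sum_as_hundredths(10, 10) is exactly 100 hundredths. In a program that compares totals or accumulates many amounts, such differences are enough to change results.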

Quality of Converted Code

Concern must also be given to the general quality of the C code generated by a conversion tool. In most cases, this code is not generated in a form that makes for easy reading on the part of programmers doing future maintenance. The program layout, the spacing, indentation, and so forth, tends to get rearranged. If the code is difficult to read and understand, it may hinder later efforts to modernize or re-engineer the recoded application.

Another quality concern of recoding is that because so much of the PL/I logic must be emulated by function calls in C, there is a risk of performance deterioration. For example, the translator provides a series of runtime routines to handle a variety of data manipulations for each unsupported data type. Care must be taken to compensate for this.

Including the runtime tended to expand the original source listing by a factor of between 9 and 17. In addition to degrading performance, the expanded code thoroughly obscured any sense of the business logic behind each algorithm.

If It Isn't Broken ...

An important fact to remember is that the application is already written in PL/I, and the application works. What new bugs will be introduced into this otherwise working code by just the translation effort itself? As you well know, each line of code that is changed has the potential of introducing a bug that requires time to track down and fix. Pieter Mimno, in the July 1994 issue of Application Development Trends, says:

"Tools are available to emulate host-based software such as CICS, VSAM, IMS and DB2 on lower end platforms, to allow legacy applications to run without making a change on the new host processor. The operating environment on the new host processor may be DOS, Windows, OS/2 or Unix. Porting of legacy code to low-cost servers is a proven low-risk, low-cost approach to reducing costs. There are few hidden costs because the ported code is not modified."

Note that Mimno states "because the ported code is not modified". This is an extremely important point. Translation, in essence, is a complete modification of the application code. The potential for any change to inadvertently alter the functionality of an otherwise properly functioning program is high.

So, it's not just the time required to make the changes; there is also the time required to fix the problems they cause. The more lines that are changed, the more bugs will result. In addition, a certain number of the bugs will surface only after the application goes back into service.

Staff Issues

Converting to C causes other concerns if you have a staff that is very strong in PL/I and not as strong in C. First, if your PL/I staff does the conversion, their C code may not have the quality of their PL/I code. Expect some problems here. Alternatively, if C specialists do the conversion, they may not have a thorough understanding of the PL/I code, and as a result problems may occur in the translation. If outside consultants are used, they will not be able to work completely on their own to rehost the application. Experienced staff members familiar with the application being recoded will need to be involved. This is true not only because the consultants will need advice about the functionality they are converting, but also because the existing staff will eventually be responsible for maintaining the converted code.

Second, and more important, is the period of productivity loss you will experience. It takes time for a programmer to become adept with a new language. C is not as large a language as PL/I, but it has many irregularities. Once the conversion is complete, there will be a period of time when productivity is somewhat reduced. It is estimated that this lapse in productivity will last from six to eighteen months. In general, according to a 1994 survey by The Gartner Group, training an experienced programmer in a new language takes three projects. If the average project is four to six months, this translates to 12 to 18 months to yield a proficient worker.

According to The Gartner Group, "an internal programmer with multiple language skills", fluent in the new language and familiar with the application being converted, "can do the [recoding] job for around 50¢ per line of code. Outside consultants work at a premium, which doubles the cost to $1.00 per line of code."

Recoding: The Specific Concerns

PL/I is a particularly rich language, and many of its features cannot be translated simply into C. The conversion is possible, but you will have to expend a lot of effort. The following is a list of PL/I features and characteristics that will need special attention if you choose to recode your PL/I application in C.

Data Support

The following PL/I data types and classes have no direct equivalent in C:

In addition, data initialization in PL/I is quite flexible, and some of it will require special attention.

C's lack of data types expressive enough to fully define the many data types PL/I supports is the most troublesome aspect of converting to C, as discussed in the section entitled "Automatic Code Translators" earlier in this paper.

Data Operations

Some of the kinds of operations you can do in PL/I will require special attention:

Control Structures

The following aspects of program logic are different enough in PL/I that they will require special attention:

File I/O

The translation of PL/I's I/O will, of course, require special attention. In particular those needing attention are:

Built-in Functions

Some of PL/I's built-in functions will require special attention. For example:

Macro Preprocessor

Some of the features of the PL/I Macro Preprocessor will require special attention. For example,

Sample Recoding Consideration

A simple example of the care that must be taken occurs in a case like this:

Perhaps, over the years, someone found it necessary to increase the size of BusyString without noticing that it was somewhere assigned to NotOftenUsedString. It never mattered, though, because PL/I truncated the value and no one seemed to notice that two bytes were cut off the end of the string stored in NotOftenUsedString. However, when this assignment is converted to a strncpy call in C, two bytes of memory beyond NotOftenUsedString get overwritten. The bug may surface right away, but more likely it will take real users to hit it.
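As a sketch of the hazard in C, assume hypothetical lengths of 10 bytes for BusyString and 8 bytes for NotOftenUsedString, so that they differ by the two bytes described above. The helper pli_char_assign is an illustrative name, not a routine from any translator; it copies no more than the target can hold and pads with blanks, which is how PL/I fixed-length character assignment behaves. The comment marks where a naive strncpy translation would go wrong.

```c
#include <stddef.h>
#include <string.h>

enum { BUSY_LEN = 10, NOT_OFTEN_LEN = 8 };  /* hypothetical field lengths */

/* Emulates PL/I fixed-length character assignment: copy at most dst_len
   bytes from src, then pad any remaining target bytes with blanks. */
void pli_char_assign(char *dst, size_t dst_len,
                     const char *src, size_t src_len)
{
    size_t n = src_len < dst_len ? src_len : dst_len;
    memcpy(dst, src, n);
    memset(dst + n, ' ', dst_len - n);
}

/* Returns 1 if the bytes that follow the target survive the assignment. */
int assignment_leaves_neighbors_intact(void)
{
    struct {
        char not_often_used[NOT_OFTEN_LEN];
        char neighbor[2];              /* stands in for adjacent storage */
    } rec;
    char busy[BUSY_LEN];

    memcpy(busy, "ABCDEFGHIJ", BUSY_LEN);
    memset(rec.neighbor, '#', sizeof rec.neighbor);

    /* A naive translation such as
           strncpy(rec.not_often_used, busy, BUSY_LEN);
       would write BUSY_LEN bytes into an 8-byte target and overwrite
       the two neighbor bytes. The bounded copy below does not. */
    pli_char_assign(rec.not_often_used, NOT_OFTEN_LEN, busy, BUSY_LEN);

    return rec.neighbor[0] == '#' && rec.neighbor[1] == '#'
        && memcmp(rec.not_often_used, "ABCDEFGH", NOT_OFTEN_LEN) == 0;
}
```

The point of the sketch is that every such assignment has to carry the target length explicitly in C. A translator or programmer who sizes the copy by the source reproduces the years-old mismatch as a memory overwrite rather than a harmless truncation.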

Re-Engineering

If your existing application can no longer be enhanced to meet the needs of the users or to take advantage of newer technologies, you most likely will eventually want to redesign it. An advantage of recompiling as your first transition step is that it gives you the time to carefully plan and design the bigger re-engineering effort. You save time on the initial transition of your existing logic and gain time for the re-engineering work.

It may well be that when you do eventually redesign and rewrite the application you will want to code it in C or C++. There are good reasons for this, particularly if you want to take advantage of the concepts of object-orientation. The C language has become synonymous with workstation environments. C++ has virtually become the standard choice for object-oriented programming. There are many software libraries that you might want to take advantage of, most of which are written in C, and the linkage of C and C++ to C is naturally easier than the linkage of PL/I to C.

However, when rewriting your application, you should not too hastily discard all your PL/I code. If the application is reasonably structured, as many PL/I applications are, it is likely that much of your underlying data processing code can easily be adapted into the new scheme. PL/I is generally better adapted for common data processing operations than C. It may be that, even though the entire structure of your program has been changed to an object-oriented model with a new graphical user interface, the higher levels of the program can still call existing PL/I routines to do some of the low-level data processing. One big advantage of keeping some of this existing, proven code is that many subtle rules of your business operation are hidden in that code. Things have worked in a certain way for many years, and people have come to depend on that. Problems could arise if this code and these rules are changed.

Recompiling vs. Recoding Costs

It is true that C or C++, not PL/I, is the language of much new development on open system platforms. More new programmers are being trained in C and C++ than in PL/I. However, the reality is also that there is a lot of existing PL/I that can be rehosted simply and safely with recompilation rather than translation.

The following example shows the difference in costs to recompile versus recode. As discussed earlier in the "Recoding: The General Concerns" section, some alteration of legacy code may be required when recompiling because of language implementation differences. These modifications top out at about 2% of the total code and are usually done internally. Therefore, if you had an application of some 250,000 lines of legacy PL/I code (typical PL/I applications range from 250,000 to 6,000,000 lines of code), it would cost you at most $2,500 to recompile (5,000 lines at $.50 per line) versus as much as $250,000 to recode!

                      Lines of Code   Lines Changed   Cost/Line   Total Cost
Recompile                   250,000              2%        $.50       $2,500
Recode                      250,000            100%       $1.00     $250,000
Translator*                 250,000             30%       $1.00      $75,000
Translation Service         250,000            100%        $.50     $125,000

* Does not include price of translator

If you use some type of PL/I to C translator (remember the shortcomings expressed earlier), assume it can convert anywhere from 50% to 70% of the code to C successfully. Even if we take the most optimistic prediction of 70%, that would still leave 30%, or 75,000 lines, to do by hand. That's $75,000!
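The arithmetic behind these figures is simple enough to check directly. The helper below is a hypothetical sketch, not part of any costing tool; it takes the percentage of lines changed and the cost per line in cents as whole numbers so that the arithmetic stays exact.

```c
/* Cost in dollars of hand-changing `percent_changed` percent of
   `total_lines` lines at `cents_per_line` cents per changed line. */
long long conversion_cost_dollars(long long total_lines,
                                  int percent_changed,
                                  int cents_per_line)
{
    long long lines_changed = total_lines * percent_changed / 100;
    return lines_changed * cents_per_line / 100;
}
```

For the 250,000-line application used in this example, conversion_cost_dollars(250000, 2, 50) gives 2,500 for recompiling, conversion_cost_dollars(250000, 100, 100) gives 250,000 for recoding by hand, and conversion_cost_dollars(250000, 30, 100) gives 75,000 for the code a translator leaves behind. The same call can be repeated with your own line counts and rates.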

Keep in mind that some of that code will cost you $1.50 per line because both internal people and external consultants must work together. Also remember that there will most likely be a major data conversion effort if you use a translator, and don't forget the cost of the translator itself. The Translation Service row illustrates an offer by one of the code converter companies to give you a clean compiled C program from PL/I source for $.50 a line. Keep in mind that "clean compile" doesn't mean the program actually works with your data. The economics of recompilation speak for themselves.

Summary

PL/I is a language that has many productive and useful years ahead of it. It is a language admirably suited to many problems found in the new open systems environments. Rehosting provides the opportunity to migrate legacy PL/I applications to this new world with controlled risk, controlled cost, protected investment, portability across platforms and the ability to modernize gradually.

Don't be in a hurry to abandon the old just because it is old. Don't rush to new technology simply because it is new. Don't forget the risk involved in meeting a deadline with new untried technology. For lack of a better name, call it the "Denver Airport Automated Baggage Syndrome". As the people in charge of the new airport at Denver found out, engineering a new solution to an old problem with new technology and with high pressure deadlines can be a recipe for disaster. Get the old system working in the new environment first, then make the switch to new technology where appropriate.