Notes: Code

Grace Hopper, COBOL

Grace Murray Hopper (December 9, 1906 – January 1, 1992) was an American computer scientist and United States Navy rear admiral.[1] A pioneer in the field, she was one of the first programmers of the Harvard Mark I computer, and developed the first compiler for a computer programming language.[2][3][4][5][6] She popularized the idea of machine-independent programming languages, which led to the development of COBOL, one of the first modern programming languages. She is credited with popularizing the term “debugging” for fixing computer glitches (inspired by an actual moth removed from the computer). Owing to the breadth of her accomplishments and her naval rank, she is sometimes referred to as “Amazing Grace”.[7][8] The U.S. Navy destroyer USS Hopper (DDG-70) is named for her, as was the Cray XE6 “Hopper” supercomputer at NERSC.

The A-0 system (Arithmetic Language version 0), written by Grace Hopper in 1951 and 1952 for the UNIVAC I, was the first compiler ever developed for an electronic computer.[1] The A-0 functioned more as a loader or linker than the modern notion of a compiler. A program was specified as a sequence of subroutines and arguments. The subroutines were identified by a numeric code and the arguments to the subroutines were written directly after each subroutine code. The A-0 system converted the specification into machine code that could be fed into the computer a second time to execute the said program.
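As a rough illustration of that loader-like behaviour, the sketch below expands a flat list of numeric subroutine codes and their arguments into a sequence of calls. It is written in Python, not anything resembling UNIVAC code, and the `LIBRARY` table, its codes, and the `expand` function are all invented for this example; they are not A-0's real subroutine numbers or tape format.

```python
# Hypothetical subroutine library: numeric code -> (name, number of arguments).
LIBRARY = {
    1: ("LOAD", 1),
    2: ("ADD", 1),
    3: ("PRINT", 0),
}


def expand(spec):
    """Expand a flat list of codes and arguments into (name, args) calls."""
    calls = []
    pos = 0
    while pos < len(spec):
        name, arity = LIBRARY[spec[pos]]
        # The arguments are written directly after the subroutine code.
        args = tuple(spec[pos + 1 : pos + 1 + arity])
        if len(args) != arity:
            raise ValueError(f"subroutine {name} is missing arguments")
        calls.append((name, args))
        pos += 1 + arity
    return calls


program = [1, 40, 2, 2, 3]  # LOAD 40, ADD 2, PRINT
calls = expand(program)
```

Here `calls` is the "linked" program: each numeric code has been replaced by the routine it names, paired with the arguments that followed it.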

John Backus, FORTRAN

John Warner Backus (December 3, 1924 – March 17, 2007) was an American computer scientist. He directed the team that invented the first widely used high-level programming language (FORTRAN) and was the inventor of the Backus-Naur form (BNF), a widely used notation to define formal language syntax. He also did research in function-level programming and helped to popularize it.

The IEEE awarded Backus the W.W. McDowell Award in 1967 for the development of FORTRAN.[1] He received the National Medal of Science in 1975,[2] and the 1977 ACM Turing Award “for profound, influential, and lasting contributions to the design of practical high-level programming systems, notably through his work on FORTRAN, and for publication of formal procedures for the specification of programming languages.”[3]

Ken Thompson and Dennis Ritchie, developers of the C programming language.

Kenneth Thompson (born February 4, 1943), commonly referred to as ken in hacker circles,[1] is an American pioneer of computer science. Having worked at Bell Labs for most of his career, Thompson designed and implemented the original Unix operating system. He also invented the B programming language, the direct predecessor to the C programming language, and was one of the creators and early developers of the Plan 9 operating system. Since 2006, Thompson has worked at Google, where he co-invented the Go programming language.

Dennis MacAlistair Ritchie (September 9, 1941 – c. October 12, 2011)[1][2][3][4] was an American computer scientist who “helped shape the digital era”.[1] He created the C programming language and, with long-time colleague Ken Thompson, the Unix operating system.[1] Ritchie and Thompson received the Turing Award from the ACM in 1983, the Hamming Medal from the IEEE in 1990 and the National Medal of Technology from President Clinton in 1999. Ritchie was the head of Lucent Technologies System Software Research Department when he retired in 2007. He was the “R” in K&R C and commonly known by his username dmr.


Kristen Nygaard (August 27, 1926 – August 10, 2002) was a Norwegian computer scientist, programming language pioneer, and politician. He was born in Oslo and died of a heart attack in 2002. Internationally he is acknowledged as the co-inventor of object-oriented programming and the programming language Simula with Ole-Johan Dahl in the 1960s. Object-oriented programming enables software developers to manage the complexity of computer systems.

Ole-Johan Dahl (12 October 1931 – 29 June 2002) was a Norwegian computer scientist and is considered to be one of the fathers of Simula and object-oriented programming along with Kristen Nygaard. Dahl, born in Mandal, Norway, is widely accepted as Norway’s foremost computer scientist. With Kristen Nygaard, he produced the initial ideas for object-oriented (OO) programming in the 1960s at the Norwegian Computing Center (NR) as part of the Simula I (1961–1965) and Simula 67 (1965–1968) simulation programming languages. Dahl and Nygaard were the first to develop the concepts of class, subclass (allowing implicit information hiding), inheritance, dynamic object creation, etc., all important aspects of the OO paradigm. An object is a self-contained component (with a data structure and associated procedures or methods) in a software system. These are combined to form a complete system. The object-oriented approach is now pervasive in modern software development, including widely used imperative programming languages such as Java and C++.

Bjarne Stroustrup, creator of C++

Bjarne Stroustrup (Danish: [ˈbjɑːnə ˈsdʁʌʊ̯ˀsdʁɔb];[1] born 30 December 1950) is a Danish computer scientist, most notable for the creation and development of the widely used C++ programming language.[2] He is a Distinguished Research Professor and holds the College of Engineering Chair in Computer Science at Texas A&M University,[3] a visiting professor at Columbia University, and works at Morgan Stanley.[4][5][6]

lambdas, name resolution, variable/type passing, ref/value. actually, sorry, gotta take a good look at c++. The Lisp line split, in the web context, was an important one imo.

2014.04.17 15:28CET i really feel somehow like i want to see these ppl whose names have been in the background of so much of my life.  gonna keep looking at c and c++ and try to complete this circle back to code i use today.  not at all done with these langs by any stretch, getting understanding and reference frame.

An IBM 704 mainframe

2014.04.17 21:59CET after hurdle of (tiny) understand of functional langs, compiler review from OO perspective with Algol focus.

FORTRAN, it turns out, is more than interesting cause my father used it.  It was an early innovator and i wanna know more detail about it.

FORTRAN is brilliant. not sure of favorite all round keywords but GO TO is beginning to rank and

PUNCH n, list

is pretty good too…

By the FORTRAN 66 Standard, Hollerith syntax was allowed in the following uses:

  • As constants in DATA statements
  • As constant actual arguments in subroutine CALL statements
  • As edit descriptors in FORMAT statements
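To make the Hollerith form concrete: a Hollerith constant is written as a character count, the letter H, and then exactly that many characters, so `5HHELLO` stands for the five characters HELLO. The Python sketch below decodes one such constant and splits it into fixed-size machine words. The function names are made up for this note, and the four-characters-per-word size and blank padding of the last word are assumptions chosen for illustration, since the real word size depended on the machine.

```python
import re


def parse_hollerith(text):
    """Decode one Hollerith constant such as '5HHELLO' into its characters."""
    match = re.match(r"(\d+)H", text)
    if match is None:
        raise ValueError("not a Hollerith constant: " + repr(text))
    count = int(match.group(1))
    chars = text[match.end():]
    if count == 0 or len(chars) != count:
        raise ValueError("character count does not match the text")
    return chars


def pack_words(chars, chars_per_word=4):
    """Split characters into words, padding the last word with blanks."""
    return [
        chars[i : i + chars_per_word].ljust(chars_per_word)
        for i in range(0, len(chars), chars_per_word)
    ]


words = pack_words(parse_hollerith("11HHELLO WORLD"))
```

With four characters per word, the eleven characters of HELLO WORLD need three words, which is why old listings declare small integer arrays to hold text.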

2014.04.18 02:10CET well, it was a lovely day and progress as well.  I will be spending some more time with all of this history (some other time when i have time).  todo: data side

  • SQL (and its history #1), XML, XHTML, XSLT, XPATH, XMPP, HTML, JSON, MicroData, native formats, etc…
  • Protocols
  • data structures, data forms, data normalization
  • entity/identity
  • ordered/unordered, tuple (length)
  • unique
  • n-ary.  identifying (triggers as CRUD relation or data type validation)
  • groups, sets, (and onward)
  • data types (esp. notion of data integrity and native data type, as well as “struct” forms, data validation)
  • cross cutting concerns (many to many with missing middle or complex data relations with missing or broken pieces)
  • referential integrity (back to assocs) relation to code
  • procedural SQL (cursors, procedural T-SQL and PL-SQL, etc.)

if there r visitors.  please have a lovely eve.



While the community was skeptical that this new method could possibly outperform hand-coding, it reduced the number of programming statements necessary to operate a machine by a factor of 20, and quickly gained acceptance. John Backus said during a 1979 interview with Think, the IBM employee magazine, “Much of my work has come from being lazy. I didn’t like writing programs, and so, when I was working on the IBM 701, writing programs for computing missile trajectories, I started work on a programming system to make it easier to write programs.”[6]

During the same Fortran Standards Committee meeting at which the name “FORTRAN 77” was chosen, a satirical technical proposal was incorporated into the official distribution bearing the title, “Letter O considered harmful”. This proposal purported to address the confusion that sometimes arises between the letter “O” and the numeral zero, by eliminating the letter from allowable variable names. However, the method proposed was to eliminate the letter from the character set entirely (thereby retaining 48 as the number of lexical characters, which the colon had increased to 49). This was considered beneficial in that it would promote structured programming, by making it impossible to use the notorious GO TO statement as before. (Troublesome FORMAT statements would also be eliminated.) It was noted that this “might invalidate some existing programs” but that most of these “probably were non-conforming, anyway”.[28][29]

During the standards committee battle over whether the “minimum trip count” for the FORTRAN 77 DO statement should be zero (allowing no execution of the block) or one (the “plunge-ahead” DO), another facetious alternative was proposed (by Loren Meissner) to have the minimum trip be two—since there is no need for a loop if it is only executed once.

The following is a FORTRAN 66 hello world program using Hollerith constants. It assumes that at least four characters per word are supported by the implementation:

Besides DATA statements, Hollerith constants were also allowed as actual arguments in subroutine calls. However there was no way that the callee could know how many characters were passed in. The programmer had to pass the information explicitly. The hello world program could be written as follows – on a machine where four characters are stored in a word:

Although technically not a Hollerith constant, the same Hollerith syntax was allowed as an edit descriptor in FORMAT statements. The hello world program could also be written as:

One of the most surprising features was the behaviour of Hollerith edit descriptors when used for input. The following program would change at run time HELLO WORLD to whatever would happen to be the next eleven characters in the input stream and print that input:

For various reasons Fortran 77 has these “logical” values and operators: .TRUE., .FALSE., .EQ., .NE., .LT., .LE., .GT., .GE., .EQV., .NEQV., .OR., .AND., .NOT. [1]

.AND., .OR. and .XOR. are also used in combined tests in IF and IFF statements in batch files run under JP Software’s command line processors like 4DOS, 4OS2, 4NT and Take Command.



Timeline: Hello world

The variations and lack of portability of the programs from one implementation to another are easily demonstrated by the classic hello world program.


ALGOL 58 had no I/O facilities.

ALGOL 60 family

Since ALGOL 60 had no I/O facilities, there is no portable hello world program in ALGOL. The following program could (and still will) compile and run on an ALGOL implementation for a Unisys A-Series mainframe, and is a straightforward simplification of code taken from The Language Guide at the University of Michigan-Dearborn Computer and Information Science Department Hello world! ALGOL Example Program page.

A simpler program using an inline format:

An even simpler program using the Display statement:

An alternative example, using Elliott Algol I/O is as follows. Elliott Algol used different characters for “open-string-quote” and “close-string-quote”:

Here’s a version for the Elliott 803 Algol (A104). The standard Elliott 803 used 5-hole paper tape and thus only had upper case. The code lacked any quote characters, so £ (UK Pound Sign) was used for open quote and ? (Question Mark) for close quote. Special sequences were placed in double quotes (e.g. ££L?? produced a new line on the teleprinter).

The ICT 1900 series Algol I/O version allowed input from paper tape or punched card. Paper tape ‘full’ mode allowed lower case. Output was to a line printer.

ALGOL 68 code was published with reserved words typically in lowercase, but bolded or underlined.

Timeline of ALGOL special characters

The ALGOLs were conceived at a time when character sets were diverse and evolving rapidly; also, the ALGOLs were defined so that only uppercase letters were required.

1960: IFIP – The Algol 60 language and report included several mathematical symbols which are available on modern computers and operating systems, but, unfortunately, were not supported on most computing systems at the time. For instance: ×, ÷, ≤, ≥, ≠, ¬, ∨, ∧, ⊂, ≡, ␣ and ⏨.

1961 September: ASCII – The ASCII character set, then in an early stage of development, had the \ (backslash) character added to it in order to support ALGOL’s boolean operators /\ and \/.[19]

1962: ALCOR – This character set included the unusual “᛭” (iron/runic cross) character and the “⏨” (Decimal Exponent Symbol) for floating point notation. [20][21][22]

1964: GOST – The 1964 Russian standard GOST 10859 allowed the encoding of 4-bit, 5-bit, 6-bit and 7-bit characters in ALGOL.[23]

1968: The “Algol 68 Report” – used existing ALGOL characters, and further adopted →, ↓, ↑, □, ⌊, ⌈, ⎩, ⎧, ○, ⊥ and ¢ characters which can be found on the IBM 2741 keyboard with “golf-ball” print heads inserted (such as the APL golfball). These became available in the mid-1960s while ALGOL 68 was being drafted. The report was translated into Russian, German, French and Bulgarian, and allowed programming in languages with larger character sets, e.g. the Cyrillic alphabet of the Russian BESM-4. All of ALGOL’s characters are also part of the Unicode standard and most of them are available in several popular fonts.

2009 October: Unicode – The “⏨” (Decimal Exponent Symbol) for floating point notation was added to Unicode 5.2 for backward compatibility with historic Buran (spacecraft) ALGOL software.


Simula is a name for two simulation programming languages, Simula I and Simula 67, developed in the 1960s at the Norwegian Computing Center in Oslo, by Ole-Johan Dahl and Kristen Nygaard.

Simula is considered the first object-oriented programming language. As its name implies, Simula was designed for doing simulations, and the needs of that domain provided the framework for many of the features of object-oriented languages today.

The influence of Simula is often understated, and Simula-type objects are reimplemented in C++, Java and C#.

The creator of C++, Bjarne Stroustrup, has acknowledged that Simula 67 was the greatest influence on him to develop C++, to bring the kind of productivity enhancements offered by Simula to the raw computational speed offered by lower level languages like BCPL.


The following account is based on Jan Rune Holmevik’s historical essay.[2][3]

Kristen Nygaard started writing computer simulation programs in 1957. Nygaard saw a need for a better way to describe the heterogeneity and the operation of a system. To go further with his ideas on a formal computer language for describing a system, Nygaard realized that he needed someone with more computer programming skills than he had. Ole-Johan Dahl joined him on his work in January 1962. The decision to link the language to ALGOL 60 was made shortly after. By May 1962 the main concepts for a simulation language were set. “SIMULA I” was born, a special purpose programming language for simulating discrete event systems.

Kristen Nygaard was invited to UNIVAC late May 1962 in connection with the marketing of their new UNIVAC 1107 computer. At that visit Nygaard presented the ideas of Simula to Robert Bemer, the director of systems programming at Univac. Bemer was a sworn ALGOL fan and found the Simula project compelling. Bemer was also chairing a session at the second international conference on information processing hosted by IFIP. He invited Nygaard, who presented the paper “SIMULA — An Extension of ALGOL to the Description of Discrete-Event Networks”.

In 1966 C. A. R. Hoare introduced the record class construct, which Dahl and Nygaard extended with the concept of prefixing and other features to meet their requirements for a generalized process concept. Dahl and Nygaard presented their paper on Class and Subclass Declarations at the IFIP Working Conference on simulation languages in Oslo, May 1967.

This paper became the first formal definition of Simula 67.

In June 1967 a conference was held to standardize the language and initiate a number of implementations.

Dahl proposed to unify the type and the class concept. This led to serious discussions, and the proposal was rejected by the board. SIMULA 67 was formally standardized at the first meeting of the SIMULA Standards Group (SSG) in February 1968.

The empty computer file is the minimal program in Simula, measured by the size of the source code. It consists of one thing only; a dummy statement. However, the minimal program is more conveniently represented as an empty block:

It begins executing and immediately terminates. The language does not have any return value from the program itself.

Classic Hello world

An example of a Hello world program in Simula:

Simula is case-insensitive.

Classes, subclasses and virtual methods

A more realistic example with use of classes[1]:1.3.3, 2, subclasses[1]:2.2.1 and virtual methods[1]:2.2.3:

The above example has one super class (Glyph) with two subclasses (Char and Line). There is one virtual method with two implementations. The execution starts by executing the main program. Simula does not have the concept of abstract classes since classes with pure virtual methods can be instantiated. This means that in the above example all classes can be instantiated. Calling a pure virtual method will however produce a run-time error.
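For my own reference, here is a loose Python analogue of that structure: one superclass, two subclasses, and one virtual method with two implementations. This is not the Simula listing. The method name `render` and the constructor arguments are my own choices, and raising `NotImplementedError` is just the usual Python way to imitate "calling a pure virtual method produces a run-time error".

```python
class Glyph:
    """Superclass; it can be instantiated even though render is 'pure virtual'."""

    def render(self):
        # Imitates the run-time error Simula gives for a pure virtual call.
        raise NotImplementedError("Glyph.render has no implementation")


class Char(Glyph):
    def __init__(self, c):
        self.c = c

    def render(self):
        return self.c


class Line(Glyph):
    def __init__(self, elements):
        self.elements = elements

    def render(self):
        # Each element decides for itself how to render (dynamic dispatch).
        return "".join(element.render() for element in self.elements)


text = Line([Char("A"), Char("b"), Char("b")]).render()
plain = Glyph()  # allowed; only calling plain.render() would fail
```

As in the description above, nothing stops `Glyph()` from being created; the error only appears when the unimplemented method is actually called.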

Call by name

Simula supports call by name[1]:8.2.3 so the Jensen’s Device can easily be implemented. However, the default transmission mode for simple parameter is call by value, contrary to ALGOL which used call by name. The source code for the Jensen’s Device must therefore specify call by name for the parameters when compiled by a Simula compiler.

Another much simpler example is the summation function  sum  which can be implemented as follows:

The above code uses call by name for the controlling variable (k) and the expression (u). This allows the controlling variable to be used in the expression.

Note that the Simula standard allows for certain restrictions on the controlling variable in a for loop. The above code therefore uses a while loop for maximum portability.
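Python has no call by name, but the idea can be imitated by passing a mutable cell for the controlling variable and a zero-argument function (a thunk) for the expression, so the expression is re-evaluated every time it is used. The sketch below is only that imitation. The `Ref` class and `sum_by_name` are hypothetical names, not a translation of the Simula source.

```python
class Ref:
    """A mutable cell standing in for a variable passed by name."""

    def __init__(self, value=0):
        self.value = value


def sum_by_name(k, a, b, u):
    """Sum u() while stepping k.value from a to b inclusive."""
    total = 0.0
    k.value = a
    while k.value <= b:   # a while loop, mirroring the portability note above
        total += u()      # u is re-evaluated with the current k.value
        k.value += 1
    return total


i = Ref()
squares = sum_by_name(i, 1, 3, lambda: i.value * i.value)  # 1 + 4 + 9
```

The point of the trick is that the lambda mentions `i`, the same cell the loop is updating, so the "expression" sees each new value of the controlling variable.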

The following:

 Z = \sum_{i=1}^{100} \frac{1}{(i + a)^2}

can then be implemented as follows:


Simula includes a simulation[1]:14.2 package for doing discrete event simulations. This simulation package is based on Simula’s object oriented features and its coroutine[1]:9.2 concept.

Sam, Sally, and Andy are shopping for clothes. They have to share one fitting room. Each one of them is browsing the store for about 12 minutes and then uses the fitting room exclusively for about three minutes, each following a normal distribution. A simulation of their fitting room experience is as follows:
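To get a feel for how coroutines and an event list fit together, here is a small Python sketch of the same scenario using generators as the coroutines. It is not the Simula program. The standard deviations (4 and 1 minutes), the two visits per shopper, and all function names are assumptions of mine, and the log records only the order of events, not their times.

```python
import heapq
import itertools
import random
from collections import deque


def shopper(name, log, rng, visits):
    """One shopper as a coroutine that yields requests to the scheduler."""
    for _ in range(visits):
        yield ("hold", max(0.0, rng.gauss(12, 4)))  # browse the store
        yield ("request", None)                     # wait for the fitting room
        log.append((name, "enter"))
        yield ("hold", max(0.0, rng.gauss(3, 1)))   # use the fitting room
        log.append((name, "leave"))
        yield ("release", None)


def simulate(names=("Sam", "Sally", "Andy"), visits=2, seed=1):
    rng = random.Random(seed)
    log = []
    room_busy = False
    waiting = deque()            # shoppers queued for the room
    events = []                  # heap of (time, tie-breaker, coroutine)
    counter = itertools.count()  # tie-breaker so coroutines are never compared
    for name in names:
        heapq.heappush(events, (0.0, next(counter), shopper(name, log, rng, visits)))
    while events:
        now, _, process = heapq.heappop(events)
        # Run this shopper until it has to wait for time or for the room.
        for action, duration in process:
            if action == "hold":
                heapq.heappush(events, (now + duration, next(counter), process))
                break
            if action == "request":
                if room_busy:
                    waiting.append(process)
                    break
                room_busy = True
            elif action == "release":
                if waiting:
                    # Hand the room straight to the next shopper in line.
                    heapq.heappush(events, (now, next(counter), waiting.popleft()))
                else:
                    room_busy = False
    return log


log = simulate()
```

Each shopper reads like a straight-line script (browse, request, use, release), while the scheduler loop decides who runs next. That separation is what the coroutine concept buys.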



Smalltalk is an object-oriented, dynamically typed, reflective programming language. Smalltalk was created as the language to underpin the “new world” of computing exemplified by “human–computer symbiosis.”[1]

It was designed and created in part for educational use, more so for constructionist learning, at the Learning Research Group (LRG) of Xerox PARC by Alan Kay, Dan Ingalls, Adele Goldberg, Ted Kaehler, Scott Wallace, and others during the 1970s.


There are a large number of Smalltalk variants.[3] The unqualified word Smalltalk is often used to indicate the Smalltalk-80 language, the first version to be made publicly available and created in 1980.

Smalltalk was the product of research led by Alan Kay at Xerox Palo Alto Research Center (PARC); Alan Kay designed most of the early Smalltalk versions, which Dan Ingalls implemented.

The first version, known as Smalltalk-71, was created by Ingalls in a few mornings on a bet that a programming language based on the idea of message passing inspired by Simula could be implemented in “a page of code.”[1] A later variant actually used for research work is now known as Smalltalk-72 and influenced the development of the Actor model.

Its syntax and execution model were very different from modern Smalltalk variants.

After significant revisions which froze some aspects of execution semantics to gain performance (by adopting a Simula-like class inheritance model of execution), Smalltalk-76 was created.

This system had a development environment featuring most of the now familiar tools, including a class library code browser/editor.

Smalltalk-80 added metaclasses, to help maintain the “everything is an object” (except private instance variables) paradigm by associating properties and behavior with individual classes, and even primitives such as integer and boolean values (for example, to support different ways of creating instances). This was supposed to be the first object-oriented language.

As an interesting link between generations, in 2002 Vassili Bykov implemented Hobbes, a virtual machine running Smalltalk-80 inside VisualWorks.[5] (Dan Ingalls later ported Hobbes to Squeak.)

During the late 1980s to mid-1990s, Smalltalk environments were sold by two competing organizations, ParcPlace Systems and Digitalk, both California based.

  • ParcPlace Systems tended to focus on the Unix/Sun Microsystems market,
  • while Digitalk focused on Intel-based PCs running Microsoft Windows or IBM’s OS/2.

Both firms struggled to take Smalltalk mainstream due to Smalltalk’s substantial memory needs, limited run-time performance, and initial lack of supported connectivity to SQL-based relational database servers.


Smalltalk was the first true object-oriented programming language.[8]

Smalltalk was also one of the most influential programming languages. Virtually all of the object-oriented languages that came after (Flavors,[9] CLOS, Objective-C, C++, C#, Java,[10] Python, Ruby[11] and many others) were influenced by Smalltalk.

Smalltalk was also one of the most popular languages with the Agile Methods, Rapid Prototyping, and Software Patterns[12] communities.

The highly productive environment provided by Smalltalk platforms made them ideal for rapid, iterative development.

Smalltalk emerged from a larger program of ARPA-funded research that in many ways defined the modern world of computing. In addition to Smalltalk, working prototypes of things such as hypertext, GUIs, multimedia, the mouse, telepresence, and the Internet were developed by ARPA researchers in the 1960s.[13][14] Alan Kay (one of the inventors of Smalltalk) also described a tablet computer he called the Dynabook which was essentially a design for an iPad.[15]

Smalltalk environments were often the first to develop what are now common object-oriented software design patterns.

One of the most popular is the Model–view–controller pattern for User Interface design. The MVC pattern enables developers to have multiple consistent views of the same underlying data. It’s ideal for software development environments, where there are various views (e.g., entity-relation, dataflow, object model, etc.) of the same underlying specification. Also, for simulations or games where the underlying model may be viewed from various angles and levels of abstraction.[16]
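A tiny Python sketch of the "multiple consistent views of the same data" idea, for orientation only. The class names and the decimal/hex views are invented here, and real Smalltalk MVC is wired differently (through its dependency mechanism); this only shows the shape of the pattern.

```python
class Model:
    """Holds the data and tells every attached view when it changes."""

    def __init__(self, value=0):
        self.value = value
        self._views = []

    def attach(self, view):
        self._views.append(view)
        view.update(self.value)

    def set_value(self, value):
        self.value = value
        for view in self._views:
            view.update(value)


class DecimalView:
    def update(self, value):
        self.text = str(value)


class HexView:
    def update(self, value):
        self.text = hex(value)


class Controller:
    """Turns user actions into changes on the model, never on the views."""

    def __init__(self, model):
        self.model = model

    def increment(self):
        self.model.set_value(self.model.value + 1)


model = Model(254)
decimal_view, hex_view = DecimalView(), HexView()
model.attach(decimal_view)
model.attach(hex_view)
Controller(model).increment()  # both views now show 255, each in its own form
```

Because the controller only ever touches the model, the views cannot drift apart: each one is redrawn from the same value.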

In addition to the MVC pattern the Smalltalk language and environment were tremendously influential in the history of the Graphical User Interface (GUI) and the What You See Is What You Get (WYSIWYG) user interface, font editors, and desktop metaphors for UI design.

The powerful built-in debugging and object inspection tools that came with Smalltalk environments set the standard for all the Integrated Development Environments, starting with Lisp Machine environments, that came after.[17]

Object-oriented programming

As in other object-oriented languages, the central concept in Smalltalk-80 (but not in Smalltalk-72) is that of an object. An object is always an instance of a class.

Classes are “blueprints” that describe the properties and behavior of their instances.

For example, a GUI’s window class might declare that windows have properties such as the label, the position and whether the window is visible or not. The class might also declare that instances support operations such as opening, closing, moving and hiding. Each particular window object would have its own values of those properties, and each of them would be able to perform operations defined by its class.
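The window example translates almost word for word into a class definition. The Python sketch below uses hypothetical property and method names taken from the paragraph above. It is only meant to show the class-as-blueprint idea, not any real GUI toolkit.

```python
class Window:
    """Blueprint: every window has a label, a position and a visibility flag."""

    def __init__(self, label, position=(0, 0)):
        self.label = label
        self.position = position
        self.visible = False

    def open(self):
        self.visible = True

    def hide(self):
        self.visible = False

    def move(self, x, y):
        self.position = (x, y)


editor = Window("Editor")
browser = Window("Class browser", position=(100, 50))
editor.open()
browser.move(200, 80)
```

`editor` and `browser` share the operations defined by the class, but each keeps its own values for the properties.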

A Smalltalk object can do exactly three things:

  1. Hold state (references to other objects).
  2. Receive a message from itself or another object.
  3. In the course of processing a message, send messages to itself or another object.

Smalltalk is a “pure” object-oriented programming language, meaning that, unlike Java and C++, there is no difference between values which are objects and values which are primitive types.

In Smalltalk, primitive values such as integers, booleans and characters are also objects, in the sense that they are instances of corresponding classes, and operations on them are invoked by sending messages.

A programmer can change or extend (through subclassing) the classes that implement primitive values, so that new behavior can be defined for their instances—for example, to implement new control structures—or even so that their existing behavior will be changed.

This fact is summarized in the commonly heard phrase “In Smalltalk everything is an object”, which may be more accurately expressed as “all values are objects”, as variables are not.

Since all values are objects, classes themselves are also objects. Each class is an instance of the metaclass of that class. Metaclasses in turn are also objects, and are all instances of a class called Metaclass. Code blocks are also objects.[19]
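Python follows a similar "classes are objects too" design, which makes it handy for poking at the idea. One difference to keep in mind: in Smalltalk-80 every class gets its own metaclass, whereas in Python classes are instances of the single built-in `type` unless a custom metaclass is declared. The class name `Window` below is just a placeholder.

```python
class Window:
    pass


w = Window()

instance_class = type(w)        # the class of an instance: Window
class_of_class = type(Window)   # the class of a class: type
class_of_type = type(type)      # type is an instance of itself

number_is_object = isinstance(3, int) and isinstance(3, object)
block = lambda x: x + 1         # anonymous functions are objects as well
block_is_object = isinstance(block, object)
```

The chain has to stop somewhere: in Python it stops at `type`, in Smalltalk-80 at `Metaclass`.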


Reflection is also a feature of having a meta-model as Smalltalk does. The meta-model is the model that describes the language itself and developers can use the meta-model to do things like walk through, examine, and modify the parse tree of an object. Or find all the instances of a certain kind of structure (e.g., all the instances of the Method class in the meta-model).

Smalltalk-80 is a totally reflective system, implemented in Smalltalk-80 itself.

Smalltalk-80 provides both structural and computational reflection.

Smalltalk is a structurally reflective system whose structure is defined by Smalltalk-80 objects.

The classes and methods that define the system are themselves objects and fully part of the system that they help define. The Smalltalk compiler compiles textual source code into method objects, typically instances of CompiledMethod. These get added to classes by storing them in a class’s method dictionary. The part of the class hierarchy that defines classes can add new classes to the system. The system is extended by running Smalltalk-80 code that creates or defines classes and methods. In this way a Smalltalk-80 system is a “living” system, carrying around the ability to extend itself at run time.

Smalltalk-80 also provides computational reflection, the ability to observe the computational state of the system. In languages derived from the original Smalltalk-80 the current activation of a method is accessible as an object named via a pseudo-variable (one of the six reserved words), thisContext. By sending messages to thisContext a method activation can ask questions like “who sent this message to me”. These facilities make it possible to implement co-routines or Prolog-like back-tracking without modifying the virtual machine.
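The nearest thing I know in Python is frame inspection. The sketch below asks "who called me" by looking one step up the call stack. It is only an analogy to `thisContext`: `inspect.currentframe()` is an implementation-dependent facility that may return None outside CPython, and the function names are hypothetical.

```python
import inspect


def who_called_me():
    """Return the name of the function that called this one."""
    frame = inspect.currentframe()
    if frame is None or frame.f_back is None:
        return None  # frame inspection not supported by this interpreter
    return frame.f_back.f_code.co_name


def sender():
    return who_called_me()


caller_name = sender()
```

Under CPython `caller_name` ends up as the string "sender", because the frame above `who_called_me` belongs to that function.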


An example of how Smalltalk can use reflection is the mechanism for handling errors. When an object is sent a message that it does not implement, the virtual machine sends the object the doesNotUnderstand: message with a reification of the message as an argument. The message (another object, an instance of Message) contains the selector of the message and an Array of its arguments. In an interactive Smalltalk system the default implementation of doesNotUnderstand: is one that opens an error window (a Notifier) reporting the error to the user. Through this and the reflective facilities the user can examine the context in which the error occurred, redefine the offending code, and continue, all within the system, using Smalltalk-80’s reflective facilities.[22][23]
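Python's `__getattr__` hook can imitate this. It is called only when normal attribute lookup fails, so it can package the failed call as a message object and hand it to a fallback method. Everything named below (`Message`, `SmalltalkLike`, `does_not_understand`, `Recorder`) is made up for this sketch; it is an analogy, not how any Smalltalk VM is implemented.

```python
class Message:
    """Reified message: the selector plus the arguments it was sent with."""

    def __init__(self, selector, arguments):
        self.selector = selector
        self.arguments = arguments


class SmalltalkLike:
    def does_not_understand(self, message):
        # Default behaviour: report the error (a Notifier would open here).
        raise AttributeError("does not understand: " + message.selector)

    def __getattr__(self, selector):
        # Reached only when the object has no attribute named selector.
        def forward(*arguments):
            return self.does_not_understand(Message(selector, list(arguments)))

        return forward


class Recorder(SmalltalkLike):
    """Overrides the fallback to record unknown messages instead of failing."""

    def __init__(self):
        self.seen = []

    def does_not_understand(self, message):
        self.seen.append((message.selector, message.arguments))


recorder = Recorder()
recorder.frobnicate(1, 2)
```

Overriding the fallback is the interesting part: proxies, mocks and remote-object stubs are commonly built this way in both languages.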


Smalltalk-80 syntax is rather minimalist, based on only a handful of declarations and reserved words. In fact, only six “keywords” are reserved in Smalltalk: true, false, nil, self, super, and thisContext. These are actually called pseudo-variables, identifiers that follow the rules for variable identifiers but denote bindings that the programmer cannot change.

The true, false, and nil pseudo-variables are singleton instances.

self and super refer to the receiver of a message within a method activated in response to that message, but sends to super are looked up in the superclass of the method’s defining class rather than the class of the receiver, which allows methods in subclasses to invoke methods of the same name in superclasses.

thisContext refers to the current activation record.

The only built-in language constructs are message sends, assignment, method return and literal syntax for some objects.


The following examples illustrate the most common objects which can be written as literal values in Smalltalk-80 methods.

Numbers. The following list illustrates some of the possibilities.

The last two entries are a binary and a hexadecimal number, respectively. The number before the ‘r’ is the radix or base. The base does not have to be a power of two; for example 36rSMALLTALK is a valid number equal to 80738163270632 decimal.
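The radix notation is easy to check with Python's built-in `int(text, base)`, which accepts bases from 2 to 36. The helper name `parse_radix_literal` is hypothetical, and the sketch handles only plain non-negative integers, not the full Smalltalk number syntax.

```python
def parse_radix_literal(text):
    """Convert a literal like '16r1F' (radix, 'r', digits) to an integer."""
    radix_text, separator, digits = text.partition("r")
    if not separator:
        return int(text)  # no 'r': an ordinary decimal literal
    radix = int(radix_text)
    if not 2 <= radix <= 36:
        raise ValueError("radix must be between 2 and 36")
    return int(digits, radix)


value = parse_radix_literal("36rSMALLTALK")
```

Running this confirms the figure quoted above: `value` is 80738163270632.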

Characters are written by preceding them with a dollar sign:

Strings are sequences of characters enclosed in single quotes:

To include a quote in a string, escape it using a second quote:

Double quotes do not need escaping, since single quotes delimit a string:

Two equal strings (strings are equal if they contain all the same characters) can be different objects residing in different places in memory. In addition to strings, Smalltalk has a class of character sequence objects called Symbol. Symbols are guaranteed to be unique—there can be no two equal symbols which are different objects. Because of that, symbols are very cheap to compare and are often used for language artifacts such as message selectors (see below).
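Python's `sys.intern` gives a rough feel for the String versus Symbol distinction: two equal strings built at run time may be separate objects, while their interned forms are guaranteed to be one and the same object, so they can be compared by identity. This is an analogy only, since Python has no separate Symbol class.

```python
import sys

# Build two equal strings at run time; they may be distinct objects.
first = "".join(["small", "talk"])
second = "".join(["small", "talk"])

equal_contents = first == second          # compares character by character

# Interning maps equal strings to a single shared object, like a Symbol.
first_symbol = sys.intern(first)
second_symbol = sys.intern(second)
same_object = first_symbol is second_symbol   # a cheap identity check
```

Identity comparison is a single pointer check, which is the reason symbols suit things that get compared constantly, such as message selectors.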

Symbols are written as # followed by a string literal. For example:

If the sequence does not include whitespace or punctuation characters, this can also be written as:


defines an array of four integers.

Many implementations support the following literal syntax for ByteArrays:

defines a ByteArray of four integers.

And last but not least, blocks (anonymous function literals)

Blocks are explained in detail further in the text.

Many Smalltalk dialects implement additional syntaxes for other objects, but the ones above are the essentials supported by all.

Variable declarations

The two kinds of variable commonly used in Smalltalk are instance variables and temporary variables. Other variables and related terminology depend on the particular implementation. For example, VisualWorks has class shared variables and namespace shared variables, while Squeak and many other implementations have class variables, pool variables and global variables.

Temporary variables are declared at the top of a method, as names separated by spaces and enclosed by vertical bars. For example:

    | index |

declares a temporary variable named index. Multiple variables may be declared within one set of bars:

    | index vowels |

declares two variables: index and vowels.


A variable is assigned a value via the ‘:=’ syntax. So:

    vowels := 'aeiou'

assigns the string ‘aeiou’ to the previously declared vowels variable. The string is an object (a sequence of characters between single quotes is the syntax for literal strings), created by the compiler at compile time.

In the original Parc Place image, the glyph of the underscore character (_) appeared as a left-facing arrow (like in the 1963 version of the ASCII code). Smalltalk originally accepted this left-arrow as the only assignment operator. Some modern code still contains what appear to be underscores acting as assignments, hearkening back to this original usage. Most modern Smalltalk implementations accept either the underscore or the colon-equals syntax.


The message is the most fundamental language construct in Smalltalk. Even control structures are implemented as message sends. Smalltalk adopts by default a synchronous, single dynamic message dispatch strategy (as contrasted to the asynchronous, multiple dispatch strategy adopted by some other object-oriented languages).

The following example sends the message ‘factorial’ to number 42:

    42 factorial

In this situation 42 is called the message receiver, while ‘factorial’ is the message selector. The receiver responds to the message by returning a value (presumably in this case the factorial of 42). Among other things, the result of the message can be assigned to a variable:

“factorial” above is what is called a unary message because only one object, the receiver, is involved. Messages can carry additional objects as arguments, as follows:

    2 raisedTo: 4

In this expression two objects are involved: 2 as the receiver and 4 as the message argument. The message result, or in Smalltalk parlance, the answer is supposed to be 16. Such messages are called keyword messages. A message can have more arguments, using the following syntax:

which answers the index of character ‘o’ in the receiver string, starting the search from index 6. The selector of this message is “indexOf:startingAt:”, consisting of two pieces, or keywords.

Such interleaving of keywords and arguments is meant to improve readability of code, since arguments are explained by their preceding keywords. For example, an expression to create a rectangle using a C++ or Java-like syntax might be written as:

It’s unclear which argument is which. By contrast, in Smalltalk, this code would be written as:

The receiver in this case is “Rectangle”, a class, and the answer will be a new instance of the class with the specified width and height.

Finally, most of the special (non-alphabetic) characters can be used as what are called binary messages. These allow mathematical and logical operators to be written in their traditional form:

    3 + 4

which sends the message “+” to the receiver 3 with 4 passed as the argument (the answer of which will be 7). Similarly,

    3 > 4

is the message “>” sent to 3 with argument 4 (the answer of which will be false).

Note that the Smalltalk-80 language itself does not imply the meaning of those operators. The outcome of the above is only defined by how the receiver of the message (in this case a Number instance) responds to messages “+” and “>”.

A side effect of this mechanism is operator overloading. A message “>” can also be understood by other objects, allowing the use of expressions of the form “a > b” to compare them.


An expression can include multiple message sends. In this case expressions are parsed according to a simple order of precedence. Unary messages have the highest precedence, followed by binary messages, followed by keyword messages. For example:

    3 factorial + 4 factorial between: 10 and: 100

is evaluated as follows:

  1. 3 receives the message “factorial” and answers 6
  2. 4 receives the message “factorial” and answers 24
  3. 6 receives the message “+” with 24 as the argument and answers 30
  4. 30 receives the message “between:and:” with 10 and 100 as arguments and answers true

The answer of the last message sent is the result of the entire expression.

Parentheses can alter the order of evaluation when needed. For example,

    (3 factorial + 4) factorial between: 10 and: 100

will change the meaning so that the expression first computes “3 factorial + 4” yielding 10. That 10 then receives the second “factorial” message, yielding 3628800. 3628800 then receives “between:and:”, answering false.

Note that because the meaning of binary messages is not hardwired into Smalltalk-80 syntax, all of them are considered to have equal precedence and are evaluated simply from left to right. Because of this, the meaning of Smalltalk expressions using binary messages can be different from their “traditional” interpretation:

    3 + 4 * 5

is evaluated as “(3 + 4) * 5”, producing 35. To obtain the expected answer of 23, parentheses must be used to explicitly define the order of operations:

    3 + (4 * 5)

Unary messages can be chained by writing them one after another:

    3 factorial factorial log

which sends “factorial” to 3, then “factorial” to the result (6), then “log” to the result (720), producing the result 2.85733.

A series of expressions can be written as in the following (hypothetical) example, each separated by a period. This example first creates a new instance of class Window, stores it in a variable, and then sends two messages to it.

If a series of messages are sent to the same receiver as in the example above, they can also be written as a cascade with individual messages separated by semicolons:

This rewrite of the earlier example as a single expression avoids the need to store the new window in a temporary variable. According to the usual precedence rules, the unary message “new” is sent first, and then “label:” and “open” are sent to the answer of “new”.

Code blocks

A block of code (an anonymous function) can be expressed as a literal value (which is an object, since all values are objects). This is achieved with square brackets:

    [ :params | message-expressions ]

where :params is the list of parameters the code can take. This means that the Smalltalk code:

    [:x | x + 1]

can be understood as:

    f : f(x) = x + 1

or expressed in lambda terms as:

    λx : x + 1

and the expression

    [:x | x + 1] value: 3

can be evaluated as

    f(3) = 3 + 1

or in lambda terms as:

    (λx : x + 1) 3 →β 4

The resulting block object can form a closure: it can access the variables of its enclosing lexical scopes at any time. Blocks are first-class objects.

Blocks can be executed by sending them the value message (compound variations exist in order to provide parameters to the block e.g. ‘value:value:’ and ‘valueWithArguments:’).

The literal representation of blocks was an innovation that made certain code significantly more readable: algorithms involving iteration could be coded in a clear and concise way. Code that would typically be written with loops in some languages can be written in Smalltalk using blocks, sometimes in a single line. More importantly, blocks allow control structures to be expressed using messages and polymorphism, since blocks defer computation and polymorphism can be used to select alternatives. So if-then-else in Smalltalk is written and implemented as

expr ifTrue: [statements to evaluate if expr] ifFalse: [statements to evaluate if not expr]

True methods for evaluation

    ifTrue: trueAlternativeBlock ifFalse: falseAlternativeBlock
        ^trueAlternativeBlock value

False methods for evaluation

    ifTrue: trueAlternativeBlock ifFalse: falseAlternativeBlock
        ^falseAlternativeBlock value

Note that this is related to functional programming, wherein patterns of computation (here selection) are abstracted into higher-order functions. For example, the message select: on a Collection is equivalent to the higher-order function filter on an appropriate functor.[24]

Control structures

Control structures do not have special syntax in Smalltalk. They are instead implemented as messages sent to objects. For example, conditional execution is implemented by sending the message ifTrue: to a Boolean object, passing as an argument the block of code to be executed if and only if the Boolean receiver is true.

The following code demonstrates this:

Blocks are also used to implement user-defined control structures, enumerators, visitors, pluggable behavior and many other patterns. For example:

In the last line, the string is sent the message select: with an argument that is a code block literal. The code block literal will be used as a predicate function that should answer true if and only if an element of the String should be included in the Collection of characters that satisfy the test represented by the code block that is the argument to the “select:” message.

A String object responds to the “select:” message by iterating through its members (by sending itself the message “do:”), evaluating the selection block (“aBlock”) once with each character it contains as the argument. When evaluated (by being sent the message “value: each”), the selection block (referenced by the parameter “aBlock”, and defined by the block literal “[:aCharacter | aCharacter isVowel]”), answers a boolean, which is then sent “ifTrue:”. If the boolean is the object true, the character is added to a string to be returned. Because the “select:” method is defined in the abstract class Collection, it can also be used like this:


This is a stock class definition:[25]

Often, most of this definition will be filled in by the environment. Notice that this is actually a message to the “Object”-class to create a subclass called “MessagePublisher”. In other words: classes are first-class objects in Smalltalk which can receive messages just like any other object and can be created dynamically at execution time.


When an object receives a message, a method matching the message name is invoked. The following code defines a method publish, and so defines what will happen when this object receives the ‘publish’ message.

The following method demonstrates receiving multiple arguments and returning a value:

The method’s name is #quadMultiply:and:. The return value is specified with the ^ operator.

Note that objects are responsible for determining dynamically at runtime which method to execute in response to a message—while in many languages this may be (sometimes, or even always) determined statically at compile time.

Instantiating classes

The following code:

    MessagePublisher new

creates (and returns) a new instance of the MessagePublisher class. This is typically assigned to a variable:

However, it is also possible to send a message to a temporary, anonymous object:

Hello World example

The Hello world program is used by virtually all introductory texts on programming languages as the first program, showing the most basic syntax and environment of the language. For Smalltalk, the program is extremely simple to write. In the following code, the message “show:” is sent to the object “Transcript” with the String literal ‘Hello, world!’ as its argument. Invocation of the “show:” method causes the characters of its argument (the String literal ‘Hello, world!’) to be displayed in the transcript (“terminal”) window.

    Transcript show: 'Hello, world!'.

Note that a Transcript window would need to be open in order to see the results of this example.

Image-based persistence

Many Smalltalk systems do not differentiate between program data (objects) and code (classes). In fact, classes are objects themselves. Therefore most Smalltalk systems store the entire program state (including both Class and non-Class objects) in an image file. The image can then be loaded by the Smalltalk virtual machine to restore a Smalltalk-like system to a prior state.[26] This was inspired by FLEX, a language created by Alan Kay and described in his M.Sc. thesis.[27]

Smalltalk images are similar to (restartable) core dumps and can provide delayed or remote debugging with full access to the program state at the time of error; all the development information (e.g. parse trees of the program) is saved which facilitates debugging.

Image-based persistence also has serious drawbacks as a true persistence mechanism:

  • Developers may often want to hide implementation details and not make them available in a run-time environment.
  • For legal as well as maintenance reasons, allowing anyone to modify the program at run time inevitably introduces complexity and potential errors that would not be possible with a compiled system that does not expose source code in the run-time environment.
  • It lacks the true persistence capabilities needed for most multi-user systems, e.g. the ability to do transactions with multiple users accessing the same database in parallel.[28]

Level of access

Everything in Smalltalk-80 is available for modification from within a running program. This means that, for example, the IDE can be changed in a running system without restarting it. In some implementations, the syntax of the language or the garbage collection implementation can also be changed on the fly. Even the statement true become: false is valid in Smalltalk, although executing it is not recommended. When used judiciously, this level of flexibility allows for one of the shortest required times for new code to enter a production system.[citation needed]

Just-in-time compilation

Smalltalk programs are usually compiled to bytecode, which is then interpreted by a virtual machine or dynamically translated into machine-native code.

GNU Smalltalk:


C Character set

The basic C source character set includes the following characters:

Newline indicates the end of a text line; it need not correspond to an actual single character, although for convenience C treats it as one.

Additional multibyte encoded characters may be used, but are not portable. The latest C standard (C11) allows multinational Unicode characters to be embedded portably within C source text by using a \uDDDD encoding (where DDDD denotes a Unicode character code), although this feature is not yet widely implemented.

The basic C execution character set contains the same characters, along with representations for alert, backspace, and carriage return. Run-time support for extended character sets has increased with each revision of the C standard.

C source code is free-form which allows arbitrary use of whitespace to format code, rather than column-based or text-line-based restrictions. Comments may appear either between the delimiters /* and */, or (since C99) following // until the end of the line. Comments delimited by /* and */ do not nest, and these sequences of characters are not interpreted as comment delimiters if they appear inside string or character literals.[22]

This is most notable in C, where identifiers that begin with an underscore are reserved, though the precise details of what identifiers are reserved at what scope are involved, and leading double underscores are reserved for any use;[8] similarly in C++ any identifier that contains a double underscore is reserved for any use, while an identifier that begins with an underscore is reserved in the global space.[a] Thus one can add a new keyword foo using the reserved word __foo. While this is superficially similar to stropping, the semantics are different. As a reserved word, the string __foo represents the identifier __foo in the common identifier namespace. In stropping (by prefixing keywords by __), the string __foo represents the keyword foo in a separate keyword namespace. Thus using reserved words, the tokens for __foo and foo are (identifier, __foo) and (identifier, foo), different values in the same category, while in stropping the tokens for __foo and foo are (keyword, foo) and (identifier, foo), same values in different categories. These solve the same problem of namespace clashes in a way that is the same for a programmer, but which differs in terms of formal grammar and implementation.


A syntactically similar but semantically different phenomenon is the sigil, which instead indicates properties of variables. Sigils are common in Perl, Ruby, and various other languages to identify characteristics of variables and constants: Perl uses them to designate the type of variable, Ruby to distinguish variables from constants and to indicate scope. Note that this affects the semantics of the variable, not the syntax of whether it is an identifier or keyword.

The C language also exhibits the following characteristics:

  • There is a small, fixed number of keywords, including a full set of flow of control primitives: for, if/else, while, switch, and do/while.
  • There is one namespace, and user-defined names are not distinguished from keywords by any kind of sigil.
  • There are a large number of arithmetical and logical operators, such as +, +=, ++, &, ~, etc.
  • More than one assignment may be performed in a single statement.
  • Function return values can be ignored when not needed.
  • Typing is static, but weakly enforced: all data has a type, but implicit conversions can be performed; for instance, characters can be used as integers.
  • Declaration syntax mimics usage context. C has no “define” keyword; instead, a statement beginning with the name of a type is taken as a declaration. There is no “function” keyword; instead, a function is indicated by the parentheses of an argument list.
  • User-defined (typedef) and compound types are possible.
    • Heterogeneous aggregate data types (struct) allow related data elements to be accessed and assigned as a unit.
    • Array indexing is a secondary notion, defined in terms of pointer arithmetic. Unlike structs, arrays are not first-class objects; they cannot be assigned or compared using single built-in operators. There is no “array” keyword, in use or definition; instead, square brackets indicate arrays syntactically, e.g. month[11].
    • Enumerated types are possible with the enum keyword. They are not tagged, and are freely interconvertible with integers.
    • Strings are not a separate data type, but are conventionally implemented as null-terminated arrays of characters.
  • Low-level access to computer memory is possible by converting machine addresses to typed pointers.
  • Procedures (subroutines not returning values) are a special case of function, with an untyped return type void.
  • Functions may not be defined within the lexical scope of other functions.
  • Function and data pointers permit ad hoc run-time polymorphism.
  • A preprocessor performs macro definition, source code file inclusion, and conditional compilation.
  • There is a basic form of modularity: files can be compiled separately and linked together, with control over which functions and data objects are visible to other files via static and extern attributes.
  • Complex functionality such as I/O, string manipulation, and mathematical functions are consistently delegated to library routines.
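The remark in the list above that characters can be used as integers can be illustrated with a small sketch. The function name digit_value is a hypothetical helper, not something from the text; the only language guarantee relied on is that the digits '0' through '9' have consecutive character codes.

```c
/* Hypothetical helper: converts a decimal digit character to its value.
   A char takes part in integer arithmetic directly, so c - '0' yields
   0..9 for the characters '0'..'9'. Returns -1 for any other character. */
int digit_value(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    return -1;
}
```

For instance, digit_value('7') yields 7 without any explicit conversion.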

:= and = operators which were replaced with = for assignment and == for equality test. (The & and | of BCPL was later changed to && and || in the transition to what is now known as C.) [1]

  • standard I/O library
  • long int data type
  • unsigned int data type
  • compound assignment operators of the form =op (such as =-) were changed to the form op= to remove the semantic ambiguity created by such constructs as i=-10, which had been interpreted as i =- 10 (decrement i by 10) instead of the possibly intended i = -10 (let i be -10)

In cases where code must be compilable by either standard-conforming or K&R C-based compilers, the __STDC__ macro can be used to split the code into Standard and K&R sections to prevent the use on a K&R C-based compiler of features available only in Standard C.

C99 introduced several new features, including inline functions, several new data types (including long long int and a complex type to represent complex numbers), variable-length arrays, improved support for IEEE 754 floating point, support for variadic macros (macros of variable arity), and support for one-line comments beginning with //, as in BCPL or C++. Many of these had already been implemented as extensions in several C compilers.

C99 is for the most part backward compatible with C90, but is stricter in some ways; in particular, a declaration that lacks a type specifier no longer has int implicitly assumed. A standard macro __STDC_VERSION__ is defined with value 199901L to indicate that C99 support is available.
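As a minimal sketch of how these version macros are typically tested, the function below reports which standard the file was compiled under. The name c_standard_name and the returned strings are illustrative assumptions; only __STDC__ and __STDC_VERSION__ come from the standard. A file that really had to compile under a K&R compiler would also need its function definitions split the same way, which this sketch does not attempt.

```c
/* Hypothetical helper: names the C standard this file was compiled under. */
const char *c_standard_name(void)
{
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
    /* __STDC_VERSION__ is 199901L for C99 and larger for later revisions. */
    return "C99 or later";
#elif defined(__STDC__)
    /* __STDC__ without a C99 version number indicates an earlier Standard C. */
    return "Standard C before C99";
#else
    return "pre-standard (K&R) C";
#endif
}
```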

type generic macros, anonymous structures, improved Unicode support, atomic operations, multi-threading, and bounds-checked functions. It also makes some portions of the existing C99 library optional, and improves compatibility with C++.

struct, union, and enum, or assign types to and perhaps reserve storage for new variables, usually by writing the type followed by the variable name. Keywords such as char and int specify built-in types. Sections of code are enclosed in braces ({ and }, sometimes called “curly brackets”) to limit the scope of declarations and to act as a single statement for control structures.

Sequence points also occur during evaluation of expressions containing certain operators (&&, ||, ?:, and the comma operator).


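C89 has 32 keywords (reserved words with special meaning):

    auto, break, case, char, const, continue, default, do, double, else, enum, extern, float, for, goto, if, int, long, register, return, short, signed, sizeof, static, struct, switch, typedef, union, unsigned, void, volatile, while

C99 adds five more keywords:

    _Bool, _Complex, _Imaginary, inline, restrict

C11 adds seven more keywords:[24]

    _Alignas, _Alignof, _Atomic, _Generic, _Noreturn, _Static_assert, _Thread_local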


Most of the recently added keywords begin with an underscore followed by a capital letter, because identifiers of that form were previously reserved by the C standard for use only by implementations. Since existing program source code should not have been using these identifiers, it would not be affected when C implementations started supporting these extensions to the programming language. Some standard headers do define more convenient synonyms for underscored identifiers. The language previously included a reserved keyword called entry, but this was never implemented, and has now been removed as a reserved word.[25]


C supports a rich set of operators, which are symbols used within an expression to specify the manipulations to be performed while evaluating that expression. C has operators for:

C uses the = operator, reserved in mathematics to express equality, to indicate assignment, following the precedent of Fortran and PL/I, but unlike ALGOL and its derivatives. The similarity between C’s operator for assignment and that for equality (==) has been criticised as it makes it easy to accidentally substitute one for the other. In many cases, each may be used in the context of the other without a compilation error (although some compilers produce warnings). For example, the conditional expression in if(a=b+1) is true if a is not zero after the assignment.[26] Additionally, C’s operator precedence is non-intuitive, such as == binding more tightly than & and | in expressions like x & 1 == 0, which would need to be written (x & 1) == 0 to be properly evaluated.[27]
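Both pitfalls can be made concrete with a short sketch. The function names are hypothetical, and the parsing that C applies implicitly is written out with explicit parentheses so that the code compiles without warnings.

```c
/* How `x & 1 == 0` is actually parsed: == binds more tightly than &,
   so the mask is applied to the result of (1 == 0), which is 0. */
int is_even_as_parsed(unsigned x)
{
    return (int)(x & (1 == 0));   /* always 0, whatever x is */
}

/* The intended test: parentheses force the mask to be applied first. */
int is_even(unsigned x)
{
    return (x & 1) == 0;
}

/* What `if (a = b + 1)` means: store b + 1 in a, then test a != 0. */
int assign_then_test(int b)
{
    int a;
    if ((a = b + 1) != 0)
        return 1;
    return 0;
}
```

Here is_even(4) answers 1 while is_even_as_parsed(4) answers 0, and assign_then_test(-1) answers 0 because the assigned value is zero.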

The first line of the program contains a preprocessing directive, indicated by #include. This causes the compiler to replace that line with the entire text of the stdio.h standard header, which contains declarations for standard input and output functions such as printf. The angle brackets surrounding stdio.h indicate that stdio.h is located using a search strategy that prefers headers in the compiler’s include path to other headers having the same name; double quotes are used to include local or project-specific header files.

The next line indicates that a function named main is being defined. The main function serves a special purpose in C programs; the run-time environment calls the main function to begin program execution. The type specifier int indicates that the value that is returned to the invoker (in this case the run-time environment) as a result of evaluating the main function, is an integer. The keyword void as a parameter list indicates that this function takes no arguments.[b]

The opening curly brace indicates the beginning of the definition of the main function.

The next line calls (diverts execution to) a function named printf, which is supplied from a system library. In this call, the printf function is passed (provided with) a single argument, the address of the first character in the string literal "hello, world\n". The string literal is an unnamed array with elements of type char, set up automatically by the compiler with a final 0-valued character to mark the end of the array (printf needs to know this). The \n is an escape sequence that C translates to a newline character, which on output signifies the end of the current line. The return value of the printf function is of type int, but it is silently discarded since it is not used. (A more careful program might test the return value to determine whether or not the printf function succeeded.) The semicolon ; terminates the statement.

The closing curly brace indicates the end of the code for the main function. According to the C99 specification and newer, the main function will implicitly return a status of 0 upon reaching the } that terminates the function. This is interpreted by the run-time system as an exit code indicating successful execution.[29]
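The "more careful program" mentioned in the walkthrough can be sketched as follows. The function name print_greeting and its 0/-1 result convention are assumptions made for illustration; the behaviour relied on is that printf returns the number of characters written, or a negative value on an output error.

```c
#include <stdio.h>

/* Hypothetical helper: prints the greeting and reports success (0)
   or failure (-1) instead of silently discarding printf's result. */
int print_greeting(void)
{
    int written = printf("hello, world\n");
    return (written < 0) ? -1 : 0;
}
```

A main function could return this value's outcome as its exit status rather than relying on the implicit 0.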

Data types

C is often used in low-level systems programming where escapes from the type system may be necessary. The compiler attempts to ensure type correctness of most expressions, but the programmer can override the checks in various ways, either by using a type cast to explicitly convert a value from one type to another, or by using pointers or unions to reinterpret the underlying bits of a data object in some other way.
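Two of the escapes mentioned above, a cast and a union, can be sketched briefly. The names are hypothetical, and the union example assumes that float is a 32-bit IEEE 754 value, which is common but not required by the C standard.

```c
#include <stdint.h>

/* A cast converts the value: the fractional part is discarded. */
int truncate_to_int(double d)
{
    return (int)d;
}

/* A union reinterprets the same bytes as another type.
   Assumption: float is 32 bits wide and uses IEEE 754 encoding. */
union float_bits {
    float    f;
    uint32_t u;
};

uint32_t bits_of(float value)
{
    union float_bits fb;
    fb.f = value;
    return fb.u;   /* the stored bytes, read back as an integer */
}
```

The cast changes the representation to preserve the (truncated) value, whereas the union keeps the representation and changes how it is interpreted.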

C has a static, weakly enforced type system that shares some similarities with that of other ALGOL descendants such as Pascal.

There are built-in types for

There are also derived types including

Some find C’s declaration syntax unintuitive, particularly for function pointers. (Ritchie’s idea was to declare identifiers in contexts resembling their use: “declaration reflects use”.)[30]

C’s usual arithmetic conversions allow for efficient code to be generated, but can sometimes produce unexpected results. For example, a comparison of signed and unsigned integers of equal width requires a conversion of the signed value to unsigned. This can generate unexpected results if the signed value is negative.
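The surprising comparison can be sketched as follows, assuming int and unsigned int have the same width. The function names are hypothetical. The first function spells out with a cast what the compiler does implicitly for s < u; the second adds the sign check needed for a mathematically correct answer.

```c
/* What `s < u` means after the usual arithmetic conversions:
   s is converted to unsigned, so a negative s becomes a very large value. */
int less_after_conversion(int s, unsigned u)
{
    return (unsigned)s < u;
}

/* A comparison that respects the sign: any negative s is below every u. */
int less_safe(int s, unsigned u)
{
    return s < 0 || (unsigned)s < u;
}
```

With s = -1 and u = 1, the first function answers 0 (false) while the second answers 1 (true).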


C supports the use of pointers, a type of reference that records the address or location of an object or function in memory. Pointers can be dereferenced to access data stored at the address pointed to, or to invoke a pointed-to function. Pointers can be manipulated using assignment or pointer arithmetic. The run-time representation of a pointer value is typically a raw memory address (perhaps augmented by an offset-within-word field), but since a pointer’s type includes the type of the thing pointed to, expressions including pointers can be type-checked at compile time. Pointer arithmetic is automatically scaled by the size of the pointed-to data type. Pointers are used for many different purposes in C. Text strings are commonly manipulated using pointers into arrays of characters. Dynamic memory allocation is performed using pointers. Many data types, such as trees, are commonly implemented as dynamically allocated struct objects linked together using pointers. Pointers to functions are useful for passing functions as arguments to higher-order functions (such as qsort or bsearch) or as callbacks to be invoked by event handlers.[29]
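Two of the uses described above, scaled pointer arithmetic and a function pointer passed to qsort, might look like this in practice. The helper names are hypothetical; qsort itself is the standard library function mentioned in the text.

```c
#include <stdlib.h>

/* Comparison callback for qsort: receives pointers to two elements. */
static int compare_ints(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);   /* negative, zero, or positive */
}

/* Sorts in ascending order by handing qsort a pointer to compare_ints. */
void sort_ints(int *values, size_t count)
{
    qsort(values, count, sizeof *values, compare_ints);
}

/* Sums the elements by walking a pointer through the array.
   Each ++p advances by sizeof(int) bytes, not by one byte. */
long sum_ints(const int *values, size_t count)
{
    long total = 0;
    const int *p;
    for (p = values; p != values + count; ++p)
        total += *p;
    return total;
}
```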

A null pointer value explicitly points to no valid location. Dereferencing a null pointer value is undefined, often resulting in a segmentation fault. Null pointer values are useful for indicating special cases such as no “next” pointer in the final node of a linked list, or as an error indication from functions returning pointers. In appropriate contexts in source code, such as for assigning to a pointer variable, a null pointer constant can be written as 0, with or without explicit casting to a pointer type, or as the NULL macro defined by several standard headers. In conditional contexts, null pointer values evaluate to false, while all other pointer values evaluate to true.
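The linked-list case can be sketched with a hypothetical node type whose final element has a null next pointer. The loop condition relies on a null pointer evaluating to false.

```c
#include <stddef.h>

/* Hypothetical singly linked list node. */
struct node {
    int          value;
    struct node *next;   /* NULL in the final node */
};

/* Counts the nodes; an empty list is represented by a null head. */
size_t list_length(const struct node *head)
{
    size_t n = 0;
    while (head) {       /* stops when head becomes a null pointer */
        ++n;
        head = head->next;
    }
    return n;
}
```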

Void pointers (void *) point to objects of unspecified type, and can therefore be used as “generic” data pointers. Since the size and type of the pointed-to object is not known, void pointers cannot be dereferenced, nor is pointer arithmetic on them allowed, although they can easily be (and in many contexts implicitly are) converted to and from any other object pointer type.[29]
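A minimal sketch of a "generic" routine built on void pointers is shown below. The name swap_bytes is hypothetical. Because a void pointer cannot be dereferenced, the function first converts its arguments to unsigned char pointers and relies on the caller to supply the object size.

```c
#include <stddef.h>

/* Swaps two non-overlapping objects of `size` bytes each, whatever their type. */
void swap_bytes(void *a, void *b, size_t size)
{
    unsigned char *pa = a;   /* implicit conversion from void * */
    unsigned char *pb = b;
    size_t i;
    for (i = 0; i < size; ++i) {
        unsigned char tmp = pa[i];
        pa[i] = pb[i];
        pb[i] = tmp;
    }
}
```

The same function can then swap two ints or two doubles, for example swap_bytes(&x, &y, sizeof x).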

Careless use of pointers is potentially dangerous. Because they are typically unchecked, a pointer variable can be made to point to any arbitrary location, which can cause undesirable effects. Although properly used pointers point to safe places, they can be made to point to unsafe places by using invalid pointer arithmetic; the objects they point to may be deallocated and reused (dangling pointers); they may be used without having been initialized (wild pointers); or they may be directly assigned an unsafe value using a cast, union, or through another corrupt pointer. In general, C is permissive in allowing manipulation of and conversion between pointer types, although compilers typically provide options for various levels of checking. Some other programming languages address these problems by using more restrictive reference types.


Array types in C are traditionally of a fixed, static size specified at compile time. (The more recent C99 standard also allows a form of variable-length arrays.) However, it is also possible to allocate a block of memory (of arbitrary size) at run-time, using the standard library’s malloc function, and treat it as an array. C’s unification of arrays and pointers means that declared arrays and these dynamically allocated simulated arrays are virtually interchangeable.
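Treating a malloc block as an array might look like the sketch below. The function name make_squares is hypothetical; the caller is assumed to release the block with free.

```c
#include <stdlib.h>

/* Returns a heap block of n ints holding 0, 1, 4, 9, ...,
   or NULL if the allocation fails. The caller must free it. */
int *make_squares(size_t n)
{
    int *a = malloc(n * sizeof *a);
    size_t i;
    if (a == NULL)
        return NULL;
    for (i = 0; i < n; ++i)
        a[i] = (int)(i * i);   /* indexed exactly like a declared array */
    return a;
}
```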

Since arrays are always accessed (in effect) via pointers, array accesses are typically not checked against the underlying array size, although some compilers may provide bounds checking as an option.[31] Array bounds violations are therefore possible and rather common in carelessly written code, and can lead to various repercussions, including illegal memory accesses, corruption of data, buffer overruns, and run-time exceptions. If bounds checking is desired, it must be done manually.
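Manual bounds checking can be as simple as the following sketch. The function name and its convention of returning 1 on success and 0 on failure are assumptions for illustration; the essential point is that the length must be passed alongside the pointer, because the pointer alone does not carry it.

```c
#include <stddef.h>

/* Stores array[index] in *out and returns 1,
   or returns 0 without touching *out if index is out of range. */
int checked_get(const int *array, size_t length, size_t index, int *out)
{
    if (index >= length)
        return 0;
    *out = array[index];
    return 1;
}
```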

C does not have a special provision for declaring multidimensional arrays, but rather relies on recursion within the type system to declare arrays of arrays, which effectively accomplishes the same thing. The index values of the resulting “multidimensional array” can be thought of as increasing in row-major order.

Multidimensional arrays are commonly used in numerical algorithms (mainly from applied linear algebra) to store matrices. The structure of the C array is well suited to this particular task. However, since arrays are passed merely as pointers, the bounds of the array must be known fixed values or else explicitly passed to any subroutine that requires them, and dynamically sized arrays of arrays cannot be accessed using double indexing. (A workaround for this is to allocate the array with an additional “row vector” of pointers to the columns.)
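The row-major layout and the pointer-vector workaround can be sketched together. All names here are hypothetical. The sketch allocates one contiguous block of cells plus a vector of pointers to the start of each row, so that double indexing m[r][c] works for sizes chosen at run time.

```c
#include <stdlib.h>

/* Row-major position of element [r][c] in a flat block with `cols` columns. */
size_t flat_index(size_t r, size_t c, size_t cols)
{
    return r * cols + c;
}

/* Allocates a zero-filled rows x cols matrix, or returns NULL on failure. */
int **matrix_alloc(size_t rows, size_t cols)
{
    int **m;
    int *cells;
    size_t r;
    if (rows == 0 || cols == 0)
        return NULL;
    m = malloc(rows * sizeof *m);            /* the vector of row pointers */
    if (m == NULL)
        return NULL;
    cells = calloc(rows * cols, sizeof *cells);
    if (cells == NULL) {
        free(m);
        return NULL;
    }
    for (r = 0; r < rows; ++r)
        m[r] = cells + r * cols;             /* row r starts r * cols cells in */
    return m;
}

/* Releases a matrix obtained from matrix_alloc. */
void matrix_free(int **m)
{
    if (m != NULL) {
        free(m[0]);   /* the contiguous cell block */
        free(m);
    }
}
```

Because the cells are contiguous, m[1][2] in a matrix with 3 columns is the same object as m[0][flat_index(1, 2, 3)].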

C99 introduced “variable-length arrays” which address some, but not all, of the issues with ordinary C arrays.

Array-pointer interchangeability

The subscript notation x[i] (where x designates a pointer) is syntactic sugar for *(x+i).[32] Taking advantage of the compiler’s knowledge of the pointer type, the address that x + i points to is not the base address (pointed to by x) incremented by i bytes, but rather is defined to be the base address incremented by i multiplied by the size of an element that x points to. Thus, x[i] designates the (i+1)th element of the array.
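The equivalence and the scaling can be checked with a few hypothetical helper functions. The last one measures the distance between x + i and x in bytes by viewing both addresses as character pointers.

```c
#include <stddef.h>

int element_by_subscript(const int *x, size_t i)
{
    return x[i];
}

int element_by_pointer(const int *x, size_t i)
{
    return *(x + i);   /* same element as x[i] */
}

/* Byte distance from x to x + i: i multiplied by sizeof(int). */
size_t byte_offset(const int *x, size_t i)
{
    return (size_t)((const char *)(x + i) - (const char *)x);
}
```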

Furthermore, in most expression contexts (a notable exception is as operand of sizeof), the name of an array is automatically converted to a pointer to the array’s first element. This implies that an array is never copied as a whole when named as an argument to a function, but rather only the address of its first element is passed. Therefore, although function calls in C use pass-by-value semantics, arrays are in effect passed by reference.
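The difference between the two kinds of argument can be sketched as follows, using hypothetical function names. The array parameter is really a pointer to the caller's first element, so writes through it are visible to the caller, while the plain int parameter is a private copy.

```c
#include <stddef.h>

/* `values` is really an int *: the caller's elements are modified. */
void zero_fill(int values[], size_t count)
{
    size_t i;
    for (i = 0; i < count; ++i)
        values[i] = 0;
}

/* `n` is a copy: the caller's variable is unaffected. */
void try_to_change(int n)
{
    n = 0;
    (void)n;   /* silence unused-value warnings */
}
```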

The size of an element can be determined by applying the operator sizeof to any dereferenced element of x, as in n = sizeof *x or n = sizeof x[0], and the number of elements in a declared array A can be determined as sizeof A / sizeof A[0]. The latter only applies to array names: variables declared with subscripts (int A[20]). Due to the semantics of C, it is not possible to determine the entire size of arrays through pointers to arrays or those created by dynamic allocation (malloc); code such as sizeof arr / sizeof arr[0] (where arr = A designates a pointer) will not work since the compiler assumes the size of the pointer itself is being requested.[33][34] Since array name arguments to sizeof are not converted to pointers, they do not exhibit such ambiguity. However, arrays created by dynamic allocation are accessed through pointers rather than true array variables, so they suffer from the same sizeof issues as array pointers.
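The difference between applying sizeof to an array name and to a pointer can be seen in a short sketch (function names are illustrative only):

```cpp
#include <cassert>
#include <cstddef>

// Element count of a declared array: valid only where A is still an array.
std::size_t count_declared() {
    int A[20];
    return sizeof A / sizeof A[0];  // 20
}

// The parameter is a pointer, so sizeof yields the size of the pointer,
// not the size of the array the caller passed.
std::size_t size_seen_through_pointer(const int* arr) {
    return sizeof arr;  // same as sizeof(const int*)
}
```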

Thus, despite this apparent equivalence between array and pointer variables, there is still a distinction to be made between them. Even though the name of an array is, in most expression contexts, converted into a pointer (to its first element), this pointer does not itself occupy any storage; the array name is not an l-value, and its address is a constant, unlike a pointer variable. Consequently, what an array “points to” cannot be changed, and it is impossible to assign a new address to an array name. Array contents may be copied, however, by using the memcpy function, or by accessing the individual elements.

Memory management

One of the most important functions of a programming language is to provide facilities for managing memory and the objects that are stored in memory. C provides three distinct ways to allocate memory for objects:[29]

  • Static memory allocation: space for the object is provided in the binary at compile-time; these objects have an extent (or lifetime) as long as the binary which contains them is loaded into memory.
  • Automatic memory allocation: temporary objects can be stored on the stack, and this space is automatically freed and reusable after the block in which they are declared is exited.
  • Dynamic memory allocation: blocks of memory of arbitrary size can be requested at run-time using library functions such as malloc from a region of memory called the heap; these blocks persist until subsequently freed for reuse by calling the library function realloc or free.

These three approaches are appropriate in different situations and have various tradeoffs. For example, static memory allocation has little allocation overhead, automatic allocation may involve slightly more overhead, and dynamic memory allocation can potentially have a great deal of overhead for both allocation and deallocation. The persistent nature of static objects is useful for maintaining state information across function calls, automatic allocation is easy to use but stack space is typically much more limited and transient than either static memory or heap space, and dynamic memory allocation allows convenient allocation of objects whose size is known only at run-time. Most C programs make extensive use of all three.

Where possible, automatic or static allocation is usually simplest because the storage is managed by the compiler, freeing the programmer of the potentially error-prone chore of manually allocating and releasing storage. However, many data structures can change in size at runtime, and since static allocations (and automatic allocations before C99) must have a fixed size at compile-time, there are many situations in which dynamic allocation is necessary.[29] Prior to the C99 standard, variable-sized arrays were a common example of this. (See the article on malloc for an example of dynamically allocated arrays.) Unlike automatic allocation, which can fail at run time with uncontrolled consequences, the dynamic allocation functions return an indication (in the form of a null pointer value) when the required storage cannot be allocated. (Static allocation that is too large is usually detected by the linker or loader, before the program can even begin execution.)
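Checking the null-pointer indication from malloc might look like the sketch below. The function name `make_filled` is an assumption; the code is written as C++ (hence the explicit cast on the result of malloc), but the pattern is the same in C.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Allocates n ints on the heap and fills them with `value`.
// Returns a null pointer if n is 0, the size would overflow, or the
// allocation fails; a non-null result must be released with std::free.
int* make_filled(std::size_t n, int value) {
    if (n == 0 || n > static_cast<std::size_t>(-1) / sizeof(int)) {
        return nullptr;
    }
    int* p = static_cast<int*>(std::malloc(n * sizeof(int)));
    if (p == nullptr) {
        return nullptr;  // failure is reported to the caller, not ignored
    }
    for (std::size_t i = 0; i < n; ++i) {
        p[i] = value;    // initialize: malloc leaves the contents indeterminate
    }
    return p;
}
```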

Unless otherwise specified, static objects contain zero or null pointer values upon program startup. Automatically and dynamically allocated objects are initialized only if an initial value is explicitly specified; otherwise they initially have indeterminate values (typically, whatever bit pattern happens to be present in the storage, which might not even represent a valid value for that type). If the program attempts to access an uninitialized value, the results are undefined. Many modern compilers try to detect and warn about this problem, but both false positives and false negatives can occur.

Another issue is that heap memory allocation has to be synchronized with its actual usage in any program in order for it to be reused as much as possible. For example, if the only pointer to a heap memory allocation goes out of scope or has its value overwritten before free() is called, then that memory cannot be recovered for later reuse and is essentially lost to the program, a phenomenon known as a memory leak. Conversely, it is possible for memory to be freed but continue to be referenced, leading to unpredictable results. Typically, the symptoms will appear in a portion of the program far removed from the actual error, making it difficult to track down the problem. (Such issues are ameliorated in languages with automatic garbage collection.)


The C programming language uses libraries as its primary method of extension. In C, a library is a set of functions contained within a single “archive” file. Each library typically has a header file, which contains the prototypes of the functions contained within the library that may be used by a program, and declarations of special data types and macro symbols used with these functions. In order for a program to use a library, it must include the library’s header file, and the library must be linked with the program, which in many cases requires compiler flags (e.g., -lm, shorthand for “math library”).[29]

The most common C library is the C standard library, which is specified by the ISO and ANSI C standards and comes with every C implementation. (Implementations which target limited environments such as embedded systems may provide only a subset of the standard library.) This library supports stream input and output, memory allocation, mathematics, character strings, and time values. Several separate standard headers (for example, stdio.h) specify the interfaces for these and other standard library facilities.

Another common set of C library functions are those used by applications specifically targeted for Unix and Unix-like systems, especially functions which provide an interface to the kernel. These functions are detailed in various standards such as POSIX and the Single UNIX Specification.

Since many programs have been written in C, there are a wide variety of other libraries available. Libraries are often written in C because C compilers generate efficient object code; programmers then create interfaces to the library so that the routines can be used from higher-level languages like Java, Perl, and Python.[29]

Language tools

Tools have been created to help C programmers avoid some of the problems inherent in the language, such as statements with undefined behavior or statements that are not a good practice because they are likely to result in unintended behavior or run-time errors.

Automated source code checking and auditing are beneficial in any language, and for C many such tools exist, such as Lint. A common practice is to use Lint to detect questionable code when a program is first written. Once a program passes Lint, it is then compiled using the C compiler. Also, many compilers can optionally warn about syntactically valid constructs that are likely to actually be errors. MISRA C is a proprietary set of guidelines to avoid such questionable code, developed for embedded systems.

There are also compilers, libraries, and operating system level mechanisms for performing actions that are not a standard part of C, such as array bounds checking, buffer overflow detection, serialization, and automatic garbage collection.

Tools such as Purify or Valgrind and linking with libraries containing special versions of the memory allocation functions can help uncover runtime errors in memory usage.


C is often used for “system programming”, including implementing operating systems and embedded system applications, due to a combination of desirable characteristics such as code portability and efficiency, ability to access specific hardware addresses, ability to pun types to match externally imposed data access requirements, and low run-time demand on system resources. C can also be used for website programming using CGI as a “gateway” for information between the Web application, the server, and the browser.[35] Some reasons for choosing C over interpreted languages are its speed, stability, and near-universal availability.[36]

One consequence of C’s wide availability and efficiency is that compilers, libraries, and interpreters of other programming languages are often implemented in C. The primary implementations of Python (CPython), Perl 5, and PHP are all written in C.

Due to its thin layer of abstraction and low overhead, C allows efficient implementations of algorithms and data structures, which is useful for programs that perform a lot of computations. For example, the GNU Multi-Precision Library, the GNU Scientific Library, Mathematica, and MATLAB are completely or partially written in C.

C is sometimes used as an intermediate language by implementations of other languages, sometimes referred to as C intermediate language (CIL). This approach may be used for portability or convenience; by using C as an intermediate language, it is not necessary to develop machine-specific code generators. C has some features, such as line-number preprocessor directives and optional superfluous commas at the end of initializer lists, which support compilation of generated code. However, some of C’s shortcomings have prompted the development of other C-based languages specifically designed for use as intermediate languages, such as C--. Several other tools use CIL as a way to have access to a C abstract syntax tree. Some of these utilities are Frama-C (a framework for analysis of C programs) and CompCert (a C compiler proven in Coq). CIL was originally designed and implemented in 2002 by George Necula et al.[37][38]

C has also been widely used to implement end-user applications, but much of that development has shifted to newer languages.

Related languages

C has directly or indirectly influenced many later languages such as C#, D, Go, Java, JavaScript, Limbo, LPC, Perl, PHP, Python, and Unix’s C Shell. The most pervasive influence has been syntactical: all of the languages mentioned combine the statement and (more or less recognizably) expression syntax of C with type systems, data models and/or large-scale program structures that differ from those of C, sometimes radically.

Several C or near-C interpreters exist, including Ch and CINT, which can also be used for scripting.

When object-oriented languages became popular, C++ and Objective-C were two different extensions of C that provided object-oriented capabilities. Both languages were originally implemented as source-to-source compilers; source code was translated into C, and then compiled with a C compiler.

The C++ programming language was devised by Bjarne Stroustrup as one approach to providing object-oriented functionality with C-like syntax. C++ adds greater typing strength, scoping, and other tools useful in object-oriented programming and permits generic programming via templates. Nearly a superset of C, C++ now supports most of C, with a few exceptions (see Compatibility of C and C++).

Objective-C was originally a very “thin” layer on top of C, and remains a strict superset of C that permits object-oriented programming using a hybrid dynamic/static typing paradigm. Objective-C derives its syntax from both C and Smalltalk: syntax that involves preprocessing, expressions, function declarations, and function calls is inherited from C, while the syntax for object-oriented features was originally taken from Smalltalk.

In addition to C++ and Objective-C, Ch, Cilk, and Unified Parallel C are nearly supersets of C.


C++ is often considered to be a superset of C, but this is not strictly true.[33] Most C code can easily be made to compile correctly in C++, but there are a few differences that cause some valid C code to be invalid or behave differently in C++. For example, C allows implicit conversion from void* to other pointer types, but C++ does not (for type safety reasons). Also, C++ defines many new keywords, such as new and class, which may be used as identifiers (for example, variable names) in a C program.

Some incompatibilities have been removed by the 1999 revision of the C standard (C99), which now supports C++ features such as line comments (//) and declarations mixed with code. On the other hand, C99 introduced a number of new features that C++ did not support or that were incompatible or redundant in C++, such as variable-length arrays, native complex-number types (C++ instead provides the std::complex class in its standard library, which predates C99), designated initializers (use constructors instead), compound literals, the boolean typedef (in C++ it is a fundamental type) and the restrict keyword.[34] Some of the C99-introduced features were included in the subsequent version of the C++ standard, C++11:

  • C99 preprocessor additions: variadic macros, concatenation of adjacent narrow/wide string literals, _Pragma()
  • long long
  • __func__
  • Headers: cstdbool (stdbool.h), cstdint (stdint.h), cinttypes (inttypes.h)

To intermix C and C++ code, any function declaration or definition that is to be called from/used both in C and C++ must be declared with C linkage by placing it within an extern "C" {/*…*/} block. Such a function may not rely on features depending on name mangling (i.e., function overloading).
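A minimal sketch of a function with C linkage follows. The name `add_ints` is hypothetical and stands in for any function shared between C and C++ code.

```cpp
#include <cassert>

// Declaration with C linkage: the name is not mangled, so a C translation
// unit can call the function as well.
extern "C" {
    int add_ints(int a, int b);
}

// Definition, compiled as C++ here. It keeps the C linkage declared above,
// which also means it cannot be overloaded under the same name.
int add_ints(int a, int b) {
    return a + b;
}
```

In a header shared by both languages, the extern "C" braces are usually wrapped in `#ifdef __cplusplus` guards so that a C compiler never sees them.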

classes, then virtual functions, operator overloading, multiple inheritance, templates, and exception handling

strong typing, inlining, and default arguments

virtual functions, function name and operator overloading, references, constants, user-controlled free-store memory control, improved type checking, and BCPL-style single-line comments with two forward slashes (//), as well as the development of a proper compiler for C++

multiple inheritance, abstract classes, static member functions, const member functions, and protected members.



C++ inherits most of C’s syntax. The following is Bjarne Stroustrup’s version of the Hello world program that uses the C++ Standard Library stream facility to write a message to standard output:[23][24]
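A minimal sketch in that spirit is given here. It is not Stroustrup's exact listing: the helper names `write_greeting` and `greeting_text` are assumptions, and the message is written through a `std::ostream&` so that the same code works with `std::cout` or with a string stream.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Writes the greeting to any output stream (std::cout, a file, a string stream).
void write_greeting(std::ostream& out) {
    out << "Hello, world!\n";
}

// Convenience wrapper that captures the greeting in a string.
std::string greeting_text() {
    std::ostringstream buffer;
    write_greeting(buffer);
    return buffer.str();
}
```

In a complete program, main would simply call `write_greeting(std::cout);`.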

Within functions that define a non-void return type, failure to return a value before control reaches the end of the function results in undefined behaviour (compilers typically provide the means to issue a diagnostic in such a case).[25] The sole exception to this rule is the main function, which implicitly returns a value of zero.[26]

Operators and operator overloading

Operators that cannot be overloaded

C++ provides more than 35 operators, covering basic arithmetic, bit manipulation, indirection, comparisons, logical operations and others. Almost all operators can be overloaded for user-defined types, with a few notable exceptions such as member access (. and .*) as well as the conditional operator. The rich set of overloadable operators is central to making user-defined types in C++ work as well and as easily as built-in types (so that the user using them cannot tell the difference). The overloadable operators are also an essential part of many advanced C++ programming techniques, such as smart pointers. Overloading an operator does not change the precedence of calculations involving the operator, nor does it change the number of operands that the operator uses (any operand may however be ignored by the operator, though it will be evaluated prior to execution). Overloaded “&&” and “||” operators lose their short-circuit evaluation property.
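As an illustration, the sketch below overloads +, * and == for a hypothetical two-component vector type `Vec2` (the type and its members are assumptions made for this example).

```cpp
#include <cassert>

// Hypothetical 2-D vector type used only for illustration.
struct Vec2 {
    int x;
    int y;
};

// Binary + for Vec2: component-wise addition.
Vec2 operator+(const Vec2& a, const Vec2& b) {
    return Vec2{a.x + b.x, a.y + b.y};
}

// Scalar multiplication. Overloading does not alter precedence, so
// a + 2 * b still groups as a + (2 * b).
Vec2 operator*(int k, const Vec2& v) {
    return Vec2{k * v.x, k * v.y};
}

// Equality comparison, again component-wise.
bool operator==(const Vec2& a, const Vec2& b) {
    return a.x == b.x && a.y == b.y;
}
```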

Memory management

C++ supports four types of memory management:

  • Static memory allocation. A static variable is assigned a value at compile-time, and allocated storage in a fixed location along with the executable code. These are declared with the “static” keyword (in the sense of static storage, not in the sense of declaring a class variable).
  • Automatic memory allocation. An automatic variable is simply declared with its class name, and storage is allocated on the stack when the value is assigned. The constructor is called when the declaration is executed, the destructor is called when the variable goes out of scope, and after the destructor the allocated memory is automatically freed.
  • Dynamic memory allocation. Storage can be dynamically allocated on the heap using manual memory management – normally calls to new and delete (though old-style C calls such as malloc() and free() are still supported).
  • With the use of a library, garbage collection is possible. The Boehm garbage collector is commonly used for this purpose.

This fine control over memory management is similar to that of C, in contrast with languages that intend to hide such details from the programmer, such as Java, Perl, PHP, and Ruby.
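The first three kinds of storage can be observed with a small sketch. The `Tracker` type, the `live_trackers` counter and the function names are all hypothetical.

```cpp
#include <cassert>

// Static storage: one counter that exists for the whole run of the program.
int live_trackers = 0;

// Counts live objects so that construction and destruction can be observed.
struct Tracker {
    Tracker()  { ++live_trackers; }
    ~Tracker() { --live_trackers; }
};

// Static local variable: initialized once, keeps its value between calls.
int next_id() {
    static int id = 0;
    return ++id;
}

// Automatic storage: `t` is destroyed when the function body is exited.
int live_inside_block() {
    Tracker t;
    return live_trackers;
}

// Dynamic storage: the object lives until delete is called explicitly.
int released_by_delete() {
    Tracker* p = new Tracker;
    int during = live_trackers;
    delete p;
    return during - live_trackers;  // number of objects released by delete
}
```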


C++ templates enable generic programming. C++ supports both function and class templates. Templates may be parameterized by types, compile-time constants, and other templates. Templates are implemented by instantiation at compile-time. To instantiate a template, compilers substitute specific arguments for a template’s parameters to generate a concrete function or class instance. Some substitutions are not possible; these are eliminated by an overload resolution policy described by the phrase “Substitution failure is not an error” (SFINAE). Templates are a powerful tool that can be used for generic programming, template metaprogramming, and code optimization, but this power implies a cost. Template use may increase code size, because each template instantiation produces a copy of the template code, one for each set of template arguments; however, this is the same amount of code that would be generated if the code were written by hand, or less.[27] This is in contrast to run-time generics seen in other languages (e.g., Java) where at compile-time the type is erased and a single template body is preserved.
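A function template and a class template parameterized by both a type and a compile-time constant might be sketched as follows; `larger` and `FixedBuffer` are illustrative names, not standard library facilities.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Function template: one definition, instantiated once per argument type.
template <typename T>
const T& larger(const T& a, const T& b) {
    return (a < b) ? b : a;
}

// Class template parameterized by a type and a compile-time constant.
template <typename T, std::size_t N>
class FixedBuffer {
public:
    std::size_t capacity() const { return N; }
    T& operator[](std::size_t i) { return data_[i]; }  // no bounds check
private:
    T data_[N];
};
```

Each distinct set of arguments, such as `larger<int>` and `larger<std::string>`, yields its own instantiation.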

Templates are different from macros: while both of these compile-time language features enable conditional compilation, templates are not restricted to lexical substitution. Templates are aware of the semantics and type system of their companion language, as well as all compile-time type definitions, and can perform high-level operations including programmatic flow control based on evaluation of strictly type-checked parameters. Macros are capable of conditional control over compilation based on predetermined criteria, but cannot instantiate new types, recurse, or perform type evaluation and in effect are limited to pre-compilation text-substitution and text-inclusion/exclusion. In other words, macros can control compilation flow based on pre-defined symbols but cannot, unlike templates, independently instantiate new symbols. Templates are a tool for static polymorphism (see below) and generic programming.

In addition, templates are a compile time mechanism in C++ that is Turing-complete, meaning that any computation expressible by a computer program can be computed, in some form, by a template metaprogram prior to runtime.
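A classic small instance of template metaprogramming is a factorial computed entirely at compile time. The sketch below uses a recursive class template with an explicit specialization as the base case; the name `Factorial` is an assumption.

```cpp
#include <cassert>

// Primary template: Factorial<N>::value = N * Factorial<N - 1>::value.
template <unsigned N>
struct Factorial {
    static const unsigned long long value = N * Factorial<N - 1>::value;
};

// Explicit specialization terminates the recursion.
template <>
struct Factorial<0> {
    static const unsigned long long value = 1;
};

// The result is a compile-time constant, so it can be checked statically.
static_assert(Factorial<5>::value == 120, "5! should be 120");
```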

In summary, a template is a compile-time parameterized function or class written without knowledge of the specific arguments used to instantiate it. After instantiation, the resulting code is equivalent to code written specifically for the passed arguments. In this manner, templates provide a way to decouple generic, broadly applicable aspects of functions and classes (encoded in templates) from specific aspects (encoded in template parameters) without sacrificing performance due to abstraction.


C++ introduces object-oriented programming (OOP) features to C. It offers classes, which provide the four features commonly present in OOP (and some non-OOP) languages: abstraction, encapsulation, inheritance, and polymorphism. One distinguishing feature of C++ classes compared to classes in other programming languages is support for deterministic destructors, which in turn provide support for the Resource Acquisition is Initialization (RAII) concept.


Encapsulation is the hiding of information to ensure that data structures and operators are used as intended and to make the usage model more obvious to the developer. C++ provides the ability to define classes and functions as its primary encapsulation mechanisms. Within a class, members can be declared as either public, protected, or private to explicitly enforce encapsulation. A public member of the class is accessible to any function. A private member is accessible only to functions that are members of that class and to functions and classes explicitly granted access permission by the class (“friends”). A protected member is accessible to members of classes that inherit from the class in addition to the class itself and any friends.

The OO principle is that all of the functions (and only the functions) that access the internal representation of a type should be encapsulated within the type definition. C++ supports this (via member functions and friend functions), but does not enforce it: the programmer can declare parts or all of the representation of a type to be public, and is allowed to make public entities that are not part of the representation of the type. Therefore, C++ supports not just OO programming, but other weaker decomposition paradigms, like modular programming.

It is generally considered good practice to make all data private or protected, and to make public only those functions that are part of a minimal interface for users of the class. This can hide the details of data implementation, allowing the designer to later fundamentally change the implementation without changing the interface in any way.[28][29]
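The access rules can be illustrated with a hypothetical `Account` class whose data is private and whose invariant is enforced by its small public interface; the friend function `same_balance` is likewise an assumption made for the example.

```cpp
#include <cassert>

class Account {
public:
    explicit Account(long cents) : cents_(cents) {}

    long balance() const { return cents_; }

    // The only way to reduce the balance; rejects invalid amounts.
    bool withdraw(long amount) {
        if (amount < 0 || amount > cents_) return false;
        cents_ -= amount;
        return true;
    }

    // A friend is granted access to private members without being a member.
    friend bool same_balance(const Account& a, const Account& b);

private:
    long cents_;  // representation hidden from users of the class
};

bool same_balance(const Account& a, const Account& b) {
    return a.cents_ == b.cents_;
}
```

Because `cents_` is private, the representation could later change (for example to a different numeric type) without affecting code that uses only the public members.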


Inheritance allows one data type to acquire properties of other data types. Inheritance from a base class may be declared as public, protected, or private. This access specifier determines whether unrelated and derived classes can access the inherited public and protected members of the base class. Only public inheritance corresponds to what is usually meant by “inheritance”. The other two forms are much less frequently used. If the access specifier is omitted, a “class” inherits privately, while a “struct” inherits publicly. Base classes may be declared as virtual; this is called virtual inheritance. Virtual inheritance ensures that only one instance of a base class exists in the inheritance graph, avoiding some of the ambiguity problems of multiple inheritance.

Multiple inheritance is a C++ feature not found in most other languages, allowing a class to be derived from more than one base class; this allows for more elaborate inheritance relationships. For example, a “Flying Cat” class can inherit from both “Cat” and “Flying Mammal”. Some other languages, such as C# or Java, accomplish something similar (although more limited) by allowing inheritance of multiple interfaces while restricting the number of base classes to one (interfaces, unlike classes, provide only declarations of member functions, no implementation or member data). An interface as in C# and Java can be defined in C++ as a class containing only pure virtual functions, often known as an abstract base class or “ABC”. The member functions of such an abstract base class are normally explicitly defined in the derived class, not inherited implicitly. C++ virtual inheritance exhibits an ambiguity resolution feature called dominance.
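The “Flying Cat” example can be sketched with virtual inheritance so that the common base appears only once. The class layout and the `legs` member are assumptions for illustration.

```cpp
#include <cassert>

struct Animal {
    int legs;
    Animal() : legs(4) {}
};

// Both intermediate classes inherit virtually, so a class deriving from
// both contains a single shared Animal subobject.
struct Cat : virtual Animal {};
struct FlyingMammal : virtual Animal {};

// Multiple inheritance: FlyingCat has two direct base classes.
struct FlyingCat : Cat, FlyingMammal {};

// True if the Cat path and the FlyingMammal path reach the same Animal.
bool shares_single_base(FlyingCat& fc) {
    Animal* via_cat = static_cast<Cat*>(&fc);
    Animal* via_mammal = static_cast<FlyingMammal*>(&fc);
    return via_cat == via_mammal;
}
```

Without the `virtual` keyword on the two intermediate classes, `FlyingCat` would contain two `Animal` subobjects and an unqualified access to `legs` would be ambiguous.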


Polymorphism enables one common interface for many implementations, and for objects to act differently under different circumstances.

C++ supports several kinds of static (compile-time) and dynamic (run-time) polymorphism. Compile-time polymorphism does not allow for certain run-time decisions, while run-time polymorphism typically incurs a performance penalty.

Static polymorphism

Function overloading allows programs to declare multiple functions having the same name (but with different arguments). The functions are distinguished by the number or types of their formal parameters. Thus, the same function name can refer to different functions depending on the context in which it is used. The type returned by the function is not used to distinguish overloaded functions; declaring functions that differ only in their return type results in a compile-time error.

When declaring a function, a programmer can specify for one or more parameters a default value. Doing so allows the parameters with defaults to optionally be omitted when the function is called, in which case the default arguments will be used. When a function is called with fewer arguments than there are declared parameters, explicit arguments are matched to parameters in left-to-right order, with any unmatched parameters at the end of the parameter list being assigned their default arguments. In many cases, specifying default arguments in a single function declaration is preferable to providing overloaded function definitions with different numbers of parameters.
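Overloading and default arguments can be sketched together. The function names `describe` and `bump` are hypothetical.

```cpp
#include <cassert>
#include <string>

// Overloads: same name, distinguished by the types and number of parameters.
std::string describe(int)                { return "int"; }
std::string describe(double)             { return "double"; }
std::string describe(const std::string&) { return "string"; }
std::string describe(int, int)           { return "two ints"; }

// Default argument: `step` may be omitted by the caller, in which case 1 is used.
int bump(int start, int step = 1) {
    return start + step;
}
```

Here a single declaration of `bump` replaces what would otherwise be two overloads, one taking one parameter and one taking two.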

Templates in C++ provide a sophisticated mechanism for writing generic, polymorphic code. In particular, through the Curiously Recurring Template Pattern, it’s possible to implement a form of static polymorphism that closely mimics the syntax for overriding virtual functions. Because C++ templates are type-aware and Turing-complete, they can also be used to let the compiler resolve recursive conditionals and generate substantial programs through template metaprogramming. Contrary to some opinion, template code will not generate bulky code after compilation with the proper compiler settings.[27]
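A minimal sketch of the Curiously Recurring Template Pattern is shown here, assuming hypothetical `Shape`, `Square` and `Rect` types. The base class template calls into the derived class through a static_cast, so no virtual function is involved.

```cpp
#include <cassert>

// Base template: forwards area() to the derived class at compile time.
template <typename Derived>
struct Shape {
    double area() const {
        return static_cast<const Derived*>(this)->area_impl();
    }
};

struct Square : Shape<Square> {
    double side;
    explicit Square(double s) : side(s) {}
    double area_impl() const { return side * side; }
};

struct Rect : Shape<Rect> {
    double w, h;
    Rect(double w_, double h_) : w(w_), h(h_) {}
    double area_impl() const { return w * h; }
};

// Works for any Shape<D>; the call is resolved statically.
template <typename D>
double doubled_area(const Shape<D>& s) {
    return 2.0 * s.area();
}
```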

Dynamic polymorphism


Variable pointers (and references) to a base class type in C++ can refer to objects of any derived classes of that type in addition to objects exactly matching the variable type. This allows arrays and other kinds of containers to hold pointers to objects of differing types. Because assignment of values to variables usually occurs at run-time, this is necessarily a run-time phenomenon.

C++ also provides a dynamic_cast operator, which allows the program to safely attempt conversion of an object into an object of a more specific object type (as opposed to conversion to a more general type, which is always allowed). This feature relies on run-time type information (RTTI). Objects known to be of a certain specific type can also be cast to that type with static_cast, a purely compile-time construct that has no runtime overhead and does not require RTTI.
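A checked downcast with dynamic_cast might look like the following sketch; the class names are assumptions. On pointers, a failed dynamic_cast yields a null pointer.

```cpp
#include <cassert>

struct Base {
    virtual ~Base() {}  // makes Base polymorphic, as dynamic_cast requires
};

struct Circle : Base {
    int radius;
    explicit Circle(int r) : radius(r) {}
};

struct Line : Base {};

// Returns the radius if `b` actually points to a Circle, otherwise -1.
int radius_or_minus_one(Base* b) {
    if (Circle* c = dynamic_cast<Circle*>(b)) {
        return c->radius;  // checked downcast succeeded
    }
    return -1;             // dynamic_cast returned a null pointer
}
```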

Virtual member functions

Ordinarily, when a function in a derived class overrides a function in a base class, the function to call is determined by the type of the object. A given function is overridden when there exists no difference in the number or type of parameters between two or more definitions of that function. Hence, at compile time, it may not be possible to determine the type of the object and therefore the correct function to call, given only a base class pointer; the decision is therefore put off until runtime. This is called dynamic dispatch. Virtual member functions or methods[30] allow the most specific implementation of the function to be called, according to the actual run-time type of the object. In C++ implementations, this is commonly done using virtual function tables. If the object type is known, this may be bypassed by prepending a fully qualified class name before the function call, but in general calls to virtual functions are resolved at run time.

In addition to standard member functions, operator overloads and destructors can be virtual. A general rule of thumb is that if any functions in the class are virtual, the destructor should be as well. As the type of an object at its creation is known at compile time, constructors, and by extension copy constructors, cannot be virtual. Nonetheless a situation may arise where a copy of an object needs to be created when a pointer to a derived object is passed as a pointer to a base object. In such a case, a common solution is to create a clone() (or similar) virtual function that creates and returns a copy of the derived class when called.

A member function can also be made “pure virtual” by appending = 0 to its declaration, after the closing parenthesis and before the semicolon. A class containing a pure virtual function is called an abstract class. Objects cannot be created from abstract classes; they can only be derived from. Any derived class inherits the virtual function as pure and must provide a non-pure definition of it (and all other pure virtual functions) before objects of the derived class can be created. A program that attempts to create an object of a class with a pure virtual member function or inherited pure virtual member function is ill-formed.
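Pure virtual functions, a virtual destructor and the clone() idiom can be combined in one sketch. The `Instrument` hierarchy is hypothetical and uses the C++11 `override` specifier.

```cpp
#include <cassert>
#include <string>

// Abstract class: it declares pure virtual functions, so no Instrument
// object can be created directly.
class Instrument {
public:
    virtual ~Instrument() {}                   // virtual destructor
    virtual std::string sound() const = 0;     // pure virtual
    virtual Instrument* clone() const = 0;     // "virtual copy" idiom
};

class Piano : public Instrument {
public:
    std::string sound() const override { return "plink"; }
    Instrument* clone() const override { return new Piano(*this); }
};

class Drum : public Instrument {
public:
    std::string sound() const override { return "boom"; }
    Instrument* clone() const override { return new Drum(*this); }
};

// Dispatches on the run-time type of the object behind the base reference.
std::string play(const Instrument& i) {
    return i.sound();
}
```

The caller owns the object returned by `clone()` and must delete it; a smart pointer return type would be a natural refinement.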

Exception handling

Exception handling is a mechanism in C++ that is used to handle errors in a uniform manner and separately from the main body of a programme’s source code. Should an error occur, an exception is thrown (raised), which is then caught by an exception handler. The code that might cause an exception to be thrown goes in a try block (is enclosed in try { and }) and the exceptions are handled in separate catch blocks.

Unknown errors can be caught by the handlers using catch(...) to catch all exceptions. As all the standard library exceptions have std::exception as their base class, catching std::exception will catch all standard library exceptions. All exceptions that derive from std::exception have a what() member function (called as e.what() on a caught exception e) that provides information about what caused the error; std::out_of_range, for example, is a descendant of std::exception.

Each try block can have multiple handlers, allowing multiple different exceptions to be potentially caught.
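A try block with several handlers might be sketched as follows. The function name `try_read` and the returned strings are assumptions; `std::vector::at` is used because it throws std::out_of_range for an invalid index.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <stdexcept>
#include <string>
#include <vector>

// Tries to read v.at(index) and reports how the attempt ended.
std::string try_read(const std::vector<int>& v, std::size_t index) {
    try {
        int value = v.at(index);            // may throw std::out_of_range
        return "value " + std::to_string(value);
    } catch (const std::out_of_range&) {    // most specific handler first
        return "out of range";
    } catch (const std::exception& e) {     // any other standard exception
        return std::string("error: ") + e.what();
    } catch (...) {                         // anything else
        return "unknown error";
    }
}
```

Handlers are tried in order, so the handler for the derived type std::out_of_range is placed before the one for std::exception.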

Standard library

The C++ standard consists of two parts: the core language and the C++ Standard Library, which C++ programmers expect on every major implementation of C++. The library includes vectors, lists, maps, algorithms (find, for_each, binary_search, random_shuffle, etc.), sets, queues, stacks, arrays, tuples, input/output facilities (iostream; reading from the console input, reading/writing from files), smart pointers for automatic memory management, regular expression support, a multi-threading library, atomics support (allowing a variable to be read or written by at most one thread at a time without any external synchronisation), time utilities (measurement, getting current time, etc.), a system for converting error reporting that doesn’t use C++ exceptions into C++ exceptions, a random number generator, and a slightly modified version of the C standard library (to make it comply with the C++ type system).

A large part of the C++ library is based on the STL. This provides such useful tools as containers (for example vectors and lists), iterators to provide these containers with array-like access, and algorithms to perform operations such as searching and sorting. Furthermore (multi)maps (associative arrays) and (multi)sets are provided, all of which export compatible interfaces. Therefore it is possible, using templates, to write generic algorithms that work with any container or on any sequence defined by iterators. As in C, the features of the library are accessed by using the #include directive to include a standard header. C++ provides 105 standard headers, of which 27 are deprecated.

The standard incorporates the STL, which was originally designed by Alexander Stepanov, who experimented with generic algorithms and containers for many years. When he started with C++, he finally found a language where it was possible to create generic algorithms (e.g., STL sort) that perform even better than, for example, the C standard library qsort, thanks to C++ features like inlining and compile-time binding instead of function pointers. The standard does not refer to it as “STL”, as it is merely a part of the standard library, but the term is still widely used to distinguish it from the rest of the standard library (input/output streams, internationalization, diagnostics, the C library subset, etc.).
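The difference in binding can be seen by sorting the same data both ways. This is a sketch only: the function names below are hypothetical, and no timing is measured. qsort reaches its comparison through a function pointer and untyped void* arguments, whereas std::sort is instantiated for the element type, so the comparison is known at compile time and can be inlined.

```cpp
#include <algorithm>
#include <cstdlib>
#include <vector>

// Comparison callback for qsort: invoked through a function pointer,
// with the element type erased to const void*.
int compare_ints(const void* a, const void* b) {
    const int x = *static_cast<const int*>(a);
    const int y = *static_cast<const int*>(b);
    return (x > y) - (x < y);
}

// Sorts a copy of `v` with the C library routine.
std::vector<int> sorted_with_qsort(std::vector<int> v) {
    if (!v.empty()) {
        std::qsort(v.data(), v.size(), sizeof(int), compare_ints);
    }
    return v;
}

// Sorts a copy of `v` with the generic algorithm; operator< on int is
// bound at compile time.
std::vector<int> sorted_with_std_sort(std::vector<int> v) {
    std::sort(v.begin(), v.end());
    return v;
}
```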

Most C++ compilers, and all major ones, provide a standards-conforming implementation of the C++ standard library.


Throughout C++’s life, its development and evolution have been informally governed by a set of rules:[9]

  • It must be driven by actual problems and its features should be useful immediately in real world programmes.
  • Every feature should be implementable (with a reasonably obvious way to do so).
  • Programmers should be free to pick their own programming style, and that style should be fully supported by C++.
  • Allowing a useful feature is more important than preventing every possible misuse of C++.
  • It should provide facilities for organising programmes into well defined separate parts, and provide facilities for combining separately developed parts.
  • No implicit violations of the type system (but allow explicit violations that have been explicitly asked for by the programmer).
  • Make user created types have equal support and performance to built in types.
  • Any features that you do not use you do not pay for (e.g. in performance).
  • There should be no language beneath C++ (except assembly language).
  • C++ should work alongside other pre-existing programming languages, rather than being part of its own separate and incompatible programming environment.
  • If what the programmer wants to do is unknown, allow the programmer to specify (provide manual control).

The latest major revision of the C++ standard, C++11 (formerly known as C++0x), was approved by ISO/IEC on 12 August 2011.[20] It has been published as 14882:2011.[21]

C++14 or C++1y are names being used for the next minor revision. It is planned to be a small extension over C++11, featuring mainly bug fixes and small improvements, similarly to how C++03 was a small extension to C++98.[22] While the name ‘C++14’ implies a release in 2014, this date is not fixed.

The subsequent major revision, informally known as C++17, is planned for 2017.[22]


Producing a reasonably standards-compliant C++ compiler has proven to be a difficult task for compiler vendors in general. For many years, different C++ compilers implemented the C++ language to different levels of compliance to the standard, and their implementations varied widely in some areas such as partial template specialization. Recent releases of most popular C++ compilers support almost all of the C++ 1998 standard.[31]

To give compiler vendors greater freedom, the C++ standards committee decided not to dictate the implementation of name mangling, exception handling, and other implementation-specific features. The downside of this decision is that object code produced by different compilers is expected to be incompatible. There were, however, attempts to standardize compilers for particular machines or operating systems (for example C++ ABI),[32] though they seem to be largely abandoned now.


How do I write this very simple program?

Often, especially at the start of semesters, I get a lot of questions about how to write very simple programs. Typically, the problem to be solved is to read in a few numbers, do something with them, and write out an answer. Here is a sample program that does that:
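A minimal sketch of such a program, under stated assumptions: the function names read_numbers, total_of and report are hypothetical, and the "something" done with the numbers is assumed to be a sum. The input is taken as a std::istream so the same code works with the console or with a string.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Reads whitespace-separated numbers from `in` until the input ends
// or something that is not a number is found.
std::vector<double> read_numbers(std::istream& in) {
    std::vector<double> values;
    double d;
    while (in >> d) values.push_back(d);
    return values;
}

// Adds up the numbers that were read.
double total_of(const std::vector<double>& values) {
    double total = 0;
    for (double x : values) total += x;
    return total;
}

// Reads the numbers, does something with them, and formats the answer.
std::string report(std::istream& in) {
    const std::vector<double> values = read_numbers(in);
    std::ostringstream out;
    out << "read " << values.size() << " numbers, sum " << total_of(values);
    return out.str();
}
```

A complete program would only need a main() that includes <iostream> and writes report(std::cin) to std::cout.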

Can I call a virtual function from a constructor?

Yes, but be careful. It may not do what you expect. In a constructor, the virtual call mechanism is disabled because overriding from derived classes hasn’t yet happened. Objects are constructed from the base up, “base before derived”.


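For example, suppose the constructor B::B() of a base class B calls a virtual function f(), and a derived class D overrides f(). A program that constructs a D compiles and runs, but the call made from B::B() resolves to B::f, not D::f. Consider what would happen if the rule were different so that D::f() was called from B::B(): Because the constructor D::D() hadn’t yet been run, D::f() would try to assign its argument to an uninitialized string s, a member of D. The result would most likely be an immediate crash.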

Destruction is done “derived class before base class”, so virtual functions behave as in constructors: Only the local definitions are used – and no calls are made to overriding functions to avoid touching the (now destroyed) derived class part of the object.
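The construction and destruction order described above can be sketched as follows. The class names B and D, the function f and the string member s follow the text; the global call_log and the function trace_lifetime are assumptions added so that the order of calls can be inspected without printing.

```cpp
#include <string>

// Records each call in order (an illustrative device, not part of the rule).
std::string call_log;

class B {
public:
    explicit B(const std::string& ss) {
        call_log += "B constructor;";
        f(ss);                          // virtual call inside a constructor
    }
    virtual ~B() {
        call_log += "B destructor;";
        f(std::string());               // virtual call inside a destructor
    }
    virtual void f(const std::string&) { call_log += "B::f;"; }
};

class D : public B {
public:
    explicit D(const std::string& ss) : B(ss) { call_log += "D constructor;"; }
    ~D() override { call_log += "D destructor;"; }
    void f(const std::string& ss) override { call_log += "D::f;"; s = ss; }
private:
    std::string s;                      // not yet constructed while B::B() runs
};

// Creates and destroys one D, then returns the recorded call order.
std::string trace_lifetime() {
    call_log.clear();
    {
        D d("Hello");
    }                                   // d is destroyed here
    return call_log;
}
```

In this sketch D::f never appears in the log: both the constructor and the destructor of B use B::f, because the D part of the object either does not exist yet or has already been destroyed.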

For more details see D&E or TC++PL3 15.4.3.

It has been suggested that this rule is an implementation artifact. It is not so. In fact, it would be noticeably easier to implement the unsafe rule of calling virtual functions from constructors exactly as from other functions. However, that would imply that no virtual function could be written to rely on invariants established by base classes. That would be a terrible mess.

Does “friend” violate encapsulation?

No. It does not. “Friend” is an explicit mechanism for granting access, just like membership. You cannot (in a standard conforming program) grant yourself access to a class without modifying its source. For example:
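A minimal sketch, with hypothetical names X, m and peek: the class itself lists who may touch its private data, either as a member or as a friend. A function that is neither cannot get access without the class's source being changed.

```cpp
class X {
    int i = 0;                        // private by default
public:
    void m();                         // member: X grants m() access to i
    friend int peek(const X& x);      // friend: X grants peek() access to i
};

void X::m() { ++i; }                  // allowed, m is a member of X

int peek(const X& x) { return x.i; }  // allowed, peek is a friend of X

// int snoop(const X& x) { return x.i; }
// would not compile: X has granted snoop no access to its private members
```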

For a description of the C++ protection model, see D&E sec 2.10 and TC++PL sec 11.5, 15.3, and C.11.

Why do I have to put the data in my class declarations?

You don’t. If you don’t want data in an interface, don’t put it in the class that defines the interface. Put it in derived classes instead. See “Why do my compiles take so long?”.
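A minimal sketch of this arrangement, using hypothetical names Shape, Rectangle and doubled_area: the interface class has no data members, and the representation lives in a derived class.

```cpp
// Interface: pure virtual functions only, no data members.
class Shape {
public:
    virtual ~Shape() {}
    virtual double area() const = 0;
};

// The representation is kept in the derived class.
class Rectangle : public Shape {
public:
    Rectangle(double w, double h) : width(w), height(h) {}
    double area() const override { return width * height; }
private:
    double width;
    double height;
};

// Written against the interface only; it never sees Rectangle's data.
double doubled_area(const Shape& s) { return 2 * s.area(); }
```

Code such as doubled_area depends only on Shape, so a change to the data members of Rectangle does not require it to be recompiled.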

Sometimes, you do want to have representation data in a class. Consider class complex:
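A simplified sketch of such a class is given here; it is an assumption made for illustration and not the standard library's std::complex. The two double members are the representation, and the operations are defined inline.

```cpp
class complex {
    double re, im;                    // the representation, visible in the declaration
public:
    complex(double r = 0, double i = 0) : re(r), im(i) {}
    double real() const { return re; }
    double imag() const { return im; }
    complex& operator+=(complex z) {
        re += z.re;
        im += z.im;
        return *this;
    }
};

// Simple operations defined inline so the compiler can expand them in place.
inline complex operator+(complex a, complex b) { return a += b; }

inline complex operator*(complex a, complex b) {
    return complex(a.real() * b.real() - a.imag() * b.imag(),
                   a.real() * b.imag() + a.imag() * b.real());
}
```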

This type is designed to be used much as a built-in type, and the representation is needed in the declaration to make it possible to create genuinely local objects (i.e. objects that are allocated on the stack and not on a heap) and to ensure proper inlining of simple operations. Genuinely local objects and inlining are necessary to get the performance of complex close to what is provided in languages with a built-in complex type.