In 976 AD the Persian encyclopedist Muhammad ibn Ahmad al-Khwarizmi, in his “Keys of the Sciences”, remarked that if, in a calculation, no number appears in the place of tens, then a little circle should be used “to keep the rows”. This circle was called صفر (ṣifr, “empty”) in Arabic. That was the earliest mention of the name ṣifr, which eventually became zero.
generation of space as a measurable relation (variable). positive/negative. structure of vocab. organic or formal. encapsulation. space is information. Imperative, procedural, and declarative. personal threads.
Trygve Mikkjel Heyerdahl Reenskaug:
- To help software developers reason about system-level state and behavior instead of only object state and behavior;
How does E.T. feel about ROM?
Graphicality is neither essential nor incidental—it is a convenience for making relations legible, available to perception, and analysis. The graphic field also provides material evidence for analysis.
-Johanna Drucker, http://archivingcultures.org/mot/451
- “hello world” = I/O.
- sensory and binary function
- function and use
- data and tools
Cues do not need to be related to the action (or relation) (mailbox/letter):
- knotted handkerchiefs
- string around the finger
Visual art, visual representation:
- to today’s viewer/user
- to the viewer/user at the time
- between the artist/dev and the work (and interface to viewer)
- between the characters/code/behavior in each piece (abstract or figurative)
- A kind of rich “database”. Nested and very deep sets of common, learned and memorized relations
- shape: edge/distinction
- implied: order, high/low, top/below, adjacent/spaced, to edge/off edge, soft adjacent/hard adjacent, etc.
- content: location/place/situation (compound order, foreground/background, light within dark/dark within light, recognition of form, composited form, form with “name”)
- meaning: refined figurative relations (things, objects, people, roles) also related to physical space
- conceptual: between concept of developer and user
“…pictures are graphical but they don’t work in the same sense as diagrams do.”
…Sometimes a single line, sometimes a double line, these dividing rules segmented the clay surface into bounded units. Like property lines or fences, the divisions maintained distinctions among different types of information that comprised the written record on the tablet. Quantities might be separated from names for things, or, in the more elaborate column structures of inventories, owners from entities, and so on. The performative character of those lines is echoed in the columnar structure of accounting balance sheets and the marshalling of entries into their proper arrangement for purposes of tracking sums and values, names, or other items. The temptation to slip from the description of content typing that made those clay tablet grids work so effectively to the analysis of database structures is, of course, irresistible, and not without justification. So the arc of this discussion will sweep broadly. With all due acknowledgement of the specificity due any instantiation of a particular graphical format or feature, the crux of my argument lies on demonstrating some of the similarities and continuities of the basic elements of diagrammatic writing.
Bertalanffy: “allowed to behave”?
Herman Hollerith (February 29, 1860 – November 17, 1929) was an American statistician and inventor who developed a mechanical tabulator based on punched cards to rapidly tabulate statistics from millions of pieces of data. He was the founder of the Tabulating Machine Company that later merged to become IBM. Hollerith is widely regarded as the father of modern machine data processing. His invention of the punched card evaluating machine marked the beginning of the era of automatic data processing systems, and his design dominated the computing landscape for nearly a century.
Hollerith had left teaching and begun working for the United States Census Bureau in the year he filed his first patent application. Titled “Art of Compiling Statistics”, it was filed on September 23, 1884; U.S. Patent 395,782 was granted on January 8, 1889.
A description of this system, An Electric Tabulating System (1889), was submitted by Hollerith to Columbia University as his doctoral thesis, and is reprinted in Randell’s book. On January 8, 1889, Hollerith was issued U.S. Patent 395,782, claim 2 of which reads:
The herein-described method of compiling statistics, which consists in recording separate statistical items pertaining to the individual by holes or combinations of holes punched in sheets of electrically non-conducting material, and bearing a specific relation to each other and to a standard, and then counting or tallying such statistical items separately or in combination by means of mechanical counters operated by electro-magnets the circuits through which are controlled by the perforated sheets, substantially as and for the purpose set forth.
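The counting scheme in the claim can be sketched in a few lines of Python. This is only an illustration: the card layout, the hole names, and the `tabulate` function are hypothetical, and a set of labels stands in for the punched positions on a card.

```python
from collections import Counter

# Hypothetical cards: each one is the set of hole positions punched for one person.
cards = [
    {"male", "farmer"},
    {"female", "teacher"},
    {"male", "teacher"},
    {"female", "farmer"},
    {"male", "farmer"},
]

def tabulate(cards, combinations):
    """Advance one counter per combination whenever a card has all of its holes."""
    counters = Counter()
    for card in cards:
        for combo in combinations:
            # The circuit for a counter closes only if every hole in the combination is present.
            if set(combo) <= card:
                counters[combo] += 1
    return counters

totals = tabulate(cards, [("male",), ("farmer",), ("male", "farmer")])
print(totals[("male",)], totals[("farmer",)], totals[("male", "farmer")])  # 3 3 2
```

Items are tallied "separately or in combination" simply by choosing which hole combinations get a counter.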
He eventually moved to Washington, D.C., living in Georgetown, with a home on 29th Street and a factory for manufacturing his tabulating machines at 31st Street and the C&O Canal, where today there is a commemorative plaque installed by IBM.
Electrical tabulation of data
Hollerith tabulating machine and sorting box.
The previous 1880 census had taken eight years. In 1896 Hollerith started his own business when he founded the Tabulating Machine Company.
- first automatic card-feed mechanism
- first keypunch (that is, a punch operated by a keyboard); a skilled operator could punch 200–300 cards per hour.
- He also invented a tabulator
- A plugboard control panel in his 1906 Type I Tabulator allowed it to do different jobs without being rebuilt (the first step towards programming). These inventions were among the foundations of the modern information processing industry and Hollerith’s punchcards (though later adapted to encode computer programs) continued in use for almost a century.
The 1890 Tabulator was hardwired to operate only on 1890 Census cards.
In 1911 four corporations, including Hollerith’s firm, merged to form the Computing Tabulating Recording Company (CTR). Under the presidency of Thomas J. Watson, it was renamed International Business Machines Corporation (IBM) in 1924.
Hollerith cards were named after the elder Herman Hollerith, as were Hollerith constants (also sometimes called Hollerith strings), an early type of string constant declaration (in computer programming).
The first design for a program-controlled computer was Charles Babbage’s Analytical Engine in the 1830s. A century later, in 1936, mathematician Alan Turing published his description of what became known as a Turing machine, a theoretical concept intended to explore the limits of mechanical computation. Turing was not imagining a physical machine, but a person he called a “computer”, who acted according to the instructions provided by a tape on which symbols could be read and written sequentially as the tape moved under a tape head. Turing proved that if an algorithm can be written to solve a mathematical problem, then a Turing machine can execute that algorithm.
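A minimal sketch of such a machine, assuming a single tape, a rule table keyed by (state, symbol), and a `halt` state. The function name, the rule format, and the bit-flipping example are illustrative choices, not Turing's own notation.

```python
from collections import defaultdict

def run_turing_machine(rules, tape, state="start", blank="_", max_steps=10_000):
    """Run a one-tape machine; rules maps (state, symbol) -> (new_symbol, move, new_state)."""
    cells = defaultdict(lambda: blank, enumerate(tape))
    head = 0
    for _ in range(max_steps):
        if state == "halt":
            break
        new_symbol, move, state = rules[(state, cells[head])]
        cells[head] = new_symbol              # write under the head
        head += 1 if move == "R" else -1      # then move one cell right or left
    if not cells:
        return ""
    used = sorted(cells)
    return "".join(cells[i] for i in range(used[0], used[-1] + 1)).strip(blank)

# Example rule table: invert every bit, then halt at the first blank cell.
flip_bits = {
    ("start", "0"): ("1", "R", "start"),
    ("start", "1"): ("0", "R", "start"),
    ("start", "_"): ("_", "R", "halt"),
}
print(run_turing_machine(flip_bits, "1011"))  # 0100
```

The `max_steps` cap is a practical guard for the sketch; the theoretical machine has no such limit.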
Konrad Zuse’s Z3 was the world’s first working programmable, fully automatic computer, with binary digital arithmetic logic, but it lacked the conditional branching of a Turing machine. On 12 May 1941, it was successfully presented to an audience of scientists of the Deutsche Versuchsanstalt für Luftfahrt (“German Laboratory for Aviation”) in Berlin. The Z3 stored its program on an external tape, but it was electromechanical rather than electronic. The Colossus of 1943 was the first electronic computing device, but it was not a general-purpose machine.
Design of the von Neumann architecture (1947)
The construction of a von Neumann computer depended on the availability of a suitable memory device on which to store the program. During the Second World War researchers working on the problem of removing the clutter from radar signals had developed a form of delay line memory, the first practical application of which was the mercury delay line, developed by J. Presper Eckert. Radar transmitters send out regular brief pulses of radio energy, the reflections from which are displayed on a CRT screen. As operators are usually interested only in moving targets, it was desirable to filter out any distracting reflections from stationary objects. The filtering was achieved by comparing each received pulse with the previous pulse, and rejecting both if they were identical, leaving a signal containing only the images of any moving objects. To store each received pulse for later comparison it was passed through a transmission line, delaying it by exactly the time between transmitted pulses.
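The comparison described above can be sketched as follows. This is a simplification under stated assumptions: each pulse is a list of echo strengths by range, the delay line is modelled as a single remembered pulse, and subtraction is used as the comparison. The name `cancel_stationary` and the sample data are hypothetical.

```python
def cancel_stationary(pulses):
    """Yield each echo minus the echo received one pulse interval earlier."""
    delayed = None                      # contents of the hypothetical delay line
    for pulse in pulses:
        if delayed is not None:
            # Identical (stationary) echoes cancel to zero; changes survive.
            yield [now - before for before, now in zip(delayed, pulse)]
        delayed = pulse                 # this pulse emerges from the line next time

echoes = [
    [0, 5, 0, 2, 0, 0],   # stationary echo at index 1, moving target at index 3
    [0, 5, 0, 0, 2, 0],   # the target has moved to index 4
]
residue = list(cancel_stationary(echoes))
print(residue)  # [[0, 0, 0, -2, 2, 0]]
```

Only the moving target leaves a non-zero residue; the stationary echo at index 1 disappears.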
Turing joined the National Physical Laboratory (NPL) in October 1945, by which time scientists within the Ministry of Supply had concluded that Britain needed a National Mathematical Laboratory to coordinate machine-aided computation. A Mathematics Division was set up at the NPL, and on 19 February 1946 Alan Turing presented a paper outlining his design for an electronic stored-program computer to be known as the Automatic Computing Engine (ACE). This was one of several projects set up in the years following the Second World War with the aim of constructing a stored-program computer. At about the same time, EDVAC was under development at the University of Pennsylvania’s Moore School of Electrical Engineering, and the University of Cambridge Mathematical Laboratory was working on EDSAC.
The NPL did not have the expertise to build a machine like ACE, so they contacted Tommy Flowers at the General Post Office’s (GPO) Dollis Hill Research Laboratory. Flowers, the designer of Colossus, the world’s first programmable electronic computer, was committed elsewhere and was unable to take part in the project, although his team did build some mercury delay lines for ACE. The Telecommunications Research Establishment (TRE) was also approached for assistance, as was Maurice Wilkes at the University of Cambridge Mathematical Laboratory.
The government department responsible for the NPL decided that, of all the work being carried out by the TRE on its behalf, ACE was to be given the top priority. NPL’s decision led to a visit by the superintendent of the TRE’s Physics Division on 22 November 1946, accompanied by Frederic C. Williams and A. M. Uttley, also from the TRE. Williams led a TRE development group working on CRT stores for radar applications, as an alternative to delay lines. He had already accepted a professorship at the University of Manchester, and most of his circuit technicians were in the process of being transferred to the Department of Atomic Energy. The TRE agreed to second a small number of technicians to work under Williams’ direction at the university, and to support another small group working with Uttley at the TRE.
The Manchester Small-Scale Experimental Machine (SSEM), nicknamed Baby, was the world’s first stored-program computer. It was built at the Victoria University of Manchester by Frederic C. Williams, Tom Kilburn and Geoff Tootill, and ran its first program on 21 June 1948.
The machine was not intended to be a practical computer but was instead designed as a testbed for the Williams tube, an early form of computer memory. Although considered “small and primitive” by the standards of its time, it was the first working machine to contain all of the elements essential to a modern electronic computer. As soon as the SSEM had demonstrated the feasibility of its design, a project was initiated at the university to develop it into a more usable computer, the Manchester Mark 1. The Mark 1 in turn quickly became the prototype for the Ferranti Mark 1, the world’s first commercially available general-purpose computer.
The SSEM had a 32-bit word length and a memory of 32 words. As it was designed to be the simplest possible stored-program computer, the only arithmetic operations implemented in hardware were subtraction and negation; other arithmetic operations were implemented in software. The first of three programs written for the machine found the highest proper divisor of 2^18 (262,144), a calculation known to take a long time to run and so to prove the computer’s reliability, by testing every integer from 2^18 − 1 downwards, as division was implemented by repeated subtraction of the divisor. The program consisted of 17 instructions and ran for 52 minutes before reaching the correct answer of 131,072, after the SSEM had performed 3.5 million operations (for an effective CPU speed of 1.1 kIPS).
Development and design
Architectural schematic showing how the four cathode ray tubes (shown in green) were deployed
By June 1948 the SSEM had been built and was working. It was 17 feet (5.2 m) in length, 7 feet 4 inches (2.24 m) tall, and weighed almost 1 long ton (1.0 t). The machine contained 550 valves – 300 diodes and 250 pentodes – and had a power consumption of 3500 watts.
The output CRT is immediately above the input device, flanked by the monitor and control electronics.
Each 32-bit word of RAM could contain either a program instruction or data. In a program instruction, bits 0–12 represented the memory address of the operand to be used, and bits 13–15 specified the operation to be executed, such as storing a number in memory; the remaining 16 bits were unused. The SSEM’s single operand architecture meant that the second operand of any operation was implicit: the accumulator or the program counter (instruction address); program instructions specified only the address of the data in memory.
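The field layout can be illustrated with a short decoding sketch. It assumes that bit 0 is the least significant bit of a Python integer; the function name `decode` and the sample word are hypothetical.

```python
ADDRESS_BITS = 13   # bits 0-12: operand address
OPCODE_BITS = 3     # bits 13-15: operation

def decode(word: int) -> tuple[int, int]:
    """Split a 32-bit instruction word into (address, opcode); bits 16-31 are ignored."""
    address = word & ((1 << ADDRESS_BITS) - 1)
    opcode = (word >> ADDRESS_BITS) & ((1 << OPCODE_BITS) - 1)
    return address, opcode

word = (3 << 13) | 27          # opcode field 3, operand address 27
print(decode(word))            # (27, 3)
```

Because the upper 16 bits are masked out, any value stored there leaves the decoded fields unchanged.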
A word in the computer’s memory could be read, written, or refreshed, in 360 microseconds. An instruction took four times as long to execute as accessing a word from memory, giving an instruction execution rate of about 700 per second. The main store was refreshed continuously, a process which took 20 milliseconds to complete, as each of the SSEM’s 32 words had to be read and then refreshed in sequence.
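The quoted instruction rate follows directly from the two timing figures; a quick check, using only the numbers given in the text:

```python
MEMORY_ACCESS_US = 360                    # from the text: read, write, or refresh one word
INSTRUCTION_US = 4 * MEMORY_ACCESS_US     # from the text: four memory-access times per instruction

instructions_per_second = 1_000_000 / INSTRUCTION_US
print(INSTRUCTION_US)                     # 1440 microseconds per instruction
print(round(instructions_per_second))     # 694, i.e. "about 700 per second"
```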
The SSEM represented negative numbers using two’s complement, as most computers still do. In that representation, the value of the most significant bit denotes the sign of a number; positive numbers have a zero in that position and negative numbers a one. Thus the range of numbers that could be held in each 32-bit word was −2^31 to +2^31 − 1 (decimal: −2,147,483,648 to +2,147,483,647).
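A small sketch of the representation, assuming 32-bit words held as non-negative Python integers. The helper names `to_signed` and `to_word` are illustrative.

```python
def to_signed(word: int, bits: int = 32) -> int:
    """Interpret an unsigned bit pattern as a two's complement number."""
    word &= (1 << bits) - 1
    # A one in the most significant bit marks a negative number.
    return word - (1 << bits) if word >> (bits - 1) else word

def to_word(value: int, bits: int = 32) -> int:
    """Encode a signed number as its two's complement bit pattern."""
    return value & ((1 << bits) - 1)

print(to_signed(0x7FFFFFFF))   # 2147483647
print(to_signed(0x80000000))   # -2147483648
print(hex(to_word(-1)))        # 0xffffffff
```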
Von Neumann showed how the combination of instructions and data in one memory could be used to implement loops, by modifying branch instructions when a loop was completed, for example. The resultant demand that instructions and data be placed on the same memory later came to be known as the Von Neumann Bottleneck.
Three programs were written for the computer. The first, consisting of 17 instructions, was written by Kilburn, and so far as can be ascertained first ran on 21 June 1948. It was designed to find the highest proper factor of 2^18 (262,144) by trying every integer from 2^18 − 1 downwards. The divisions were implemented by repeated subtractions of the divisor. The SSEM took 3.5 million operations and 52 minutes to produce the answer (131,072). The program used eight words of working storage in addition to its 17 words of instructions, giving a program size of 25 words.
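The same search can be sketched in Python. This is not a transcription of Kilburn's 17 instructions; it only mirrors the strategy described above (try every candidate downwards, divide by repeated subtraction), and the function name is an assumption.

```python
def highest_proper_factor(n: int) -> int:
    """Find the largest proper divisor of n by trial, dividing by repeated subtraction."""
    candidate = n - 1
    while candidate > 1:
        remainder = n
        # Division by repeated subtraction: keep subtracting while the divisor still fits.
        while remainder >= candidate:
            remainder -= candidate
        if remainder == 0:
            return candidate
        candidate -= 1
    return 1

print(highest_proper_factor(2**18))  # 131072
```

On a modern machine this finishes almost instantly, since candidates above half of n need only one or two subtractions each.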
Geoff Tootill wrote an amended version of the program the following month, and in mid-July Alan Turing—who had been appointed as a reader in the mathematics department at Manchester University in September 1948—submitted the third program, to carry out long division. Turing had by then been appointed to the nominal post of Deputy Director of the Computing Machine Laboratory at the University, although the laboratory did not become a physical reality until 1951.
It was an asynchronous machine, meaning that there was no central clock regulating the timing of the instructions. One instruction started executing when the previous one finished. The addition time was 62 microseconds and the multiplication time was 713 microseconds.
The SSEM’s three-bit instruction set allowed a maximum of eight (2^3) different instructions. In contrast to the modern convention, the machine’s storage was arranged with the least significant digits to the left; thus a one was represented in three bits as “100”, rather than the more conventional “001”.
| Binary code | Original notation | Modern mnemonic | Operation |
|---|---|---|---|
| 000 | S, Cl | JMP S | Jump to the instruction at the address obtained from the specified memory address S (absolute unconditional jump) |
| 100 | Add S, Cl | JRP S | Jump to the instruction at the program counter plus (+) the relative value obtained from the specified memory address S (relative unconditional jump) |
| 010 | -S, C | LDN S | Take the number from the specified memory address S, negate it, and load it into the accumulator |
| 110 | c, S | STO S | Store the number in the accumulator to the specified memory address S |
| 001 or 101 | SUB S | SUB S | Subtract the number at the specified memory address S from the value in accumulator, and store the result in the accumulator |
| 011 | Test | CMP | Skip next instruction if the accumulator contains a negative value |
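The binary codes in the table are written least significant bit first, so they read "backwards" compared with the usual notation. A brief sketch of the conversion, with hypothetical helper names:

```python
def lsb_first_to_int(bits: str) -> int:
    """Read a bit string whose leftmost digit is the least significant."""
    return int(bits[::-1], 2)

def int_to_lsb_first(value: int, width: int) -> str:
    """Write a number with its least significant digit on the left."""
    return format(value, f"0{width}b")[::-1]

print(lsb_first_to_int("100"))   # 1
print(lsb_first_to_int("110"))   # 3
print(int_to_lsb_first(1, 3))    # 100
```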
The awkward negative operations were a consequence of the SSEM’s lack of hardware to perform any arithmetic operations except subtraction and negation. It was considered unnecessary to build an adder before testing could begin as addition can easily be implemented by subtraction, i.e. x+y can be computed as −(−x−y). Therefore adding two numbers together, X and Y, required four instructions:
LDN X // load negative X into the accumulator
SUB Y // subtract Y from the value in the accumulator
STO S // store the result at S
LDN S // load negative value at S into the accumulator
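The four-instruction sequence can be traced step by step with a small simulation. The dictionary standing in for memory and the function name are assumptions made for illustration.

```python
def add_by_subtraction(x: int, y: int) -> int:
    """Compute x + y as -(-x - y), using only negate-load, subtract, and store."""
    memory = {"X": x, "Y": y, "S": 0}
    acc = -memory["X"]        # LDN X: accumulator = -x
    acc -= memory["Y"]        # SUB Y: accumulator = -x - y
    memory["S"] = acc         # STO S: S = -(x + y)
    acc = -memory["S"]        # LDN S: accumulator = x + y
    return acc

print(add_by_subtraction(3, 4))   # 7
```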
Programs were entered in binary form by stepping through each word of memory in turn, and using a set of 32 switches known as the input device to set the value of each bit of each word to either 0 or 1. The SSEM had no paper-tape reader or punch.
Although early computers such as CSIRAC made successful use of mercury delay line memory, the technology had several drawbacks; it was heavy, it was expensive, and it did not allow data to be accessed randomly. In addition, because data was stored as a sequence of acoustic waves propagated through a mercury column, the device’s temperature had to be very carefully controlled, as the velocity of sound through a medium varies with its temperature. Williams had seen an experiment at Bell Labs demonstrating the effectiveness of cathode ray tubes (CRT) as an alternative to the delay line for removing ground echoes from radar signals. While working at the TRE, shortly before he joined the University of Manchester in December 1946, he and Tom Kilburn had developed a form of electronic memory known as the Williams or Williams-Kilburn tube based on a standard CRT, the first random-access digital storage device. The Manchester Small-Scale Experimental Machine (SSEM) was designed to show that the system was a practical storage device, by testing that data held within it could be read and written at the speed necessary for use in a computer.
For use in a binary digital computer, the tube had to be capable of storing either one of two states at each of its memory locations, corresponding to the binary digits (bits) 0 and 1. It exploited the positive or negative electrostatic charge generated by displaying either a dash or a dot at any position on the CRT screen, a phenomenon known as secondary emission. A dash generated a positive charge, and a dot a negative charge, either of which could be picked up by a detector plate in front of the screen; a negative charge represented 0, and a positive charge 1. The charge dissipated in about 0.2 seconds, but it could be automatically refreshed from the data picked up by the detector.
The Williams tube was initially based on the CV1131, a commercially available 12-inch (300 mm) diameter CRT, but a smaller 6-inch (150 mm) tube, the CV1097, was used in the SSEM.
The ENIAC (1946) was the first machine that was both electronic and general purpose. It was Turing complete, with conditional branching, and programmable to solve a wide range of problems, but its program was held in the state of switches in patchcords, not in memory, and it could take several days to reprogram. Researchers such as Turing and Konrad Zuse investigated the idea of using the computer’s memory to hold the program as well as the data it was working on, but it was mathematician John von Neumann who became widely credited with defining that computer architecture, still used in almost all computers.
Detail of the back of a section of ENIAC, showing vacuum tubes
Physically, the computer comprised the following components:
- a magnetic tape reader-recorder (Wilkes 1956:36 describes this as a wire recorder.)
- a control unit with an oscilloscope
- a dispatcher unit to receive instructions from the control and memory and direct them to other units
- a computational unit to perform arithmetic operations on a pair of numbers at a time and send the result to memory after checking on a duplicate unit
- a timer
- a dual memory unit consisting of two sets of 64 mercury acoustic delay lines of eight words capacity on each line
- three temporary tanks each holding a single word
- function tables
- master programmer
Holberton was born Frances Elizabeth Snyder in Philadelphia in 1917. On her first day of classes at the University of Pennsylvania, Holberton’s math professor asked her if she wouldn’t be better off at home raising children. Instead, Holberton decided to study journalism, because its curriculum let her travel far afield. Journalism was also one of the few fields open to women as a career in the 1940s.
During World War II while the men were fighting, the Army needed the women to compute ballistics trajectories. Holberton was hired by the Moore School of Engineering to work as a “computor”, and was soon chosen to be one of the six women to program the ENIAC. Classified as “subprofessionals”, Holberton, along with Kay McNulty, Marlyn Wescoff, Ruth Lichterman, Betty Jean Jennings, and Fran Bilas, programmed the ENIAC to perform calculations for ballistics trajectories electronically for the Ballistic Research Laboratory (BRL), US Army.
Their work on ENIAC earned each of them a place in the Women in Technology International Hall of Fame. In the beginning, because the ENIAC was classified, the women were only allowed to work with blueprints and wiring diagrams in order to program it. The ENIAC was unveiled on February 15, 1946, at the University of Pennsylvania. It had cost almost $500,000.
She also wrote the first generative programming system (SORT/MERGE), and wrote the first statistical analysis package, which was used for the 1950 US Census.
Holberton worked with John Mauchly to develop the C-10 instruction for BINAC, which is considered to be the prototype of all modern programming languages. She also participated in the development of early standards for the COBOL and FORTRAN programming languages with Grace Hopper. Later, as an employee of the National Bureau of Standards, she was very active in the first two revisions of the Fortran language standard (“FORTRAN 77” and “Fortran 90”).
She helped to develop the UNIVAC, designing control panels that put the numeric keypad next to the keyboard and persuading engineers to replace the Univac’s black exterior with the gray-beige tone that came to be the universal color of computers.
Dudley Allen Buck
Dr. Dudley Allen Buck (1927–1959) was an electrical engineer and inventor of components for high-speed computing devices in the 1950s. He is best known for the invention of the cryotron, a superconductive computer component that is operated in liquid helium at a temperature near absolute zero. Other inventions were ferroelectric memory, content addressable memory, non-destructive sensing of magnetic fields, and the development of a method for writing printed circuits with a beam of electrons.
The basic idea for the cryotron was entered into his MIT notebook on December 15, 1953. By 1955, Buck was building practical cryotron devices with niobium and tantalum. The cryotron was a great breakthrough in the size of electronic computer elements. In the next decade, cryotron research at other laboratories resulted in the invention of the Crowe Cell at IBM, the Josephson Junction, and the SQUID. Those inventions have today made possible the mapping of brain activity by magnetoencephalography. Despite the need for liquid helium, cryotrons were expected to make computers so small that, in 1957, Life Magazine displayed a full-page photograph of Dudley Buck with a cryotron in one hand and a vacuum tube in the other.
Another key invention by Dr. Buck was a method of non-destructive sensing of magnetic materials. In the process of reading data from a typical magnetic core memory, the contents of the memory are erased, making it necessary to take additional time to re-write the data back into the magnetic storage. By design of ‘quadrature sensing’ of magnetic fields, the state of magnetism of the core may be read without alteration, thus eliminating the extra time required to re-write memory data.
Dudley Buck invented recognition unit memory. Also called content addressable memory, it is a technique of storing and retrieving data in which there is no need to know the location of that data. Not only is there no need to query an index for the location of data, the inquiry for data is broadcast to all memory elements simultaneously; thus data retrieval time is independent of the size of the database.
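The idea of retrieving by content rather than by location can be sketched in software. The class and method names below are hypothetical, and the loop only models the result: in hardware every cell compares itself with the broadcast pattern at the same time, which is what makes retrieval time independent of size.

```python
class ContentAddressableMemory:
    """Toy model: records are found by what they contain, not by where they are stored."""

    def __init__(self):
        self._cells = []

    def store(self, record: dict) -> None:
        self._cells.append(record)

    def search(self, **pattern) -> list:
        # The query is "broadcast" to every cell; each cell reports whether it matches.
        return [
            cell for cell in self._cells
            if all(cell.get(key) == value for key, value in pattern.items())
        ]

cam = ContentAddressableMemory()
cam.store({"name": "cryotron", "year": 1953})
cam.store({"name": "FeRAM", "year": 1952})
print(cam.search(year=1952))   # [{'name': 'FeRAM', 'year': 1952}]
```

The caller never supplies an index or address, only the content it is looking for.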
FeRAM was first built by Buck as part of his thesis work in 1952. In addition to its use as computer memory, ferroelectric materials can be used to build shift registers, logic, and amplifiers. Buck showed that a ferroelectric switch could be useful to perform memory addressing.
Diagram of a DC SQUID. The current enters and splits into two paths. The thin barriers on each path are Josephson junctions, which together separate the two superconducting regions. A magnetic flux threads the DC SQUID loop.
Electrical schematic of a SQUID, showing the bias current Ib, the critical current I0 of the SQUID, the flux threading the SQUID, and the voltage response to that flux. The X-symbols represent Josephson junctions.
As a professor at the Massachusetts Institute of Technology, Dr. Buck earned a Doctor of Science from M.I.T. in 1958. Buck began as a research assistant while a graduate student at MIT in 1950. His first assignment was on the I/O systems of the Whirlwind computer. He was assigned to work with another graduate student, William N. Papian, and with various manufacturers developing the ferrite materials to be used in coincident-current magnetic core memory.
Buck completed his S.M. degree in 1952 at MIT. His thesis for the degree was Ferroelectrics for Digital Information Storage and Switching, supervised by Arthur R. von Hippel. In this work he demonstrated the principles of storing data in ferroelectric materials, the earliest demonstration of ferroelectric memory, or FeRAM. This work also demonstrated that ferroelectric materials could be used as voltage-controlled switches to address memory, whereas close friend and fellow student Ken Olsen’s saturable switch used ferrites and was a current-operated switch.
In late 1951 Dudley Buck proposed computer circuits that used neither vacuum tubes nor the recently invented transistor. It is possible to make all computer logic circuits, including shift registers, counters, and accumulators, using only magnetic cores, wire and diodes. Magnetic logic was used in the KW-26 cryptographic communications system, and in the BOGART computer.
By 1957, Buck began to place more emphasis on miniaturization of cryotron systems, since cryotron devices can attain higher speeds as their size is reduced. Dr. Buck, his students, and researcher Kenneth R. Shoulders made great progress manufacturing thin-film cryotron integrated circuits in the laboratory at MIT. Developments included the creation of oxide layers as insulation and for mechanical strength by electron beam reduction of chemicals. This work, co-authored with Kenneth Shoulders, was published as “An Approach to Microminiature Printed Systems”. It was presented in December 1958 at the Eastern Joint Computer Conference in Philadelphia.
Dudley A. Buck was born in San Francisco, California on April 25, 1927. Dudley and his siblings moved to Santa Barbara, California, in 1940. In 1943 Dudley Buck earned his Amateur Radio License W6WCK and a First Class Radiotelephone Operator license for commercial work. He worked part-time at Santa Barbara radio station KTMS until he left to attend college.
After graduation from University of Washington, Buck served in the U.S. Navy for two years at Nebraska Avenue in Washington, D.C. He entered the reserves in 1950 and then began his career at Massachusetts Institute of Technology. Per a request by chairman Dr. Louis Ridenour, Solomon Kullback appointed Buck to the National Security Agency Scientific Advisory Board Panel on Electronics and Data Processing in December, 1958.
MOS Technology, Inc. (“MOS” being short for Metal Oxide Semiconductor), also known as CSG (Commodore Semiconductor Group), was a semiconductor design and fabrication company based in Norristown, Pennsylvania, in the United States.
MOS was originally started in 1969 by Allen-Bradley to provide a second source for electronic calculators and their chips designed by Texas Instruments (TI). In the early 1970s TI decided to release its own line of calculators, instead of selling just the chips inside them, and introduced them at a price that was lower than the price of the chipset alone. Many early chip companies were wiped out in the aftermath; those that survived did so by finding other chips to produce.
At the time there was no such thing as a “design-only” firm (known as a fabless semiconductor company today), so they had to join a chip-building company to produce their new CPU.
Things changed dramatically in 1975. Several of the designers of the Motorola 6800 left the company shortly after its release, after management told them to stop working on a low-cost version of the design.
MOS’s engineers had learned the trick of fixing their masks after they were made. This allowed them to correct the major flaws in a series of small fixes, eventually producing a mask with a very low flaw rate.
Early runs of a new CPU design—what would become the 6502—were achieving a success rate of 70 percent or better. This meant that not only were its designs faster, they cost much less as well.
In September 1975 MOS started selling the 6502, a chip capable of operating at 1 MHz, for a mere $25 USD.
The 6502 was so cheap that many people believed it was a scam when MOS first showed it at a 1975 trade show. They were not aware of MOS’s masking techniques and when they calculated the price per chip at the current industry yield rates, it did not add up. But any hesitation to buy it evaporated when both Motorola and Intel dropped the prices on their own designs from $179 to $69 at the same show in order to compete. Their moves legitimized the 6502, and by the show’s end, the wooden barrel full of samples was empty.
It was nearly identical to the 6501, with only a few minor differences:
- an added on-chip clock oscillator,
- a different functional pinout arrangement,
- generation of the SYNC signal (supporting single-instruction stepping)
- removal of data bus enablement control signals (DBE and BA, with the former directly connected to the phase 2 clock instead).
It outperformed the more-complex 6800 and Intel 8080, but cost much less and was easier to work with.
Although it did not have the 6501’s advantage of being able to be used in place of the Motorola 6800 in existing hardware, it was so inexpensive that it quickly became more popular than the 6800, making that a moot point.
The 6502 would quickly go on to be one of the most popular chips of its day. A number of companies licensed the 650x line from MOS, including Rockwell International, GTE, Synertek, and Western Design Center (WDC).
A number of different versions of the basic CPU, known as the 6503 through 6507, were offered in 28-pin packages for lower cost.
The 6504 was sometimes used in printers. MOS also released a series of similar CPUs using external clocks, which added a “1” to the name in the third digit, as the 6512 through 6515. These were useful in systems where the clock support was already being provided on the motherboard by some other source.
Data, context and interaction (DCI) is a paradigm used in computer software to program systems of communicating objects. Its goals are:
- To improve the readability of object-oriented code by giving system behavior first-class status;
- To cleanly separate code for rapidly changing system behavior (what the system does) from code for slowly changing domain knowledge (what the system is), instead of combining both in one class interface;
- To help software developers reason about system-level state and behavior instead of only object state and behavior;
- To support an object style of thinking that is close to peoples’ mental models, rather than the class style of thinking that overshadowed object thinking early in the history of object-oriented programming languages.
The paradigm separates the domain model (data) from use cases (context) and Roles that objects play (interaction). DCI is complementary to model–view–controller (MVC). MVC as a pattern language is still used to separate the data and its processing from presentation.
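The separation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `Account` and `TransferMoney` are hypothetical, and Python has no built-in role injection, so the context simply holds references to the objects playing each role.

```python
from dataclasses import dataclass


@dataclass
class Account:
    """Data: a plain domain object that knows nothing about use cases."""
    owner: str
    balance: int


class TransferMoney:
    """Context: one use case, binding data objects to the roles they play."""

    def __init__(self, source: Account, sink: Account):
        self.source = source  # object playing the hypothetical "source" role
        self.sink = sink      # object playing the hypothetical "sink" role

    def execute(self, amount: int) -> None:
        # Interaction: the system behavior lives here, not in Account.
        if amount <= 0:
            raise ValueError("amount must be positive")
        if self.source.balance < amount:
            raise ValueError("insufficient funds")
        self.source.balance -= amount
        self.sink.balance += amount


checking = Account("alice", 100)
savings = Account("alice", 20)
TransferMoney(checking, savings).execute(30)
```

The slowly changing part (what an account is) stays in `Account`, while the rapidly changing part (what the system does with accounts) stays in the context class.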
Trygve Reenskaug introduced MVC into Smalltalk-76 while visiting Xerox PARC in the 1970s. In the 1980s, Jim Althoff and others implemented a version of MVC for the Smalltalk-80 class library. It was only later, in a 1988 article in The Journal of Object Technology, that MVC was expressed as a general concept.
- tokens that can be interpreted as some kind of value
- usually either as a quantitative measurement of, or a qualitative fact about some thing.
- Data are manipulated either as values or variables by encoding them into information.
- The word data is the traditional plural form of the now-archaic datum, neuter past participle of the Latin dare, “to give”, hence “something given”.
- tabular (made up of rows and columns),
- tree (a set of nodes with parent–child relationship),
- graph (a set of connected nodes).
- Raw data, i.e., unprocessed data, refers to a collection of numbers or characters and is a relative term; data processing commonly occurs by stages, and the “processed data” from one stage may be considered the “raw data” of the next.
- Field data refers to raw data that is collected in an uncontrolled in situ environment.
- Experimental data refers to data that is generated within the context of a scientific investigation by observation and recording.
- Data are extracted from information
- Knowledge is derived from data
- Beynon-Davies uses the concept of a sign to distinguish between data and information;
- Data are symbols while information occurs when the symbols are used to refer to something.
- It is people and computers who collect data and impose patterns on it.
- These patterns are seen as information which can be used to enhance knowledge.
- These patterns can be interpreted as truth, and are authorized as aesthetic and ethical criteria.
- Events that leave behind perceivable physical or virtual remains can be traced back through data.
- Marks are no longer considered data once the link between the mark and observation is broken.
- This is nearly the inverse of the more common notion that information is processed to obtain data, which is then processed into knowledge.
- Mechanical computing devices are classified according to the means by which they represent data.
- An analog computer represents a datum as a voltage, distance, position, or other physical quantity.
- A digital computer represents a datum as a sequence of symbols drawn from a fixed alphabet.
- The most common digital computers use a binary alphabet, that is, an alphabet of two characters, typically denoted “0” and “1”. More familiar representations, such as numbers or letters, are then constructed from the binary alphabet.
- Some special forms of data are distinguished. A computer program is a collection of data, which can be interpreted as instructions.
- Most computer languages make a distinction between programs and the other data on which programs operate
- In some languages, notably Lisp and similar languages, programs are essentially indistinguishable from other data. It is also useful to distinguish metadata, that is, a description of other data.
- A similar yet earlier term for metadata is “ancillary data.” The prototypical example of metadata is the library catalog, which is a description of the contents of books.
- Supersets of this idea, where keys are derived, and values are arranged, relatively, are called data structures.
- They are also used in peripheral devices.
…on which operations are performed by a computer. The hardware implementation of almost all computers is imperative; nearly all computer hardware is designed to execute machine code, which is native to the computer, written in the imperative style.
- Electrical signals
- Recorded magnetic, optical, or mechanical media
The output of a sequential circuit or computer program at any time is completely determined by its current inputs and current state.
Since each binary memory element has only two possible states, 0 or 1, the total number of different states a circuit can assume is finite, and fixed by the number of memory elements. If there are N binary memory elements, a digital circuit can have at most 2^N distinct states.
A simple circuit diagram to show the labels of an n–p–n bipolar transistor.
A traditional flip-flop circuit based on bipolar junction transistors
A flip-flop or latch is a bistable multivibrator: a data storage element used to store state in a sequential logic circuit. It is a fundamental building block of digital electronics systems used in computers, communications, and many other types of systems. Its output depends on its current state, and hence on previous inputs, as well as on its current input.
It can also be used for counting pulses, and for synchronizing variably-timed input signals to some reference timing signal.
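As a rough illustration of how a latch stores one bit, the sketch below simulates an SR latch built from two cross-coupled NOR gates. The class name `SRLatch` and the settle loop are assumptions made for this example, not a model of any particular device; the forbidden input S = R = 1 is not handled specially.

```python
def nor(a: int, b: int) -> int:
    """Two-input NOR gate on bits 0/1."""
    return 0 if (a or b) else 1


class SRLatch:
    """Hypothetical SR latch made of two cross-coupled NOR gates."""

    def __init__(self):
        self.q = 0
        self.q_bar = 1

    def step(self, s: int, r: int) -> int:
        # Re-evaluate both gates until the outputs stop changing.
        for _ in range(4):
            q = nor(r, self.q_bar)
            q_bar = nor(s, q)
            if (q, q_bar) == (self.q, self.q_bar):
                break
            self.q, self.q_bar = q, q_bar
        return self.q


latch = SRLatch()
set_result = latch.step(1, 0)     # set: Q becomes 1
hold_after_set = latch.step(0, 0) # hold: Q stays 1
reset_result = latch.step(0, 1)   # reset: Q becomes 0
hold_after_reset = latch.step(0, 0)  # hold: Q stays 0
```

With both inputs at 0 the latch keeps whatever value it last held, which is the sense in which its output depends on previous inputs.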
For example, the state of a microprocessor (computer chip) is the contents of all the memory elements in it:
In “hibernation”, the state of the processor is stored on the computer’s disk.
A more specialized definition of state is used in some computer programs that operate serially (sequentially) on streams of data, such as
In some of these programs, information about previous data characters or packets received is stored in variables and used to affect the processing of the current character or packet.
This is called a “stateful protocol” and the data carried over from the previous processing cycle is called the “state”.
In others, the program has no information about the previous data stream and starts “fresh” with each data input; this is called a “stateless protocol”.
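The difference can be sketched with two small character processors. Both names (`stateless_upper`, `RunLengthCounter`) are hypothetical and chosen only for this example.

```python
def stateless_upper(ch: str) -> str:
    """Stateless: the result depends only on the current character."""
    return ch.upper()


class RunLengthCounter:
    """Stateful: remembers the previous character between calls."""

    def __init__(self):
        self.previous = None  # carried-over state
        self.run = 0          # carried-over state

    def feed(self, ch: str) -> int:
        # Returns how many times in a row the current character has occurred.
        if ch == self.previous:
            self.run += 1
        else:
            self.previous = ch
            self.run = 1
        return self.run


counter = RunLengthCounter()
runs = [counter.feed(ch) for ch in "aabbbc"]
uppers = [stateless_upper(ch) for ch in "aabbbc"]
```

Feeding the same character to `stateless_upper` always gives the same result, whereas `feed` can return different values for the same character depending on what came before.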
The following states are distinguished:
- Compatible states are states in a state machine that do not conflict for any input values. Thus for every input, both states must have the same output, and both states must have the same successor (or unspecified successors), or both must not change. Compatible states are redundant, if occurring in the same state machine.
- Distinguishable states are states in a state machine that have at least one input sequence causing different output sequences – no matter which state is the initial state.
- Equivalent states are states in a state machine for which, for every possible input sequence, the same output sequence is produced – no matter which state is the initial state.
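One way to test two states for equivalence is to explore every pair of states reachable from them and check that the outputs never differ. The sketch below does this for a small, made-up Mealy machine; the table `MACHINE` and the function `equivalent` are illustrative assumptions, and the machine is assumed to be completely specified over the same input alphabet in every state.

```python
from collections import deque

# Hypothetical Mealy machine: MACHINE[state][input] = (next_state, output)
MACHINE = {
    "A": {0: ("B", 0), 1: ("A", 1)},
    "B": {0: ("A", 0), 1: ("B", 1)},
    "C": {0: ("C", 1), 1: ("A", 1)},
}


def equivalent(machine, p, q):
    """Return True if no input sequence can tell states p and q apart."""
    seen = {(p, q)}
    queue = deque([(p, q)])
    while queue:
        a, b = queue.popleft()
        for symbol in machine[a]:
            next_a, out_a = machine[a][symbol]
            next_b, out_b = machine[b][symbol]
            if out_a != out_b:
                return False  # a distinguishing input sequence exists
            if (next_a, next_b) not in seen:
                seen.add((next_a, next_b))
                queue.append((next_a, next_b))
    return True


a_b_equivalent = equivalent(MACHINE, "A", "B")
a_c_equivalent = equivalent(MACHINE, "A", "C")
```

In this example A and B always produce the same outputs, while A and C already differ on the single input 0, so they are distinguishable.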
Physical computer memory elements consist of
- An address
- A byte/word of data storage
Proper management of memory is vital for a computer system to operate properly. Modern (unlike early, single-task) operating systems have complex systems to properly manage memory.
Failure to do so can lead to
- slow performance
- takeover by viruses
- malicious software
Nearly all sequential logic today is clocked or synchronous logic. In a synchronous circuit, an electronic oscillator called a clock (or clock generator) generates a sequence of repetitive pulses called the clock signal which is distributed to all the memory elements in the circuit. The basic memory element in sequential logic is the flip-flop. The output of each flip-flop only changes when triggered by the clock pulse, so changes to the logic signals throughout the circuit all begin at the same time, at regular intervals, synchronized by the clock.
The output of all the storage elements (flip-flops) in the circuit at any given time, the binary data they contain, is called the state of the circuit. The state of a synchronous circuit only changes on clock pulses. At each cycle, the next state is determined by the current state and the value of the input signals when the clock pulse occurs.
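The rule that the next state depends only on the current state and the inputs at the clock pulse can be sketched as a tiny simulation. The two-bit counter with an enable input is a hypothetical example; with N = 2 memory elements it has 2^2 = 4 possible states.

```python
def next_state(state: int, enable: int, n_bits: int = 2) -> int:
    """Combinational logic: compute the next state from state and input."""
    if enable:
        return (state + 1) % (2 ** n_bits)
    return state


def run(inputs, state=0):
    """Apply one input value per clock pulse and record every state."""
    trace = [state]
    for enable in inputs:  # one loop iteration stands for one clock pulse
        state = next_state(state, enable)
        trace.append(state)
    return trace


trace = run([1, 1, 0, 1, 1])
```

The state only changes inside the loop, once per simulated clock pulse, and wraps around after the fourth state.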
The main advantage of synchronous logic is its simplicity. The logic gates which perform the operations on the data require a finite amount of time to respond to changes to their inputs. This is called propagation delay. The interval between clock pulses must be long enough so that all the logic gates have time to respond to the changes and their outputs “settle” to stable logic values, before the next clock pulse occurs. As long as this condition is met (ignoring certain other details) the circuit is guaranteed to be stable and reliable. This determines the maximum operating speed of a synchronous circuit.
Asynchronous sequential logic is not synchronized by a clock signal; the outputs of the circuit change directly in response to changes in inputs. The advantage of asynchronous logic is that it can be faster than synchronous logic, because the circuit doesn’t have to wait for a clock signal to process inputs. The speed of the device is potentially limited only by the propagation delays of the logic gates used.
“Asynchronous inputs”, inputs to the circuit from other systems which are not synchronized to the clock signal can cause the circuit to go into the wrong state, depending on small differences in the propagation delays of the logic gates. This is called a race condition. Asynchronous sequential circuits are typically used only in a few critical parts of otherwise synchronous systems where speed is at a premium, such as parts of microprocessors and digital signal processing circuits. The design of asynchronous logic uses different mathematical models and techniques from synchronous logic, and is an active area of research.
In integrated circuit design, dynamic logic (or sometimes clocked logic) is a design methodology in combinational logic circuits, particularly those implemented in MOS technology. It is distinguished from the so-called static logic by exploiting temporary storage of information in stray and gate capacitances. It was popular in the 1970s and has seen a recent resurgence in the design of high speed digital electronics, particularly computer CPUs. Dynamic logic circuits are usually faster than static counterparts, and require less surface area, but are more difficult to design. Dynamic logic has a higher toggle rate than static logic but the capacitive loads being toggled are smaller, so the overall power consumption of dynamic logic may be higher or lower depending on various tradeoffs. When referring to a particular logic family, the dynamic adjective usually suffices to distinguish the design methodology, e.g. dynamic CMOS or dynamic SOI design.
Dynamic logic is distinguished from so-called static logic in that dynamic logic uses a clock signal in its implementation of combinational logic circuits. The usual use of a clock signal is to synchronize transitions in sequential logic circuits. For most implementations of combinational logic, a clock signal is not even needed.
Arithmetic logic unit
An ALU must process numbers using the same formats as the rest of the digital circuit. The format of modern processors is almost always the two’s complement binary number representation.
Early computers used a wide variety of number systems, including ones’ complement, two’s complement, sign-magnitude format, and even true decimal systems, with various representations of the digits.
The ones’ complement and two’s complement number systems allow for subtraction to be accomplished by adding the negative of a number in a very simple way which negates the need for specialized circuits to do subtraction; however, calculating the negative in two’s complement requires adding a one to the low order bit and propagating the carry.
An alternative way to do two’s complement subtraction of A−B is to present a one to the carry input of the adder and use ¬B rather than B as the second input.
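The carry-input trick can be checked with a small model of a fixed-width adder. The function names and the 8-bit default width are assumptions for illustration only.

```python
def add_with_carry(a: int, b: int, carry_in: int, width: int = 8):
    """Model a fixed-width adder; returns (sum, carry_out)."""
    mask = (1 << width) - 1
    total = (a & mask) + (b & mask) + carry_in
    return total & mask, total >> width


def subtract(a: int, b: int, width: int = 8):
    """Compute A - B as A + (NOT B) + 1 using only the adder."""
    mask = (1 << width) - 1
    return add_with_carry(a, ~b & mask, 1, width)


diff, carry = subtract(7, 5)      # 7 + 250 + 1 = 258, i.e. 2 with carry out
diff2, carry2 = subtract(5, 7)    # 5 + 248 + 1 = 254, the 8-bit pattern for -2
```

No separate subtraction circuit is modelled: inverting B and presenting a one at the carry input is enough to reuse the adder.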
The arithmetic, logic and shift circuits introduced in previous sections can be combined into one ALU with common selection.
Most of a processor’s operations are performed by one or more ALUs. An ALU loads data from input registers. Then an external control unit tells the ALU what operation to perform on that data, and then the ALU stores its result into an output register. The control unit is responsible for moving the processed data between these registers, ALU and memory.
Engineers can design an arithmetic logic unit to calculate most operations. The more complex the operation, the more expensive the ALU is, the more space it uses in the processor, and the more power it dissipates. Therefore, engineers compromise. They make the ALU powerful enough to make the processor fast, yet not so complex as to become prohibitive. For example, computing the square root of a number might use:
- Calculation in a single clock: Design an extraordinarily complex ALU that calculates the square root of any number in a single step.
- Calculation pipeline: Design a very complex ALU that calculates the square root of any number in several steps. The intermediate results go through a series of circuits arranged like a factory production line. The ALU can accept new numbers to calculate even before having finished the previous ones. The ALU can now produce numbers as fast as a single-clock ALU, although the results start to flow out of the ALU only after an initial delay.
- Iterative calculation: Design a complex ALU that calculates the square root through several steps. This usually relies on control from a complex control unit with built-in microcode.
- Co-processor: Design a simple ALU in the processor, and sell a separate specialized and costly processor that the customer can install just beside this one, and implements one of the options above.
- Software libraries: Tell the programmers that there is no co-processor and there is no emulation, so they will have to write their own algorithms to calculate square roots by software.
- Software emulation: Emulate the existence of the co-processor, that is, whenever a program attempts to perform the square root calculation, make the processor check if there is a co-processor present and use it if there is one; if there is not one, interrupt the processing of the program and invoke the operating system to perform the square root calculation through some software algorithm.
The options above go from the fastest and most expensive one to the slowest and least expensive one. Therefore, while even the simplest computer can calculate the most complicated formula, the simplest computers will usually take a long time doing that because of the several steps for calculating the formula.
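To give a feel for what the iterative and software options involve, here is a minimal sketch of an integer square root computed with only shifts, additions, subtractions, and comparisons, the kind of operations a simple ALU provides. The function name `isqrt_iterative` is hypothetical, and this is one well-known shift-and-subtract method rather than the algorithm of any specific processor.

```python
def isqrt_iterative(n: int) -> int:
    """Integer square root using only shifts, adds, and comparisons."""
    if n < 0:
        raise ValueError("n must be non-negative")
    result = 0
    bit = 1
    # Find the largest power of four that does not exceed n.
    while bit * 4 <= n:
        bit *= 4
    # Decide one result bit per iteration, from the most significant down.
    while bit:
        if n >= result + bit:
            n -= result + bit
            result = (result >> 1) + bit
        else:
            result >>= 1
        bit >>= 2
    return result


root_of_16 = isqrt_iterative(16)
root_of_10 = isqrt_iterative(10)
```

Each loop iteration corresponds to one step of the calculation, which is why the simplest hardware needs many steps for a result that a more complex ALU could deliver at once.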
Inputs and outputs
The inputs to the ALU are the data to be operated on (called operands) and a code from the control unit indicating which operation to perform. Its output is the result of the computation. One thing designers must keep in mind is whether the ALU will operate on big-endian or little-endian numbers.
In many designs, the ALU also takes as input, or generates as output, a set of condition codes from or to a status register. These codes are used to indicate cases such as carry-in or carry-out, overflow, divide-by-zero, etc.
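A toy model can make the inputs and outputs concrete: operands and an operation code go in, a result and a set of condition codes come out. The operation names, the flag names, and the 8-bit width below are assumptions for this sketch and do not describe any particular processor.

```python
def alu(op: str, a: int, b: int, carry_in: int = 0, width: int = 8):
    """Toy ALU: returns (result, flags) for a hypothetical operation code."""
    mask = (1 << width) - 1
    sign = 1 << (width - 1)
    a &= mask
    b &= mask
    carry = overflow = 0
    if op in ("ADD", "SUB"):
        # SUB is done as A + NOT(B) + 1, so its carry input is forced to 1.
        operand = b if op == "ADD" else (~b & mask)
        cin = carry_in if op == "ADD" else 1
        total = a + operand + cin
        result = total & mask
        carry = total >> width
        # Signed overflow: both addends differ in sign from the result.
        overflow = int(bool((a ^ result) & (operand ^ result) & sign))
    elif op == "AND":
        result = a & b
    elif op == "OR":
        result = a | b
    else:
        raise ValueError(f"unknown operation: {op}")
    flags = {
        "carry": carry,
        "zero": int(result == 0),
        "negative": int(bool(result & sign)),
        "overflow": overflow,
    }
    return result, flags


wrapped, wrapped_flags = alu("ADD", 255, 1)   # unsigned wrap-around
signed, signed_flags = alu("ADD", 127, 1)     # signed overflow
```

The first call sets the carry and zero flags, while the second sets overflow and negative, showing that the same adder output is interpreted through different condition codes.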
A floating-point unit also performs arithmetic operations between two values, but it does so for numbers in floating-point representation, which is much more complicated than the two’s complement representation used in a typical ALU. In order to do these calculations, an FPU has several complex circuits built in, including some internal ALUs.
In modern practice, engineers typically refer to the ALU as the circuit that performs integer arithmetic operations (like two’s complement and BCD). Circuits that calculate more complex formats like floating point, complex numbers, etc. usually receive a more specific name such as floating-point unit (FPU).
Many game consoles use interchangeable ROM cartridges, allowing for one system to play multiple games.
Computer memory that can retain the stored information even when not powered.
Before paper was used for storing data, it had been used in several applications for storing instructions to specify a machine’s operation. The earliest use of paper to store instructions for a machine was the work of Basile Bouchon who, in 1725, used punched paper rolls to control textile looms.
This technology was later developed into the wildly successful Jacquard loom.
Several inventors took the concept of a mechanical organ and used paper to represent the music.
In the late 1880s Herman Hollerith invented the recording of data on a medium that could then be read by a machine. Prior uses of machine readable media, above, had been for control (Automatons, Piano rolls, looms, …), not data. “After some initial trials with paper tape, he settled on punched cards…” Hollerith’s method was used in the 1890 census and the completed results were “… finished months ahead of schedule and far under budget”. Hollerith’s company eventually became the core of IBM.
Other technologies were also developed that allowed machines to work with marks on paper instead of punched holes:
- tabulating votes
- grading standardized tests.
- Barcodes made it possible for any object that was to be sold or transported to have some computer readable information securely attached to it
- Banks used magnetic ink on checks, supporting MICR scanning
In an early electronic computing device, the Atanasoff-Berry Computer, electric sparks were used to singe small holes in paper cards to represent binary data.
The altered dielectric constant of the paper at the location of the holes could then be used to read the binary data back into the machine by means of electric sparks of lower voltage than the sparks used to create the holes. This form of paper data storage was never made reliable and was not used in any subsequent machine.
- a grid of word lines (the address input) and bit lines (the data output)
- Combinational logic gates can be joined manually to map n-bit address input onto arbitrary values of m-bit data output (a look-up table)
- regular physical layout and predictable propagation delay
- In this less precise way, “ROM” can indicate a non-volatile memory which serves functions typically provided by mask ROM
- Can only be modified slowly or with difficulty
- By applying write protection, some types of reprogrammable ROMs may temporarily become read-only memory
To that end, ROM has been used in many computers to
- Store look-up tables
- Evaluation of mathematical and logical functions (for example, a floating-point unit might tabulate the sine function in order to facilitate faster computation)
- Display adapters of early personal computers stored tables of bitmapped font characters in ROM. This usually meant that the text display font could not be changed interactively. This was the case for both the CGA and MDA adapters available with the IBM PC XT.
- binary storage of cryptographic data, as it makes them difficult to replace, which may be desirable in order to enhance information security.
- Basic bootstrapping firmware for the main processor
- Various firmware needed to internally control self-contained devices such as graphic cards, hard disks, DVD drives, TFT screens, etc., in the system.
- Simple and mature sub-systems (such as the keyboard or some communication controllers in the integrated circuits on the main board, for example) may employ mask ROM or OTP (one-time programmable).
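The look-up table use listed above can be sketched as follows: an n-bit address selects a precomputed m-bit value, so evaluating the function becomes a single read. The 8-bit address width, the scaling to the range -127..127, and the names `SINE_ROM` and `read_rom` are assumptions made for this example; the tuple merely stands in for read-only contents.

```python
import math

ADDRESS_BITS = 8  # hypothetical: 256 table entries

# "Programmed" once, then never modified; a tuple is immutable in Python.
SINE_ROM = tuple(
    round(127 * math.sin(2 * math.pi * address / 2 ** ADDRESS_BITS))
    for address in range(2 ** ADDRESS_BITS)
)


def read_rom(address: int) -> int:
    """Return the stored value for an address (wraps like a fixed-width bus)."""
    return SINE_ROM[address % len(SINE_ROM)]


quarter_turn = read_rom(64)          # one quarter of the cycle: peak value
three_quarter_turn = read_rom(192)   # three quarters of the cycle: trough
```

Reading the table costs the same regardless of the address, which is the predictable-delay property that makes such tables attractive for fast function evaluation.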
This image of the System/360 Model 91 was taken by NASA sometime in the late 1960s.
- Diode matrix ROM, used in small amounts in many computers in the 1960s as well as electronic desk calculators and keyboard encoders for terminals. This ROM was programmed by installing discrete semiconductor diodes at selected locations between a matrix of word line traces and bit line traces on a printed circuit board.
- Resistor, capacitor, or transformer matrix ROM, used in many computers until the 1970s. Like diode matrix ROM, it was programmed by placing components at selected locations between a matrix of word lines and bit lines. ENIAC’s Function Tables were resistor matrix ROM, programmed by manually setting rotary switches.
- Various models of the IBM System/360 and complex peripheral devices stored their microcode in either capacitor (called BCROS for balanced capacitor read-only storage on the 360/50 and 360/65, or CCROS for card capacitor read-only storage on the 360/30) or transformer (called TROS for transformer read-only storage on the 360/20, 360/40 and others) matrix ROM.
- Core rope, a form of transformer matrix ROM technology used where size and weight were critical. This was used in NASA/MIT’s Apollo Spacecraft Computers, DEC’s PDP-8 computers, and other places. This type of ROM was programmed by hand by weaving “word line wires” inside or outside of ferrite transformer cores.
- Diamond Ring stores, in which wires are threaded through a sequence of large ferrite rings that function only as sensing devices. These were used in TXE telephone exchanges.
- The perforated metal character mask (“stencil”) in Charactron cathode ray tubes, which was used as ROM to shape a wide electron beam to form a selected character shape on the screen either for display or a scanned electron beam to form a selected character shape as an overlay on a video signal.
Other types of non-volatile solid-state memory permit some degree of modification:
- Programmable read-only memory (PROM), or one-time programmable ROM (OTP), can be written to or programmed via a special device called a PROM programmer. Typically, this device uses high voltages to permanently destroy or create internal links (fuses or antifuses) within the chip. Consequently, a PROM can only be programmed once.
- Erasable programmable read-only memory (EPROM) can be erased by exposure to strong ultraviolet light (typically for 10 minutes or longer), then rewritten with a process that again needs higher than usual voltage applied. Repeated exposure to UV light will eventually wear out an EPROM, but the endurance of most EPROM chips exceeds 1000 cycles of erasing and reprogramming. EPROM chip packages can often be identified by the prominent quartz “window” which allows UV light to enter. After programming, the window is typically covered with a label to prevent accidental erasure. Some EPROM chips are factory-erased before they are packaged, and include no window; these are effectively PROM.
- Electrically erasable programmable read-only memory (EEPROM) is based on a similar semiconductor structure to EPROM, but allows its entire contents (or selected banks) to be electrically erased, then rewritten electrically, so that they need not be removed from the computer (or camera, MP3 player, etc.). Writing or flashing an EEPROM is much slower (milliseconds per bit) than reading from a ROM or writing to a RAM (nanoseconds in both cases).
- Electrically alterable read-only memory (EAROM) is a type of EEPROM that can be modified one bit at a time. EAROMs are intended for applications that require infrequent and only partial rewriting.
- Flash memory (or simply flash) is a modern type of EEPROM invented in 1984. Flash memory can be erased and rewritten faster than ordinary EEPROM, and newer designs feature very high endurance (exceeding 1,000,000 cycles). Modern NAND flash makes efficient use of silicon chip area, resulting in individual ICs with a capacity as high as 32 GB as of 2007; this feature, along with its endurance and physical durability, has allowed NAND flash to replace magnetic storage in some applications (such as USB flash drives). Flash memory is sometimes called flash ROM or flash EEPROM when used as a replacement for older ROM types, but not in applications that take advantage of its ability to be modified quickly and frequently.
- Optical storage media, such as CD-ROM, which is read-only (analogous to masked ROM). CD-R is Write Once Read Many (analogous to PROM), while CD-RW supports erase-rewrite cycles (analogous to EEPROM); both are designed for backwards-compatibility with CD-ROM.
- Flash memory, invented at Toshiba in the mid-1980s, and commercialized in the early 1990s
- Decreasing cost of reprogrammable devices had almost eliminated the market for mask ROM by the year 2000.
- The most recent development is NAND flash, also invented at Toshiba. Its designers explicitly broke from past practice, stating plainly that “the aim of NAND Flash is to replace hard disks,” rather than the traditional use of ROM as a form of non-volatile primary storage. As of 2007, NAND has partially achieved this goal by offering throughput comparable to hard disks, higher tolerance of physical shock, extreme miniaturization (in the form of USB flash drives and tiny microSD memory cards, for example), and much lower power consumption.
ROM to Flash
ROM could be implemented at a lower cost-per-bit than RAM for many years. Most home computers of the 1980s stored a BASIC interpreter or operating system in ROM, as other forms of non-volatile storage such as magnetic disk drives were too costly. For example, the Commodore 64 included 64 KB of RAM, and 20 KB of ROM containing a BASIC interpreter and the “KERNAL” of its operating system.
Later home or office computers such as the IBM PC XT often included magnetic disk drives, and larger amounts of RAM, allowing them to load their operating systems from disk into RAM, with only a minimal hardware initialization core and bootloader remaining in ROM (known as the BIOS in IBM-compatible computers). This arrangement allowed for a more complex and easily upgradeable operating system.
As of 2007 large RAM chips can be read faster than most ROMs. For this reason (and to allow uniform access), ROM content is sometimes copied to RAM or shadowed before its first use, and subsequently read from RAM.
As of 2008, most products use Flash rather than mask ROM, and many provide some means for connecting to a PC for firmware updates; for example, a digital audio player might be updated to support a new file format. Some hobbyists have taken advantage of this flexibility; for example, iPodLinux and OpenWrt have enabled users to run full-featured Linux distributions on their MP3 players and wireless routers, respectively.
The contents of a ROM cartridge can be copied into files known as ROM images, which can be used to produce duplicate cartridges, or in console emulators.
The cryptanalytic machine code-named “Aquarius” used at Bletchley Park during World War II incorporated a hard-wired dynamic memory. Paper tape was read and the characters on it “were remembered in a dynamic store. The store used a large bank of capacitors, which were either charged or not, a charged capacitor representing cross (1) and an uncharged capacitor dot (0). Since the charge gradually leaked away, a periodic pulse was applied to top up those still charged (hence the term ‘dynamic’)”.
Through the construction of a glass tube filled with mercury and plugged at each end with a quartz crystal, delay lines could store bits of information within the quartz and transfer it through sound waves propagating through mercury.
Delay line memory would be limited to a capacity of up to a few hundred thousand bits to remain efficient.
Efforts began in the late 1940s to find non-volatile memory. Magnetic core memory, which allowed for recall of memory after power loss, would become the dominant form of memory until the development of transistor-based memory in the late 1960s.
Schematic drawing of original designs of DRAM patented in 1968.
Static or dynamic RAM as main memory, the latter often being implicitly accessed via one or more cache levels. Most modern semiconductor volatile memory is either static RAM (see SRAM) or dynamic RAM (see DRAM).
The metal–oxide–semiconductor field-effect transistor (MOSFET, MOS-FET, or MOS FET) is a transistor used for amplifying or switching electronic signals. Although the MOSFET is a four-terminal device with source (S), gate (G), drain (D), and body (B) terminals, the body (or substrate) of the MOSFET is often connected to the source terminal, making it a three-terminal device like other field-effect transistors. Because these two terminals are normally connected to each other (short-circuited) internally, only three terminals appear in electrical diagrams. The MOSFET is by far the most common transistor in both digital and analog circuits, though the bipolar junction transistor was at one time much more common.
In enhancement mode MOSFETs, a voltage drop across the oxide induces a conducting channel between the source and drain contacts via the field effect. The term “enhancement mode” refers to the increase of conductivity with increase in oxide field that adds carriers to the channel, also referred to as the inversion layer. The channel can contain electrons (called an nMOSFET or nMOS), or holes (called a pMOSFET or pMOS), opposite in type to the substrate, so nMOS is made with a p-type substrate, and pMOS with an n-type substrate (see article on semiconductor devices). In the less common depletion mode MOSFET, detailed later on, the channel consists of carriers in a surface impurity layer of opposite type to the substrate, and conductivity is decreased by application of a field that depletes carriers from this surface layer.
MOSFET showing gate (G), body (B), source (S) and drain (D) terminals. The gate is separated from the body by an insulating layer (white)
A cross section through an nMOSFET when the gate voltage V_GS is below the threshold for making a conductive channel; there is little or no conduction between the terminals drain and source; the switch is off. When the gate is more positive, it attracts electrons, inducing an n-type conductive channel in the substrate below the oxide, which allows electrons to flow between the n-doped terminals; the switch is on.
Simulation result for formation of inversion channel (electron density) and attainment of threshold voltage (IV) in a nanowire MOSFET. Note that the threshold voltage for this device lies around 0.45 V.
The basic principle of this kind of transistor was first patented by Julius Edgar Lilienfeld in 1925. Twenty five years later, when Bell Telephone attempted to patent the junction transistor, they found Lilienfeld already holding a patent which was worded in a way that would include all types of transistors. Bell Labs was able to work out an agreement with Lilienfeld, who was still alive at that time (it is not known if they paid him money or not). It was at that time the Bell Labs version was given the name bipolar junction transistor, or simply junction transistor, and Lilienfeld’s design took the name field effect transistor.
In 1959, Dawon Kahng and Martin M. (John) Atalla at Bell Labs invented the metal–oxide–semiconductor field-effect transistor (MOSFET) as an offshoot to the patented FET design. Operationally and structurally different from the bipolar junction transistor, the MOSFET was made by putting an insulating layer on the surface of the semiconductor and then placing a metallic gate electrode on that. It used crystalline silicon for the semiconductor and a thermally oxidized layer of silicon dioxide for the insulator. The silicon MOSFET did not generate localized electron traps at the interface between the silicon and its native oxide layer, and thus was inherently free from the trapping and scattering of carriers that had impeded the performance of earlier field-effect transistors. Following the development of clean rooms to reduce contamination to levels never before thought necessary, and of photolithography and the planar process to allow circuits to be made in very few steps, the Si–SiO2 system possessed such technical attractions as low cost of production (on a per circuit basis) and ease of integration. Largely because of these two factors, the MOSFET has become the most widely used type of transistor in integrated circuits.
Additionally, the method of coupling two complementary MOSFETs (P-channel and N-channel) into one high/low switch, known as CMOS, means that digital circuits dissipate very little power except when actually switched.
The earliest microprocessors starting in 1970 were all “MOS microprocessors” — i.e., fabricated entirely from PMOS logic or fabricated entirely from NMOS logic. In the 1970s, “MOS microprocessors” were often contrasted with “CMOS microprocessors” and “bipolar bit-slice processors”.
Dynamic random-access memory (DRAM)
DRAM requires one transistor and one capacitor per bit, compared to four or six transistors per bit in SRAM. This allows DRAM to reach very high densities: the transistors and capacitors used are extremely small, billions can fit on a single memory chip, and the cost per bit is much lower. DRAM is more complicated to interface to and control, however. Because even “nonconducting” transistors always leak a small amount, the capacitors slowly discharge and the stored information eventually fades unless the capacitor charge is refreshed periodically. Because of this refresh requirement, it is a dynamic memory as opposed to SRAM and other static memory.
Refresh logic is provided in a DRAM controller, which automates the periodic refresh so that no software or other hardware has to perform it. The refresh logic requires some sort of counter to keep track of which row is the next to be refreshed. Most DRAM chips include that counter. Older types require external refresh logic to hold the counter.
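The row counter described above can be sketched in a few lines. This is only an illustrative model: the class name `RefreshCounter` and the four-row array are assumptions made for the example, not details of any particular DRAM chip.

```python
class RefreshCounter:
    """Hypothetical model of the counter that picks the next row to refresh."""

    def __init__(self, num_rows):
        self.num_rows = num_rows
        self.next_row = 0

    def tick(self):
        """Return the row to refresh now, then advance (wrapping at the end)."""
        row = self.next_row
        self.next_row = (self.next_row + 1) % self.num_rows
        return row


counter = RefreshCounter(num_rows=4)
order = [counter.tick() for _ in range(6)]
print(order)  # [0, 1, 2, 3, 0, 1]
```

Because the counter wraps around, every row is visited once per pass, which is all the refresh logic needs to guarantee.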
Under some conditions, most of the data in DRAM can be recovered even if the DRAM has not been refreshed for several minutes.
Bit errors in DRAM can be mitigated by using redundant memory bits and memory controllers that exploit these bits, usually implemented within DRAM modules. These extra bits are used to record parity and to enable missing data to be reconstructed by error-correcting code (ECC). Parity allows the detection of all single-bit errors (actually, any odd number of wrong bits). The most common error-correcting code, a SECDED Hamming code, allows a single-bit error to be corrected and, in the usual configuration, with an extra parity bit, double-bit errors to be detected.
An ECC-capable memory controller as used in many modern PCs can typically detect and correct errors of a single bit per 64-bit “word” (the unit of bus transfer), and detect (but not correct) errors of two bits per 64-bit word. Some systems also “scrub” the errors, by writing the corrected version back to memory.
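A SECDED code can be illustrated at a much smaller scale than the 64-bit words mentioned above. The sketch below uses 4 data bits, a Hamming(7,4) code, and one extra overall parity bit. The function names `encode` and `decode` and the status strings are assumptions made for this example; a real memory controller works on wider words in hardware.

```python
def encode(data_bits):
    """Encode 4 data bits as an 8-bit SECDED codeword (Hamming(7,4) + parity)."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4          # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4          # covers positions 4, 5, 6, 7
    word = [p1, p2, d1, p3, d2, d3, d4]   # positions 1..7
    overall = 0
    for bit in word:
        overall ^= bit
    return word + [overall]


def decode(code):
    """Return (data_bits, status); status is 'ok', 'corrected' or 'double-error'."""
    word = list(code[:7])
    syndrome = 0
    for position in range(1, 8):
        if word[position - 1]:
            syndrome ^= position   # XOR of set positions is 0 for a valid word
    overall = 0
    for bit in code:
        overall ^= bit             # parity over all 8 bits, 0 when intact
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        if syndrome != 0:          # syndrome names the flipped position
            word[syndrome - 1] ^= 1
        status = "corrected"       # syndrome 0 means the parity bit itself flipped
    else:
        status = "double-error"    # detectable, but not correctable
    return [word[2], word[4], word[5], word[6]], status


code = encode([1, 0, 1, 1])
one_flip = list(code)
one_flip[4] ^= 1
two_flips = list(code)
two_flips[0] ^= 1
two_flips[5] ^= 1
print(decode(code), decode(one_flip), decode(two_flips)[1])
```

A scrubbing system would take the corrected data from the second case and write it back to memory.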
DRAM is usually arranged in a rectangular array of charge storage cells consisting of one capacitor and transistor per data bit. The figure to the right shows a simple example with a 4 by 4 cell matrix. Modern DRAM matrices are many thousands of cells in height and width.
The long horizontal lines connecting each row are known as word-lines. Each column of cells is composed of two bit-lines, each connected to every other storage cell in the column (the illustration to the right does not include this important detail). They are generally known as the + and − bit-lines.
Operations to read a data bit from a DRAM storage cell
- The sense amplifiers are disconnected.
- The bit-lines are precharged to exactly equal voltages that are in between high and low logic levels (e.g., 0.5 V if the two levels are 0 and 1 V). The bit-lines are physically symmetrical to keep the capacitance equal, and therefore at this time their voltages are equal.
- The precharge circuit is switched off. Because the bit-lines are relatively long, they have enough capacitance to maintain the precharged voltage for a brief time. This is an example of dynamic logic.
- The desired row’s word-line is then driven high to connect a cell’s storage capacitor to its bit-line. This causes the transistor to conduct, transferring charge from the storage cell to the connected bit-line (if the stored value is 1) or from the connected bit-line to the storage cell (if the stored value is 0). Since the capacitance of the bit-line is typically much higher than the capacitance of the storage cell, the voltage on the bit-line increases very slightly if the storage cell’s capacitor is charged and decreases very slightly if the storage cell is discharged (e.g., to 0.54 V and 0.45 V in the two cases). As the other bit-line holds 0.50 V there is a small voltage difference between the two twisted bit-lines.
- The sense amplifiers are now connected to the bit-line pairs. Positive feedback then occurs from the cross-connected inverters, thereby amplifying the small voltage difference between the odd and even row bit-lines of a particular column until one bit-line is fully at the lowest voltage and the other is at the maximum high voltage. Once this has happened, the row is “open” (the desired cell data is available).
- All storage cells in the open row are sensed simultaneously, and the sense amplifier outputs latched. A column address then selects which latch bit to connect to the external data bus. Reads of different columns in the same row can be performed without a row opening delay because, for the open row, all data has already been sensed and latched.
- While reading of columns in an open row is occurring, current is flowing back up the bit-lines from the output of the sense amplifiers and recharging the storage cells. This reinforces (i.e. “refreshes”) the charge in the storage cell by increasing the voltage in the storage capacitor if it was charged to begin with, or by keeping it discharged if it was empty. Note that due to the length of the bit-lines there is a fairly long propagation delay for the charge to be transferred back to the cell’s capacitor. This takes significant time past the end of sense amplification, and thus overlaps with one or more column reads.
- When done with reading all the columns in the current open row, the word-line is switched off to disconnect the storage cell capacitors (the row is “closed”) from the bit-lines. The sense amplifier is switched off, and the bit lines are precharged again.
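The small voltage swing in the read steps above follows from charge sharing between the cell capacitor and the much larger bit-line capacitance. The sketch below assumes a bit-line capacitance ten times that of the cell; that ratio is a hypothetical value, chosen because it lands close to the 0.54 V and 0.45 V figures used in the example.

```python
def bitline_voltage_after_sharing(v_precharge, v_cell, c_bitline, c_cell):
    """Voltage once the cell capacitor and bit-line share their charge."""
    total_charge = c_bitline * v_precharge + c_cell * v_cell
    return total_charge / (c_bitline + c_cell)


# Assumed values: 0.5 V precharge, 0 V / 1 V logic levels, 10:1 capacitance ratio.
v_stored_one = bitline_voltage_after_sharing(0.5, 1.0, c_bitline=10.0, c_cell=1.0)
v_stored_zero = bitline_voltage_after_sharing(0.5, 0.0, c_bitline=10.0, c_cell=1.0)
print(f"stored 1: {v_stored_one:.3f} V, stored 0: {v_stored_zero:.3f} V")
```

A charged cell nudges its bit-line slightly above the 0.50 V reference and a discharged cell nudges it slightly below, which is the difference the sense amplifier then amplifies.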
To write to memory
To store data, a row is opened and a given column’s sense amplifier is temporarily forced to the desired high or low voltage state, thus causing the bit-line to charge or discharge the cell storage capacitor to the desired value.
Due to the sense amplifier’s positive feedback configuration, it will hold a bit-line at stable voltage even after the forcing voltage is removed.
During a write to a particular cell, all the columns in a row are sensed simultaneously just as during reading, so although only a single column’s storage-cell capacitor charge is changed, the entire row is refreshed (written back in), as illustrated in the figure to the right.
An asynchronous DRAM chip has:
- Power connections
- Some number of address inputs (typically 12)
- A few (typically one or four) bidirectional data lines
1964 Arnold Farber and Eugene Schlig, working for IBM, created a hard-wired memory cell, using a transistor gate and tunnel diode latch. They replaced the latch with two transistors and two resistors, a configuration which became known as the Farber-Schlig cell.
1965 Benjamin Agusta and his team at IBM created a 16-bit silicon memory chip based on the Farber-Schlig cell, with 80 transistors, 64 resistors, and four diodes.
1966 DRAM was invented by Dr. Robert Dennard at the IBM Thomas J. Watson Research Center. He was granted U.S. patent number 3,387,286 in 1968. Capacitors had been used for earlier memory schemes such as the drum of the Atanasoff–Berry Computer, the Williams tube and the Selectron tube.
October 1970 Intel’s earlier 1102 DRAM, developed for Honeywell, had many problems, prompting Intel to begin work on its own improved design, in secrecy to avoid conflict with Honeywell. This became the first commercially available DRAM, the Intel 1103 (1024×1).
1973 The first DRAM with multiplexed row and column address lines was the Mostek MK4096 (4096×1) designed by Robert Proebsting. This addressing scheme uses the same address pins to receive the low half and the high half of the address of the memory cell being referenced, switching between the two halves on alternating bus cycles.
This was a radical advance, effectively halving the number of address lines required, which enabled it to fit into packages with fewer pins, a cost advantage that grew with every jump in memory size. The MK4096 proved to be a very robust design for customer applications.
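The multiplexing scheme can be sketched as splitting one address into two halves that share the same pins. For a 4096×1 part the full address is 12 bits, so 6 pins suffice. The function names are made up for this example, and treating the high half as the row and the low half as the column is an assumption.

```python
def split_address(address, half_bits=6):
    """Split a full cell address into (high_half, low_half) for shared pins."""
    mask = (1 << half_bits) - 1
    high_half = (address >> half_bits) & mask   # assumed to select the row
    low_half = address & mask                   # assumed to select the column
    return high_half, low_half


def join_address(high_half, low_half, half_bits=6):
    """Rebuild the full address from the two halves, as the chip does internally."""
    return (high_half << half_bits) | low_half


row, col = split_address(0b101010010101)
print(row, col)                 # 42 21
print(join_address(row, col))   # 2709
```

Each increase in memory size adds address bits, but only half as many pins, which is where the growing cost advantage comes from.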
At the 16K density, the cost advantage increased; the Mostek MK4116 16K DRAM, introduced in 1976, achieved greater than 75% worldwide DRAM market share.
Early 1980s As density increased to 64K, however, Mostek was overtaken by Japanese DRAM manufacturers selling higher quality DRAMs using the same multiplexing scheme at below-cost prices.
Static random-access memory (SRAM or static RAM)
- Uses bistable latching circuitry to store each bit. The term static differentiates it from dynamic RAM (DRAM), which must be periodically refreshed.
- SRAM exhibits data remanence, but it is still volatile in the conventional sense that data is eventually lost when the memory is not powered.
- Retains its contents as long as the power is connected and is easy to interface to but uses six transistors per bit.
- Not worthwhile for desktop system memory, where DRAM dominates, but is used for their cache memories.
- SRAM is commonplace in small embedded systems, which might only need tens of kilobytes or less.
A static RAM chip from a NES clone (2K x 8 bit)
- SRAM is more expensive and less dense than DRAM and is therefore not used for high-capacity, low-cost applications such as the main memory in personal computers.
Static RAM exists primarily as:
- general purpose products
- with asynchronous interface, such as the ubiquitous 28-pin 8Kx8 and 32Kx8 chips (often but not always named something along the lines of 6264 and 62C256 respectively), as well as similar products up to 16 Mbit per chip
- with synchronous interface, usually used for caches and other applications requiring burst transfers, up to 18 Mbit (256Kx72) per chip
- integrated on chip
- as RAM or cache memory in micro-controllers (usually from around 32 bytes up to 128 kilobytes)
- as the primary caches in powerful microprocessors, such as the x86 family, and many others (from 8 KB, up to several megabytes)
- to store the registers and parts of the state-machines used in some microprocessors (see register file)
- on application specific ICs, or ASICs (usually in the order of kilobytes)
- in FPGAs and CPLDs
- Many categories of industrial and scientific subsystems, automotive electronics, and similar, contain static RAM.
- Some amount (kilobytes or less) is also embedded in practically all modern appliances, toys, etc. that implement an electronic user interface.
- Several megabytes may be used in complex products such as digital cameras, cell phones, synthesizers, etc.
SRAM is also used in personal computers, workstations, routers and peripheral equipment: CPU register files, internal CPU caches and external burst mode SRAM caches, hard disk buffers, router buffers, etc. LCD screens and printers also normally employ static RAM to hold the image displayed (or to be printed).
SRAM usually requires only three controls: Chip Enable (CE), Write Enable (WE) and Output Enable (OE). In synchronous SRAM, Clock (CLK) is also included.
Asynchronous SRAM is available in capacities from 4 Kb to 64 Mb. The fast access time of SRAM makes asynchronous SRAM appropriate as main memory for small cache-less embedded processors used in everything from industrial electronics and measurement systems to hard disks and networking equipment, among many other applications. It is used in applications ranging from switches and routers, IP phones, IC testers, and DSLAM cards to automotive electronics.
- Bipolar junction transistor (used in TTL and ECL) – very fast but consumes a lot of power
- MOSFET (used in CMOS) – low power and very common today
- Asynchronous – independent of clock frequency; data in and data out are controlled by address transition
- Synchronous – all timings are initiated by the clock edge(s). Address, data in and other control signals are associated with the clock signals
- ZBT (ZBT stands for zero bus turnaround) – the turnaround is the number of clock cycles it takes to change access to the SRAM from write to read and vice versa. The turnaround for ZBT SRAMs or the latency between read and write cycle is zero.
- syncBurst (syncBurst SRAM or synchronous-burst SRAM) – features synchronous burst write access to the SRAM to increase write operation to the SRAM
- DDR SRAM – Synchronous, single read/write port, double data rate I/O
- Quad Data Rate SRAM – Synchronous, separate read and write ports, quadruple data rate I/O
- Binary SRAM
- Ternary SRAM
Two additional access transistors serve to control the access to a storage cell during read and write operations. In addition to such six-transistor (6T) SRAM, other kinds of SRAM chips use 4, 8, 10 (4T, 8T, 10T SRAM), or more transistors per bit.
Four-transistor SRAM is quite common in stand-alone SRAM devices (as opposed to SRAM used for CPU caches), implemented in special processes with an extra layer of polysilicon, allowing for very high-resistance pull-up resistors.
Four-transistor SRAM provides advantages in density at the cost of manufacturing complexity. The resistors must have small dimensions and large values.
Access to the cell is enabled by the word line (WL in figure), which controls the two access transistors M5 and M6. These, in turn, control whether the cell is connected to the bit lines BL and BL̅ (the complement of BL). The bit lines are used to transfer data for both read and write operations. Although it is not strictly necessary to have two bit lines, both the signal and its inverse are typically provided in order to improve noise margins.
During read accesses, the bit lines are actively driven high and low by the inverters in the SRAM cell. This improves SRAM bandwidth compared to DRAMs – in a DRAM, the bit line is connected to storage capacitors and charge sharing causes the bitline to swing upwards or downwards. The symmetric structure of SRAMs also allows for differential signaling, which makes small voltage swings more easily detectable. Another difference with DRAM that contributes to making SRAM faster is that commercial chips accept all address bits at a time. By comparison, commodity DRAMs have the address multiplexed in two halves, i.e. higher bits followed by lower bits, over the same package pins in order to keep their size and cost down.
The size of an SRAM with m address lines and n data lines is 2^m words, or 2^m × n bits.
The most common word size is 8 bits, meaning that a single byte can be read or written to each of 2^m different words within the SRAM chip.
Several common SRAM chips have 11 address lines (thus a capacity of 2^11 = 2,048 = 2k words) and an 8-bit word, so they are referred to as “2k × 8 SRAM”.
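The capacity arithmetic above is easy to check in code. The helper name `sram_capacity` is invented for this sketch.

```python
def sram_capacity(address_lines, data_lines):
    """Return (words, bits) for an SRAM with the given numbers of lines."""
    words = 2 ** address_lines
    return words, words * data_lines


words, bits = sram_capacity(address_lines=11, data_lines=8)
print(words, bits)  # 2048 16384
```

With 11 address lines and an 8-bit word this gives the 2,048 words of the “2k × 8” parts, or 16,384 bits in total.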
- If the word line is not asserted, the access transistors M5 and M6 disconnect the cell from the bit lines. The two cross-coupled inverters formed by M1 – M4 will continue to reinforce each other as long as they are connected to the supply.
- Assume that the content of the memory is a 1, stored at Q. The read cycle is started by precharging both bit lines, BL and its complement BL̅, to a logical 1, then asserting the word line WL, enabling both access transistors. The second step occurs when the values stored in Q and its complement Q̅ are transferred to the bit lines by leaving BL at its precharged value and discharging BL̅ through M1 and M5 to a logical 0 (i.e., eventually discharging through the transistor M1, as it is turned on because Q is logically set to 1). On the BL side, the transistors M4 and M6 pull the bit line toward VDD, a logical 1 (i.e., eventually being charged by the transistor M4, as it is turned on because Q̅ is logically set to 0). If the content of the memory was a 0, the opposite would happen and BL̅ would be pulled toward 1 and BL toward 0. The BL and BL̅ lines then have a small voltage difference between them when they reach a sense amplifier, which senses which line has the higher voltage and thus determines whether a 1 or a 0 was stored. The higher the sensitivity of the sense amplifier, the faster the read operation.
- A write cycle begins by applying the value to be written to the bit lines. To write a 0, we apply a 0 to the bit lines, i.e. we set the complement bit line BL̅ to 1 and BL to 0. This is similar to applying a reset pulse to an SR-latch, which causes the flip flop to change state. A 1 is written by inverting the values of the bit lines. WL is then asserted and the value that is to be stored is latched in. Note that the reason this works is that the bit line input-drivers are designed to be much stronger than the relatively weak transistors in the cell itself, so that they can easily override the previous state of the cross-coupled inverters. Careful sizing of the transistors in an SRAM cell is needed to ensure proper operation.
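The standby, read and write behaviour described above can be captured in a purely logical model. The sketch below ignores voltages, transistor sizing and timing. The class `SramCell` and its method names are assumptions made for illustration, with `bl_bar` standing for the complement bit line.

```python
class SramCell:
    """Logic-level model of one SRAM cell; q is the stored bit."""

    def __init__(self, q=0):
        self.q = q

    def read(self, word_line):
        """Return (bl, bl_bar) after a read attempt."""
        bl, bl_bar = 1, 1                  # both bit lines precharged to 1
        if word_line:                      # access transistors conduct
            if self.q == 1:
                bl_bar = 0                 # complement line is discharged
            else:
                bl = 0                     # true line is discharged
        return bl, bl_bar

    def write(self, word_line, bl, bl_bar):
        """Strong bit-line drivers override the latch when WL is asserted."""
        if word_line and bl != bl_bar:
            self.q = bl


cell = SramCell(q=1)
print(cell.read(word_line=True))    # (1, 0)
cell.write(word_line=True, bl=0, bl_bar=1)
print(cell.read(word_line=True))    # (0, 1)
cell.write(word_line=False, bl=1, bl_bar=0)
print(cell.q)                       # 0, the cell was not selected
```

The last call shows the standby case: with the word line low the cell is disconnected and keeps its value.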
RAM with an access time of 70 ns will output valid data within 70 ns from the time that the address lines are valid. But the data will remain for a hold time as well (5–10 ns). Rise and fall times also influence valid timeslots with approximately 5 ns. By reading the lower part of an address range bits in sequence (page cycle) one can read with significantly shorter access time (30 ns).
Non-volatile SRAM (nvSRAM)
- While SRAM can read and write, nvSRAM can read, write, store and recall.
- The additional operations center around the non-volatile part of nvSRAM.
- When reading and writing, an nvSRAM acts no differently than a standard async SRAM.
- The attached processor or controller sees an 8-bit SRAM interface and nothing else.
- The STORE operation stores data that is in a SRAM array in the non-volatile part.
Cypress and Simtek nvSRAM have three ways to store data in the non-volatile area. They are:
- Autostore happens automatically when the main voltage source drops below the device’s operating voltage. Power control is then switched from Vcc to a capacitor, which powers the chip long enough to store the SRAM contents into the non-volatile part.
- Hardware store is initiated externally through the HSB (Hardware Store Busy) pin. Using the HSB signal, which requests a non-volatile hardware STORE cycle, is optional.
- Software store is initiated by a certain sequence of operations. When the defined operations are done in sequence, the software store is initiated.
Video RAM (VRAM), a dual-ported variant of DRAM, was invented by F. Dill, D. Ling and R. Matick at IBM Research in 1980, with a patent issued in 1985 (US Patent 4,541,075).
The first commercial use of VRAM was in a high-resolution graphics adapter introduced in 1986 by IBM for the PC/RT system, which set a new standard for graphics displays.
Prior to the development of VRAM, dual-ported memory was quite expensive, limiting higher resolution bitmapped graphics to high-end workstations. VRAM improved the overall framebuffer throughput, allowing low cost, high-resolution, high-speed, color graphics.
Modern GUI-based operating systems benefitted from this and thus it provided a key ingredient for proliferation of graphic user interfaces throughout the world at that time.
VRAM has two sets of data output pins, and thus two ports that can be used simultaneously.
- The first port, the DRAM port, is accessed by the host computer in a manner very similar to traditional DRAM.
- The second port, the video port, is typically read-only and is dedicated to providing a high throughput, serialized data channel for the graphics chipset.
Typical DRAM arrays normally access a full row of bits (i.e. a word line) at up to 1,024 bits at one time, but only use one or a few of these for actual data, the remainder being discarded.
Since DRAM cells are destructively read, each row accessed must be sensed, and re-written. Thus, 1,024 sense amplifiers are typically used.
VRAM operates by not discarding the excess bits which must be accessed, but making full use of them in a simple way.
If each horizontal scan line of a display is mapped to a full word, then upon reading one word and latching all 1,024 bits into a separate row buffer, these bits can subsequently be serially streamed to the display circuitry. This leaves the DRAM array free for normal accesses (reads or writes) for many cycles, until the row buffer is almost depleted. A complete DRAM read cycle is required only to fill the row buffer, leaving most DRAM cycles available for normal accesses.
Such operation is described in the paper “All points addressable raster display memory” by R. Matick, D. Ling, S. Gupta, and F. Dill, IBM Journal of R&D, Vol 28, No. 4, July 1984, pp. 379–393.
To use the video port, the controller first uses the DRAM port to select the row of the memory array that is to be displayed. The VRAM then copies that entire row to an internal row-buffer which is a shift register. The controller can then continue to use the DRAM port for drawing objects on the display. Meanwhile, the controller feeds a clock called the shift clock (SCLK) to the VRAM’s video port. Each SCLK pulse causes the VRAM to deliver the next data bit, in strict address order, from the shift register to the video port. For simplicity, the graphics adapter is usually designed so that the contents of a row, and therefore the contents of the shift-register, corresponds to a complete horizontal line on the display.
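The two-port behaviour described above can be modelled with a small array and a shift register. This is a toy sketch: the class `Vram`, its method names and the 2×4 array are assumptions, and real parts transfer far wider rows.

```python
from collections import deque


class Vram:
    """Toy dual-ported memory: a DRAM port plus a serial video port."""

    def __init__(self, rows, cols):
        self.array = [[0] * cols for _ in range(rows)]
        self.shift_register = deque()

    def dram_write(self, row, col, bit):
        """DRAM port: ordinary random-access write."""
        self.array[row][col] = bit

    def load_row(self, row):
        """Copy one whole row into the internal shift register."""
        self.shift_register = deque(self.array[row])

    def sclk(self):
        """Video port: each shift-clock pulse delivers the next bit in order."""
        return self.shift_register.popleft()


vram = Vram(rows=2, cols=4)
for col, bit in enumerate([1, 0, 1, 1]):
    vram.dram_write(0, col, bit)

vram.load_row(0)
first_two = [vram.sclk(), vram.sclk()]
vram.dram_write(0, 2, 0)            # drawing continues while the row streams out
last_two = [vram.sclk(), vram.sclk()]
scanline = first_two + last_two
print(scanline, vram.array[0])      # [1, 0, 1, 1] [1, 0, 0, 1]
```

The write through the DRAM port does not disturb the scan line already latched in the shift register, which is why both ports can be used at the same time.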
Through the 1990s, many graphic subsystems used VRAM, with the number of megabits touted as a selling point.
In the late 1990s, synchronous DRAM technologies gradually became affordable, dense, and fast enough to displace VRAM, even though it is only single-ported and more overhead is required.
Window DRAM (WRAM)
WRAM is a variant of VRAM that was once used in graphics adaptors such as the Matrox Millennium and ATI 3D Rage Pro. WRAM was designed to perform better and cost less than VRAM.
WRAM offered up to 25% greater bandwidth than VRAM and accelerated commonly used graphical operations such as text drawing and block fills.
Fast page mode DRAM (FPM DRAM)
Fast page mode DRAM is also called FPM DRAM, FPRAM, Page mode DRAM, Fast page mode memory, or Page mode memory.
In page mode, a row of the DRAM can be kept “open” by holding /RAS low while performing multiple reads or writes with separate pulses of /CAS so that successive reads or writes within the row do not suffer the delay of precharge and accessing the row.
This increases the performance of the system when reading or writing bursts of data.
Static column is a variant of page mode in which the column address does not need to be stored in, but rather, the address inputs may be changed with /CAS held low, and the data output will be updated accordingly a few nanoseconds later.
Nibble mode is another variant in which four sequential locations within the row can be accessed with four consecutive pulses of /CAS. The difference from normal page mode is that the address inputs are not used for the second through fourth /CAS edges; they are generated internally starting with the address supplied for the first /CAS edge.
Extended data out DRAM (EDO DRAM)
EDO DRAM, sometimes referred to as Hyper Page Mode enabled DRAM, is similar to Fast Page Mode DRAM. A new access cycle can be started while keeping the data output of the previous cycle active.
This allows a certain amount of overlap in operation (pipelining), allowing somewhat improved performance.
To be precise, EDO DRAM begins data output on the falling edge of /CAS, but does not stop the output when /CAS rises again. It holds the output valid (thus extending the data output time) until either /RAS is deasserted, or a new /CAS falling edge selects a different column address.
Single-cycle EDO has the ability to carry out a complete memory transaction in one clock cycle.
Otherwise, each sequential RAM access within the same page takes two clock cycles instead of three, once the page has been selected. EDO’s performance and capabilities allowed it to somewhat replace the then-slow L2 caches of PCs. It created an opportunity to reduce the immense performance loss associated with a lack of L2 cache, while making systems cheaper to build. This was also good for notebooks due to difficulties with their limited form factor, and battery life limitations. An EDO system with L2 cache was tangibly faster than the older FPM/L2 combination.
Single-cycle EDO DRAM became very popular on video cards towards the end of the 1990s. It was very low cost, yet nearly as efficient for performance as the far more costly VRAM.
Much equipment taking 72-pin SIMMs could use either FPM or EDO. Problems were possible, particularly when mixing FPM and EDO. Early Hewlett-Packard printers had FPM RAM built in; some, but not all, models worked if additional EDO SIMMs were added.
Burst EDO DRAM (BEDO DRAM)
An evolution of EDO DRAM, Burst EDO DRAM, could process four memory addresses in one burst, for a maximum of 5‐1‐1‐1, saving an additional three clocks over optimally designed EDO memory.
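The burst notation can be read as clocks per access: 5‐1‐1‐1 means five clocks for the first word and one for each of the next three. The comparison below assumes that optimally designed EDO runs at 5‐2‐2‐2; that figure is not stated in these notes, but it is consistent with the three-clock saving quoted above.

```python
def burst_clocks(timing):
    """Total clocks for a burst given per-access timings such as (5, 1, 1, 1)."""
    return sum(timing)


bedo = burst_clocks((5, 1, 1, 1))
edo = burst_clocks((5, 2, 2, 2))    # assumed best-case EDO timing
print(bedo, edo, edo - bedo)        # 8 11 3
```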
It was done by adding an address counter on the chip to keep track of the next address. BEDO also added a pipelined stage allowing page-access cycle to be divided into two components.
During a memory-read operation, the first component accessed the data from the memory array to the output stage (second latch). The second component drove the data bus from this latch at the appropriate logic level. Since the data is already in the output buffer, quicker access time is achieved (up to 50% for large blocks of data) than with traditional EDO.
Although BEDO DRAM showed additional optimization over EDO, by the time it was available the market had made a significant investment towards synchronous DRAM, or SDRAM. Even though BEDO RAM was superior to SDRAM in some ways, the latter technology quickly displaced BEDO.
Multibank DRAM (MDRAM)
Multibank DRAM applies the interleaving technique for main memory to second-level cache memory to provide a cheaper and faster alternative to SRAM. The chip splits its memory capacity into small blocks of 256 kB and allows operations to two different banks in a single clock cycle.
Graphics boards built around the Tseng Labs ET6x00 chipsets, the main users of MDRAM, often used the unusual RAM size configuration of 2.25 MB, owing to MDRAM’s ability to be implemented in various sizes more easily. This size of 2.25 MB allowed 24-bit color at a resolution of 1024×768, a very popular display setting in the card’s time.
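The 2.25 MB figure can be checked directly from the display mode. The sketch assumes that “MB” and “kB” here mean 2^20 and 2^10 bytes; the helper name is invented for the example.

```python
def framebuffer_bytes(width, height, bits_per_pixel):
    """Bytes needed to hold one full frame at the given color depth."""
    return width * height * bits_per_pixel // 8


size = framebuffer_bytes(1024, 768, 24)
size_mb = size / 2**20              # megabytes, taken as 2^20 bytes
banks = size // (256 * 1024)        # number of 256 kB blocks needed
print(size, size_mb, banks)         # 2359296 2.25 9
```

A 1024×768 frame at 24 bits per pixel fills exactly nine 256 kB blocks, which is why a size that looks odd next to power-of-two memories was a natural fit.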
Synchronous graphics RAM (SGRAM)
SGRAM is a specialized form of SDRAM for graphics adaptors. It adds functions such as bit masking (writing to a specified bit plane without affecting the others) and block write (filling a block of memory with a single colour). Unlike VRAM and WRAM, SGRAM is single-ported. However, it can open two memory pages at once, which simulates the dual-port nature of other video RAM technologies.
GDDR, or Graphics Double Data Rate Memory, refers to memory specifically designed for use on graphics cards. GDDR is distinct from the more widely known DDR SDRAM types such as DDR3, although they share some technologies, including double data rate design. Currently, the following generations of GDDR exist, with the higher number indicating the more recent specifications: GDDR2, GDDR3, GDDR4, and GDDR5.
GDDR5, or Graphics Double Data Rate version 5, SGRAM is a type of memory designed for use in graphics cards and other computer applications requiring high bandwidth. Like its predecessor, GDDR4, GDDR5 is based on DDR3 SDRAM memory which has double the data lines compared to DDR2 SDRAM, but GDDR5 also has 8-bit wide prefetch buffers similar to GDDR4. It uses an 8n-prefetch architecture and DDR interface to achieve high performance operation and can be configured to operate in ×32 mode or ×16 (clamshell) mode, which is detected during device initialization. The GDDR5 SGRAM uses a total of three clocks: two write clocks associated with two bytes (WCK01 and WCK23) and a single command clock (CK). Taking a GDDR5 with 5 Gbit/s data rate per pin as an example, the CK clock runs at 1.25 GHz and both WCK clocks run at 2.5 GHz. The CK and WCKs are phase aligned during the initialization and training sequence. This alignment allows read and write access with minimum latency. A single 32-bit GDDR5 chip has about 67 signal pins and the rest are power and grounds in the 170-ball BGA package.
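The clock figures in the GDDR5 example follow a simple ratio: data moves on both edges of WCK, and CK runs at half the WCK rate. The sketch below reproduces those numbers and adds the resulting peak bandwidth of one ×32 chip. The function names are assumptions, and the bandwidth is a theoretical peak that ignores protocol overhead.

```python
def gddr5_clocks(data_rate_gbps):
    """Return (wck_ghz, ck_ghz) for a given per-pin data rate in Gbit/s."""
    wck_ghz = data_rate_gbps / 2    # two data bits per WCK cycle (double data rate)
    ck_ghz = wck_ghz / 2            # command clock runs at half the write clock
    return wck_ghz, ck_ghz


def chip_bandwidth_gb_s(data_rate_gbps, width_bits=32):
    """Peak bandwidth in gigabytes per second for one chip."""
    return data_rate_gbps * width_bits / 8


print(gddr5_clocks(5.0))            # (2.5, 1.25)
print(chip_bandwidth_gb_s(5.0))     # 20.0
```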
Synchronous dynamic random-access memory (SDRAM)
SDRAM is dynamic random-access memory (DRAM) that is synchronized with the system bus. Classic DRAM has an asynchronous interface, which means that it responds as quickly as possible to changes in control inputs. SDRAM has a synchronous interface, meaning that it waits for a clock signal before responding to control inputs and is therefore synchronized with the computer’s system bus. The clock is used to drive an internal finite state machine that pipelines incoming commands.
The data storage area is divided into several banks, allowing the chip to work on several memory access commands at a time, interleaved among the separate banks.
This allows higher data access rates than an asynchronous DRAM.
Pipelining means that the chip can accept a new command before it has finished processing the previous one. In a pipelined write, the write command can be immediately followed by another command, without waiting for the data to be written to the memory array.
In a pipelined read, the requested data appears a fixed number of clock cycles after the read command, and additional commands can be sent during those cycles.
(This delay is called the latency and is an important performance parameter to consider when purchasing SDRAM for a computer.)
SDRAM is widely used in computers; from the original SDRAM, further generations of DDR (or DDR1) and then DDR2 and DDR3 have entered the mass market, with DDR4 currently being designed and anticipated to be available in 2014.
Although the concept of synchronous DRAM has been known since at least the 1970s and was used with early Intel processors, it was only in 1993 that SDRAM began its path to universal acceptance in the electronics industry.
In 1993, Samsung introduced its KM48SL2000 synchronous DRAM, and by 2000, SDRAM had replaced virtually all other types of DRAM in modern computers, because of its greater performance.
SDRAM latency is not inherently lower (faster) than asynchronous DRAM.
Indeed, early SDRAM was somewhat slower than contemporaneous burst EDO DRAM due to the additional logic. The benefits of SDRAM’s internal buffering come from its ability to interleave operations to multiple banks of memory, thereby increasing effective bandwidth.
SDR SDRAM (Single Data Rate synchronous DRAM)
Originally simply known as SDRAM, single data rate SDRAM can accept one command and transfer one word of data per clock cycle. Typical clock frequencies are 100 and 133 MHz. Chips are made with a variety of data bus sizes (most commonly 4, 8 or 16 bits), but chips are generally assembled into 168-pin DIMMs that read or write 64 (non-ECC) or 72 (ECC) bits at a time.
Use of the data bus is intricate and thus requires a complex DRAM controller circuit. This is because data written to the DRAM must be presented in the same cycle as the write command, but reads produce output 2 or 3 cycles after the read command. The DRAM controller must ensure that the data bus is never required for a read and a write at the same time.
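The scheduling constraint above can be sketched as a check on when each command occupies the data bus: a write uses it in the cycle of the command, a read uses it a fixed latency later. The function name and the `(cycle, operation)` command format are assumptions, and the model covers single-word transfers only, ignoring bursts and bus turnaround time.

```python
def find_bus_conflicts(commands, cas_latency):
    """Return the cycles in which two commands would need the data bus at once.

    commands is a list of (cycle, op) pairs, with op either "READ" or "WRITE".
    """
    owner = {}
    conflicts = []
    for cycle, op in commands:
        # Write data accompanies the command; read data arrives after the latency.
        data_cycle = cycle + cas_latency if op == "READ" else cycle
        if data_cycle in owner:
            conflicts.append(data_cycle)
        else:
            owner[data_cycle] = op
    return conflicts


bad = find_bus_conflicts([(0, "READ"), (2, "WRITE")], cas_latency=2)
good = find_bus_conflicts([(0, "READ"), (3, "WRITE")], cas_latency=2)
print(bad, good)  # [2] []
```

In the first schedule the read data and the write data both land on cycle 2, so a controller would have to delay the write by at least one cycle.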
Typical SDR SDRAM clock rates are 66, 100, and 133 MHz (periods of 15, 10, and 7.5 ns). Clock rates up to 150 MHz were available for performance enthusiasts.
This type of SDRAM is slower than the DDR variants, because only one word of data is transmitted per clock cycle (single data rate). But this type is also faster than its predecessors EDO-RAM and FPM-RAM, which typically took 2 or 3 clocks to transfer one word of data.
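The clock periods listed above translate directly into clock rates and, for a 64-bit DIMM moving one word per clock, into a peak transfer rate. The helper names are invented for this sketch, and the bandwidth is a theoretical peak with megabytes taken as 10^6 bytes.

```python
def clock_mhz(period_ns):
    """Clock frequency in MHz for a given clock period in nanoseconds."""
    return 1000.0 / period_ns


def dimm_bandwidth_mb_s(period_ns, bus_bits=64):
    """Peak MB/s when one bus-wide word is transferred per clock cycle."""
    return clock_mhz(period_ns) * bus_bits / 8


for period in (15, 10, 7.5):
    print(period, round(clock_mhz(period), 1), round(dimm_bandwidth_mb_s(period)))
```

The periods of 15 and 7.5 ns correspond to about 66.7 and 133.3 MHz, which are the rates usually quoted as 66 and 133 MHz.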
Rambus DRAM (RDRAM)
RDRAM was a proprietary technology that competed against DDR. Its relatively high price and disappointing performance (resulting from high latencies and a narrow 16-bit data channel versus DDR’s 64-bit channel) caused it to lose the race to succeed SDR SDRAM.
Synchronous-Link DRAM (SLDRAM)
SLDRAM boasted higher performance and competed against RDRAM. It was developed during the late 1990s by the SLDRAM Consortium, which consisted of about 20 major DRAM and computer industry manufacturers. (The SLDRAM Consortium became incorporated as SLDRAM Inc. and then changed its name to Advanced Memory International, Inc.) SLDRAM was an open standard and did not require licensing fees. The specifications called for a 64-bit bus running at a 200, 300 or 400 MHz clock frequency. This is achieved by all signals being on the same line and thereby avoiding the synchronization time of multiple lines. Like DDR SDRAM, SLDRAM uses a double-pumped bus, giving it an effective speed of 400, 600, or 800 MT/s.
SLDRAM used an 11-bit command bus (10 command bits CA9:0 plus one start-of-command FLAG line) to transmit 40-bit command packets on 4 consecutive edges of a differential command clock (CCLK/CCLK#). Unlike SDRAM, there were no per-chip select signals; each chip was assigned an ID when reset, and the command contained the ID of the chip that should process it. Data was transferred in 4- or 8-word bursts across an 18-bit (per chip) data bus, using one of two differential data clocks (DCLK0/DCLK0# and DCLK1/DCLK1#). Unlike standard SDRAM, the clock was generated by the data source (the SLDRAM chip in the case of a read operation) and transmitted in the same direction as the data, greatly reducing data skew. To avoid the need for a pause when the source of the DCLK changes, each command specified which DCLK pair it would use.
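As a rough sketch of the packet framing described above, a 40-bit command can be split into four 10-bit words, one per clock edge. The word order (most significant first) and the function names are assumptions for illustration; the actual field layout of an SLDRAM command is not given here.

```python
def split_command_packet(packet):
    """Split a 40-bit command into four 10-bit words (CA9:0), one per clock edge.

    Sending the most significant word first is an assumption of this sketch.
    """
    if not 0 <= packet < (1 << 40):
        raise ValueError("command packet must fit in 40 bits")
    return [(packet >> shift) & 0x3FF for shift in (30, 20, 10, 0)]


def effective_rate_mt_s(clock_mhz):
    """A double-pumped bus transfers on both clock edges."""
    return 2 * clock_mhz


print(split_command_packet(1))                              # [0, 0, 0, 1]
print([effective_rate_mt_s(f) for f in (200, 300, 400)])    # [400, 600, 800]
```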
ETA-RAM is a trademark for a novel RAM computer memory technology developed by Eta Semiconductor. ETA-RAM aims to improve on both cost and dissipated power by combining the advantages of DRAM and SRAM: the lower cost of existing DRAMs together with lower power dissipation and higher performance than SRAMs.
The cost advantages are obtained by using a much simpler process technology and by significantly reducing the silicon area of the cells: an ETA-RAM cell requires about the same silicon area as a cell in modern DRAM devices. The improved power dissipation is obtained by reducing the current used in reading and writing the data bits in the cell and by removing the refresh requirement.
To combine the advantages of the two RAM types, Eta Semiconductor adopted a new approach based on building static memory cells from a single process structure of minimum dimensions that by itself performs the function of a conventional SRAM cell. This is possible using a new CMOS technology for manufacturing high-density integrated circuits, invented by the founders of Eta Semiconductor. This technology, called ETA CMOS, defines novel structures that, thanks to metal junctions and the use of stacked gates, simultaneously perform the functions of several traditional transistors.
Ferroelectric RAM (FeRAM) is a random-access memory similar in construction to DRAM that uses a ferroelectric layer instead of a dielectric layer to achieve non-volatility. FeRAM is one of a growing number of alternative non-volatile random-access memory technologies that offer the same functionality as flash memory. FeRAM’s advantages over flash include lower power usage, faster write performance, and a much greater maximum number of write-erase cycles (exceeding 10^16 for 3.3 V devices). Its disadvantages are much lower storage densities than flash devices, storage capacity limitations, and higher cost.
Ferroelectric RAM was proposed by MIT graduate student Dudley Allen Buck in his master’s thesis, Ferroelectrics for Digital Information Storage and Switching, published in 1952. Development of FeRAM began in the late 1980s. Work was done in 1991 at NASA’s Jet Propulsion Laboratory on improving methods of readout, including a novel method of non-destructive readout using pulses of UV radiation. Much of the current FeRAM technology was developed by Ramtron, a fabless semiconductor company.
Since 1999 Fujitsu, a Ramtron licensee, has used its production line to produce standalone FeRAMs, as well as specialized chips (e.g. chips for smart cards) with embedded FeRAMs. Fujitsu produced devices for Ramtron until 2010.
Since 2010 Ramtron’s fabricators have been TI (Texas Instruments) and IBM. Since at least 2001 Texas Instruments has collaborated with Ramtron to develop FeRAM test chips in a modified 130 nm process.
In the fall of 2005, Ramtron reported that they were evaluating prototype samples of an 8-megabit FeRAM manufactured using Texas Instruments’ FeRAM process.
In 2005, Fujitsu and Seiko-Epson were collaborating on the development of a 180 nm FeRAM process. In 2012 Ramtron was acquired by Cypress Semiconductor. FeRAM research projects have also been reported at Samsung, Matsushita, Oki, Toshiba, Infineon, Hynix, Symetrix, Cambridge University, University of Toronto, and the Interuniversity Microelectronics Centre (IMEC, Belgium).
Each storage element, a cell, consists of one capacitor and one transistor, a so-called “1T-1C” device.
DRAM cells scale directly with the size of the semiconductor fabrication process used to make them. For instance, on the 90 nm process used by most memory providers to make DDR2 DRAM, the cell size is 0.22 μm², which includes the capacitor, transistor, wiring, and some amount of “blank space” between the various parts; it appears 35% utilization is typical, leaving 65% of the space wasted.
Data in a DRAM is stored as the presence or lack of an electrical charge in the capacitor, with the lack of charge in general representing “0”.
Writing is accomplished by activating the associated control transistor, draining the cell to write a “0”, or sending current into it from a supply line if the new value should be “1”.
Reading is similar in nature; the transistor is again activated, draining the charge to a sense amplifier. If a pulse of charge is noticed in the amplifier, the cell held a charge and thus reads “1”; the lack of such a pulse indicates a “0”.
Note that this process is destructive: once the cell has been read, if it held a “1”, it must be recharged to that value. Since a cell loses its charge after some time due to leakage currents, it must also be actively refreshed at intervals.
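The read, write-back, and refresh behaviour described above can be illustrated with a toy model. The class name `DramCell`, the abstract `tick` time unit, and the retention time of 4 ticks are all hypothetical; real retention is governed by analog leakage, not a counter.

```python
class DramCell:
    """Toy 1T-1C cell: charge leaks away unless the cell is refreshed."""

    def __init__(self, retention_ticks=4):
        self.charged = False
        self.retention_ticks = retention_ticks  # hypothetical leak time
        self.age = 0

    def write(self, bit):
        self.charged = bool(bit)
        self.age = 0

    def tick(self):
        """Advance time by one unit; an old enough charge has leaked away."""
        self.age += 1
        if self.age >= self.retention_ticks:
            self.charged = False

    def read(self):
        bit = int(self.charged)
        self.charged = False  # destructive: the charge drains to the sense amplifier
        if bit:
            self.write(1)     # write the value back
        return bit

    def refresh(self):
        self.read()           # a read followed by write-back restores full charge


kept, lost = DramCell(), DramCell()
kept.write(1)
lost.write(1)
for t in range(6):
    kept.tick()
    lost.tick()
    if t == 2:
        kept.refresh()        # only one of the two cells is refreshed in time
print(kept.read(), lost.read())  # 1 0
```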
The 1T-1C storage cell design in an FeRAM is similar in construction to the storage cell in widely used DRAM in that both cell types include one capacitor and one access transistor.
A ferroelectric material has a nonlinear relationship between the applied electric field and the apparent stored charge.
The dielectric constant of a ferroelectric is typically much higher than that of a linear dielectric because of the effects of semi-permanent electric dipoles formed in the crystal structure of the ferroelectric material.
Binary “0”s and “1”s are stored as one of two possible electric polarizations in each data storage cell. For example, a “1” may be encoded using the negative remnant polarization “-Pr”, and a “0” using the positive remnant polarization “+Pr”. In terms of operation, FeRAM is similar to DRAM. Writing is accomplished by applying a field across the ferroelectric layer by charging the plates on either side of it, forcing the atoms inside into the “up” or “down” orientation (depending on the polarity of the charge), thereby storing a “1” or “0”.
Reading, however, is somewhat different than in DRAM. The transistor forces the cell into a particular state, say “0”. If the cell already held a “0”, nothing will happen in the output lines. If the cell held a “1”, the re-orientation of the atoms in the film will cause a brief pulse of current in the output as they push electrons out of the metal on the “down” side. The presence of this pulse means the cell held a “1”. Since this process overwrites the cell, reading FeRAM is a destructive process, and requires the cell to be re-written if it was changed.
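A minimal sketch of this read sequence follows. The class name `FeRamCell` is hypothetical, and the cell state is reduced to a single stored bit; it only illustrates the "force to 0, watch for a pulse, rewrite if needed" logic.

```python
class FeRamCell:
    """Toy ferroelectric cell: the stored bit is the polarization state."""

    def __init__(self):
        self.polarization = 0  # retained without power in a real device

    def write(self, bit):
        self.polarization = int(bool(bit))

    def read(self):
        # Force the cell to "0"; a current pulse appears only if the state flips.
        pulse = self.polarization == 1
        self.polarization = 0
        if pulse:
            self.write(1)      # the read overwrote the cell, so restore the "1"
        return int(pulse)


cell = FeRamCell()
cell.write(1)
print(cell.read(), cell.read())  # 1 1
```

The second read still returns 1 only because of the write-back step; without it, the first read would have erased the cell.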
In general, the operation of FeRAM is similar to ferrite core memory, one of the primary forms of computer memory in the 1960s. In comparison, FeRAM requires far less power to flip the state of the polarity, and does so much faster.
Comparison with other systems
FeRAM remains a relatively small part of the overall semiconductor market. In 2005, worldwide semiconductor sales were US $235 billion (according to the Gartner Group), with the flash memory market accounting for US $18.6 billion (according to IC Insights). The 2005 annual sales of Ramtron, perhaps the largest FeRAM vendor, were reported to be US $32.7 million. The much larger sales of flash memory compared to the alternative NVRAMs support a much larger research and development effort. Flash memory is produced using semiconductor linewidths of 30 nm at Samsung (2007) while FeRAMs are produced in linewidths of 350 nm at Fujitsu and 130 nm at Texas Instruments (2007). Flash memory cells can store multiple bits per cell (currently 3 in the highest density NAND flash devices), and the number of bits per flash cell is projected to increase to 4 or even to 8 as a result of innovations in flash cell design. As a consequence, the areal bit densities of flash memory are much higher than those of FeRAM, and thus the cost per bit of flash memory is orders of magnitude lower than that of FeRAM.
The density of FeRAM arrays might be increased by improvements in FeRAM foundry process technology and cell structures, such as the development of vertical capacitor structures (in the same way as DRAM) to reduce the area of the cell footprint. However, reducing the cell size may cause the data signal to become too weak to be detectable. In 2005, Ramtron reported significant sales of its FeRAM products in a variety of sectors including (but not limited to) electricity meters, automotive (e.g. black boxes, smart air bags), business machines (e.g. printers, RAID disk controllers), instrumentation, medical equipment, industrial microcontrollers, and radio frequency identification tags. The other emerging NVRAMs, such as MRAM, may seek to enter similar niche markets in competition with FeRAM.
Texas Instruments proved it possible to embed FeRAM cells using two additional masking steps during conventional CMOS semiconductor manufacture. Flash typically requires nine masks. This makes possible, for example, the integration of FeRAM onto microcontrollers, where a simplified process would reduce costs. However, the materials used to make FeRAMs are not commonly used in CMOS integrated circuit manufacturing. Both the PZT ferroelectric layer and the noble metals used for electrodes raise CMOS process compatibility and contamination issues. Texas Instruments has incorporated FRAM memory into its MSP430 microcontrollers in its new FRAM series.
Magnetoresistive random-access memory
Magnetoresistive random-access memory (MRAM) is a non-volatile random-access memory technology under development since the 1990s. Continued increases in density of existing memory technologies, notably flash RAM and DRAM, kept it in a niche role in the market, but its proponents believe that its advantages are so overwhelming that magnetoresistive RAM will eventually become dominant for all types of memory, becoming a universal memory.
Unlike conventional RAM chip technologies, data in MRAM is not stored as electric charge or current flows, but by magnetic storage elements. The elements are formed from two ferromagnetic plates, each of which can hold a magnetic field, separated by a thin insulating layer. One of the two plates is a permanent magnet set to a particular polarity; the other plate’s field can be changed to match that of an external field to store memory. This configuration is known as a spin valve and is the simplest structure for an MRAM bit. A memory device is built from a grid of such “cells”.
The simplest method of reading is accomplished by measuring the electrical resistance of the cell. A particular cell is (typically) selected by powering an associated transistor that switches current from a supply line through the cell to ground. Due to the magnetic tunnel effect, the electrical resistance of the cell changes with the orientation of the fields in the two plates. By measuring the resulting current, the resistance inside any particular cell can be determined, and from this the polarity of the writable plate. Typically, if the two plates have the same polarity this is considered to mean “1”, while if the two plates are of opposite polarity the resistance will be higher and this means “0”.
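The resistance-based read can be sketched as a threshold comparison. The two resistance values, the supply voltage, and the midpoint threshold below are hypothetical numbers chosen only to make the example concrete.

```python
R_PARALLEL = 1000.0      # ohms, hypothetical: plates with the same polarity
R_ANTIPARALLEL = 2000.0  # ohms, hypothetical: plates with opposite polarity


def read_bit(fixed_polarity, free_polarity, supply_volts=0.1):
    """Infer the stored bit from the current flowing through the cell."""
    same = fixed_polarity == free_polarity
    resistance = R_PARALLEL if same else R_ANTIPARALLEL
    current = supply_volts / resistance
    # Compare against the current expected at the midpoint resistance.
    threshold = supply_volts / ((R_PARALLEL + R_ANTIPARALLEL) / 2)
    return 1 if current > threshold else 0


print(read_bit("north", "north"))  # 1 (same polarity, lower resistance)
print(read_bit("north", "south"))  # 0 (opposite polarity, higher resistance)
```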
Data is written to the cells using a variety of means. In the simplest, each cell lies between a pair of write lines arranged at right angles to each other, above and below the cell. When current is passed through them, an induced magnetic field is created at the junction, which the writable plate picks up. This pattern of operation is similar to core memory, a system commonly used in the 1960s. This approach requires a fairly substantial current to generate the field, however, which makes it less interesting for low-power uses, one of MRAM’s primary disadvantages. Additionally, as the device is scaled down in size, there comes a time when the induced field overlaps adjacent cells over a small area, leading to potential false writes. This problem, the half-select (or write disturb) problem, appears to set a fairly large minimum size for this type of cell. One experimental solution to this problem was to use circular domains written and read using the giant magnetoresistive effect, but it appears this line of research is no longer active.
A newer technique, spin transfer torque (STT) or spin transfer switching, uses spin-aligned (“polarized”) electrons to directly torque the domains. Specifically, if the electrons flowing into a layer have to change their spin, this will develop a torque that will be transferred to the nearby layer. This lowers the amount of current needed to write the cells, making it about the same as the read process. There are concerns that the “classic” type of MRAM cell will have difficulty at high densities due to the amount of current needed during writes, a problem that STT avoids. For this reason, the STT proponents expect the technique to be used for devices of 65 nm and smaller. The downside is the need to maintain the spin coherence. Overall, the STT requires much less write current than conventional or toggle MRAM. Research in this field indicates that STT current can be reduced up to 50 times by using a new composite structure. However, higher speed operation still requires higher current.
Other potential arrangements include “Thermal Assisted Switching” (TAS-MRAM), which briefly heats up (reminiscent of phase-change memory) the magnetic tunnel junctions during the write process and keeps the MTJs stable at a colder temperature the rest of the time; and “vertical transport MRAM” (VMRAM), which uses current through a vertical column to change magnetic orientation, a geometric arrangement that reduces the write disturb problem and so can be used at higher density. 
A review paper provides the details of materials and challenges associated with MRAM in the perpendicular geometry. The authors describe a new term called “Pentalemma” – which represents a conflict in five different requirements such as write current, stability of the bits, readability, read/write speed and the process integration with CMOS. The selection of materials and the design of MRAM to fulfill those requirements are discussed.
Comparison with other systems
MRAM has similar performance to SRAM, similar density to DRAM but much lower power consumption than DRAM, and is much faster and suffers no degradation over time in comparison to flash memory. It is this combination of features that some suggest makes it the “universal memory”, able to replace SRAM, DRAM, EEPROM, and flash. This also explains the huge amount of research being carried out into developing it.
However, to date, MRAM has not been as widely adopted in the market as other non-volatile RAMs. It may be that vendors are not prepared to take the risk of allocating a modern fab to MRAM production when such fabs cost upwards of a few billion dollars to build and can instead generate revenue by serving developed markets producing flash and DRAM memories.
The very latest fabs seem to be used for flash, for example the 16 Gbit parts produced by Samsung on a 50 nm process. Slightly older fabs are being used to produce most DDR2 DRAM, most of which is produced on a one-generation-old 90 nm process rather than using up scarce leading-edge capacity.
In comparison, MRAM is still largely “in development”, and is being produced on older non-critical fabs. The only commercial product widely available at this point is Everspin’s 4 Mbit part, produced on a several-generations-old 180 nm process. As demand for flash continues to outstrip supply, it appears that it will be some time before a company can afford to “give up” one of its latest fabs for MRAM production. Even then, MRAM designs currently do not come close to flash in terms of cell size, even using the same fab.
Alternatives to MRAM
Flash and EEPROM’s limited write-cycles are a serious problem for any real RAM-like role, however. In addition, the high power needed to write the cells is a problem in low-power roles, where non-volatile RAM is often used. The power also needs time to be “built up” in a device known as a charge pump, which makes writing dramatically slower than reading, often as much as 1,000 times. While MRAM was certainly designed to address some of these issues, a number of other new memory devices are in production or have been proposed to address these shortcomings.
To date, the only such system to enter widespread production is ferroelectric RAM, or F-RAM (sometimes referred to as FeRAM). F-RAM is a random-access memory similar in construction to DRAM but (instead of a dielectric layer like in DRAM) contains a thin ferroelectric film of lead zirconate titanate [Pb(Zr,Ti)O3], commonly referred to as PZT. The Zr/Ti atoms in the PZT change polarity in an electric field, thereby producing a binary switch. Unlike RAM devices, F-RAM retains its data memory when power is shut off or interrupted, due to the PZT crystal maintaining polarity. Due to this crystal structure and how it is influenced, F-RAM offers distinct properties from other nonvolatile memory options, including extremely high endurance (exceeding 10^16 for 3.3 V devices), ultra low power consumption (since F-RAM does not require a charge pump like other non-volatile memories), single-cycle write speeds, and gamma radiation tolerance. Ramtron International has developed, produced, and licensed ferroelectric RAM (F-RAM).
Another solid-state technology to see more than purely experimental development is Phase-change RAM, or PRAM. PRAM is based on the same storage mechanism as writable CDs and DVDs, but reads them based on their changes in electrical resistance rather than changes in their optical properties. Considered a “dark horse” for some time, in 2006 Samsung announced the availability of a 512 Mb part, considerably higher capacity than either MRAM or FeRAM. The areal density of these parts appears to be even higher than modern flash devices, the lower overall storage being due to the lack of multi-bit encoding. This announcement was followed by one from Intel and STMicroelectronics, who demonstrated their own PRAM devices at the 2006 Intel Developer Forum in October. One of the most attended sessions at IEDM in December 2006 was the presentation by IBM of their PRAM technology.
Also seeing renewed interest is silicon-oxide-nitride-oxide-silicon (SONOS) memory.
Silicon on insulator
Silicon on insulator (SOI) technology refers to the use of a layered silicon-insulator-silicon substrate in place of conventional silicon substrates in semiconductor manufacturing, especially microelectronics, to reduce parasitic device capacitance, thereby improving performance. SOI-based devices differ from conventional silicon-built devices in that the silicon junction is above an electrical insulator, typically silicon dioxide or sapphire (these types of devices are called silicon on sapphire, or SOS). The choice of insulator depends largely on intended application, with sapphire being used for high-performance radio frequency (RF) and radiation-sensitive applications, and silicon dioxide for diminished short channel effects in microelectronics devices. The insulating layer and topmost silicon layer also vary widely with application. The first industrial implementation of SOI was announced by IBM in August 1998.
The implementation of SOI technology is one of several manufacturing strategies employed to allow the continued miniaturization of microelectronic devices, colloquially referred to as extending Moore’s Law. Reported benefits of SOI technology relative to conventional silicon (bulk CMOS) processing include:
- Lower parasitic capacitance due to isolation from the bulk silicon, which improves power consumption at matched performance.
- Resistance to latchup due to complete isolation of the n- and p-well structures.
- Higher performance at equivalent VDD. Can work at low VDD.
- Reduced temperature dependency due to no doping.
- Better yield due to high density, better wafer utilization.
- Reduced antenna issues
- No body or well taps are needed.
- Lower leakage currents due to isolation thus higher power efficiency.
- Inherently radiation hardened (resistant to soft errors), thus reducing the need for redundancy.
An SOI MOSFET is a semiconductor device (MOSFET) in which a semiconductor layer such as silicon or germanium is formed on an insulator layer, which may be a buried oxide (BOX) layer formed in a semiconductor substrate. SOI MOSFET devices are adapted for use by the computer industry. The buried oxide layer can be used in SRAM memory designs. There are two types of SOI devices: PDSOI (partially depleted SOI) and FDSOI (fully depleted SOI) MOSFETs. For an n-type PDSOI MOSFET the sandwiched p-type
Unlike all of the other variants described in this section of this article, 1T DRAM is a different way of constructing the basic DRAM bit cell.
1T DRAM is a “capacitorless” bit cell design that stores data in the parasitic body capacitor that is an inherent part of silicon on insulator (SOI) transistors.
Considered a nuisance in logic design, this floating body effect can be used for data storage.
- refresh is still required
- reads are non-destructive
The classic one-transistor/one-capacitor (1T/1C) DRAM cell is also sometimes referred to as “1T DRAM”, particularly in comparison to 3T and 4T DRAM which it replaced in the 1970s.
Zero-capacitor (registered trademark, Z-RAM) is a novel dynamic random-access memory technology developed by Innovative Silicon based on the floating body effect of silicon on insulator (SOI) process technology.
Z-RAM has been licensed by Advanced Micro Devices for possible use in future microprocessors. Innovative Silicon claims the technology offers memory access speeds similar to the standard six-transistor static random-access memory cell used in cache memory but uses only a single transistor, therefore affording much higher packing densities.
Z-RAM relies on the floating body effect, an artifact of the SOI process technology which places transistors in isolated tubs (the transistor body voltages “float” with respect to the wafer substrate underneath the tubs).
The same effect, however, allows a DRAM-like cell to be built without adding a separate capacitor, the floating body effect taking the place of the conventional capacitor.
Because the capacitor is located under the transistor (instead of adjacent to, or above the transistor as in conventional DRAMs), another connotation of the name “Z-RAM” is that it extends in the negative z-direction.
The reduced cell size leads, in a roundabout way, to Z-RAM being faster than even SRAM if used in large enough blocks.
While individual SRAM cells are sensed faster than Z-RAM cells, the significantly smaller cell reduces the size of Z-RAM memory blocks and thus reduces the physical distance that data must transit to exit the memory block.
As these metal traces have a fixed delay per unit length independent of memory technology, the shorter lengths of the Z-RAM signal traces can offset the faster SRAM cell access times.
For a large cache memory (as typically found in a high-performance microprocessor), Z-RAM offers speed equivalent to SRAM while requiring much less space (and thus cost). Response times as low as 3 ns have been claimed.
In March 2010, Innovative Silicon announced it was jointly developing a non-SOI version of Z-RAM that could be manufactured on lower cost bulk CMOS technology.
A-RAM (Advanced-Random Access Memory) is a DRAM memory based on single-transistor capacitor-less cells. A-RAM was invented in 2009 at the University of Granada, UGR (Spain) in collaboration with the Centre National de la Recherche Scientifique, CNRS (France). It was conceived by Noel Rodriguez (UGR), Francisco Gamiz (UGR) and Sorin Cristoloveanu (CNRS). A-RAM is compatible with single-gate silicon on insulator (SOI), double-gate, FinFETs and multiple-gate FETs (MuFETs).
The conventional 1-Transistor + 1-Capacitor DRAM is extensively used in the semiconductor industry for manufacturing high-density dynamic memories. Beyond the 45 nm node, the DRAM industry will need new concepts that avoid the miniaturization issue of the memory-cell capacitor. The 1T-DRAM family of memories, which includes A-RAM, replaces the storage capacitor with the floating body of SOI transistors to store the charge.
Content-addressable memory (CAM) is a special type of computer memory used in certain very high speed searching applications. It is also known as associative memory, associative storage, or associative array, although the last term is more often used for a programming data structure.
It compares input search data (tag) against a table of stored data, and returns the address of matching data (or in the case of associative memory, the matching data). Several custom computers, like the Goodyear STARAN, were built to implement CAM, and were designated associative computers.
Hardware associative array
Unlike standard computer memory (random access memory or RAM) in which the user supplies a memory address and the RAM returns the data word stored at that address, a CAM is designed such that the user supplies a data word and the CAM searches its entire memory to see if that data word is stored anywhere in it. If the data word is found, the CAM returns a list of one or more storage addresses where the word was found (and in some architectures, it also returns the data word, or other associated pieces of data). Thus, a CAM is the hardware embodiment of what in software terms would be called an associative array. The data word recognition unit was proposed by Dudley Allen Buck in 1955.
Because a CAM is designed to search its entire memory in a single operation, it is much faster than RAM in virtually all search applications.
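The difference between the two access patterns can be sketched in software. The function names are hypothetical, and the loop in `cam_search` runs sequentially, whereas CAM hardware performs all the comparisons at once; the sketch shows only the interface, not the speed.

```python
def ram_read(memory, address):
    """RAM: the user supplies an address and gets back the stored word."""
    return memory[address]


def cam_search(memory, word):
    """CAM: the user supplies a word and gets back every address holding it."""
    return [address for address, stored in enumerate(memory) if stored == word]


memory = [0b1010, 0b0111, 0b1010, 0b0001]
print(ram_read(memory, 1))         # 7
print(cam_search(memory, 0b1010))  # [0, 2]
print(cam_search(memory, 0b1111))  # []
```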
There are cost disadvantages to CAM, however.
Unlike a RAM chip, which has simple storage cells, each individual memory bit in a fully parallel CAM must have its own associated comparison circuit to detect a match between the stored bit and the input bit.
Additionally, match outputs from each cell in the data word must be combined to yield a complete data word match signal. The additional circuitry increases the physical size of the CAM chip, which increases manufacturing cost. The extra circuitry also increases power dissipation, since every comparison circuit is active on every clock cycle. Consequently, CAM is only used in specialized applications where the required searching speed cannot be achieved using a less costly method. One successful early implementation was a General Purpose Associative Processor IC and System.
To achieve a different balance between speed, memory size and cost, some implementations emulate the function of CAM by using standard tree search or hashing designs in hardware, using hardware tricks like replication or pipelining to speed up effective performance. These designs are often used in routers.
Binary CAM is the simplest type of CAM, which uses data search words consisting entirely of 1s and 0s. Ternary CAM (TCAM) allows a third matching state of “X” or “Don’t Care” for one or more bits in the stored data word, thus adding flexibility to the search. For example, a ternary CAM might have a stored word of “10XX0” which will match any of the four search words “10000”, “10010”, “10100”, or “10110”. The added search flexibility comes at an additional cost over binary CAM, as the internal memory cell must now encode three possible states instead of the two of binary CAM. This additional state is typically implemented by adding a mask bit (“care” or “don’t care” bit) to every memory cell.
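The “10XX0” example can be reproduced with a value word plus a mask word, mirroring the per-cell mask bit. This is a software sketch under that assumption; the function names `to_value_and_care` and `ternary_match` are hypothetical.

```python
def to_value_and_care(pattern):
    """Encode a ternary pattern such as "10XX0" as (value, care) integers.

    A care bit of 1 means the position must match; 0 means "don't care".
    """
    value = int(pattern.replace("X", "0"), 2)
    care = int("".join("0" if ch == "X" else "1" for ch in pattern), 2)
    return value, care


def ternary_match(pattern, search_word):
    """True if the search word matches the pattern in every cared-for bit."""
    value, care = to_value_and_care(pattern)
    return ((int(search_word, 2) ^ value) & care) == 0


matches = [format(i, "05b") for i in range(32) if ternary_match("10XX0", format(i, "05b"))]
print(matches)  # ['10000', '10010', '10100', '10110']
```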
Holographic associative memory provides a mathematical model for “Don’t Care” integrated associative recollection using complex valued representation.
Ternary CAMs in Networking
Content-addressable memory is often used in computer networking devices.
A network switch looks up the destination MAC address of an incoming frame in its MAC address table to determine which port the frame needs to be forwarded to, and sends it out on that port.
The MAC address table is usually implemented with a binary CAM so the destination port can be found very quickly, reducing the switch’s latency.
Ternary CAMs are often used in network routers, where each address has two parts:
- the network address, which can vary in size depending on the subnet configuration
- the host address, which occupies the remaining bits
Each subnet has a network mask that specifies which bits of the address are the network address and which bits are the host address.
Routing is done by consulting a routing table maintained by the router which contains each known destination network address, the associated network mask, and the information needed to route packets to that destination.
Without CAM, the router compares the destination address of the packet to be routed with each entry in the routing table, performing a logical AND with the network mask and comparing it with the network address. If they are equal, the corresponding routing information is used to forward the packet.
With a ternary CAM, the addresses are stored using “don’t care” for the host part of the address, so looking up the destination address in the CAM immediately retrieves the correct routing entry; both the masking and comparison are done by the CAM hardware.
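The non-CAM lookup described above (AND with the mask, then compare) can be sketched as a linear scan. The routing entries and port names are made up for the example, and listing the most specific network first is an assumption of this sketch, not something stated in the text.

```python
import ipaddress


def to_int(address):
    """Convert dotted-quad IPv4 text to an integer."""
    return int(ipaddress.IPv4Address(address))


# Hypothetical routing table: (network address, network mask, routing info).
ROUTES = [
    ("192.168.1.0", "255.255.255.0", "port 1"),
    ("192.168.0.0", "255.255.0.0", "port 2"),
    ("0.0.0.0", "0.0.0.0", "default"),
]


def lookup(destination):
    """Compare the destination against each entry in turn."""
    dest = to_int(destination)
    for network, mask, info in ROUTES:
        if (dest & to_int(mask)) == to_int(network):
            return info
    return None


print(lookup("192.168.1.7"))  # port 1
print(lookup("192.168.9.9"))  # port 2
print(lookup("10.0.0.1"))     # default
```

A ternary CAM replaces the loop: each entry is stored with its host bits marked “don’t care”, and all entries are compared in a single operation.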
Other CAM applications include:
Parallel random-access machine
A shared-memory abstract machine. As its name indicates, the PRAM was intended as the parallel-computing analogy to the random-access machine (RAM). In the same way that the RAM is used by sequential-algorithm designers to model algorithmic performance (such as time complexity), the PRAM is used by parallel-algorithm designers to model parallel algorithmic performance (such as time complexity, where the number of processors assumed is typically also stated).
Similar to the way in which the RAM model neglects practical issues, such as access time to cache memory versus main memory, the
Algorithm cost, for instance, is estimated using two parameters O(time) and O(time × processor_number).
Read/write conflicts in accessing the same shared memory location simultaneously are resolved by one of the following strategies:
- Exclusive read exclusive write (EREW)—every memory cell can be read or written to by only one processor at a time
- Concurrent read exclusive write (CREW)—multiple processors can read a memory cell but only one can write at a time
- Exclusive read concurrent write (ERCW)—never considered
- Concurrent read concurrent write (CRCW)—multiple processors can read and write. A CRCW PRAM is sometimes called a concurrent random-access machine.
Here, E and C stand for ‘exclusive’ and ‘concurrent’ respectively. Concurrent reads cause no discrepancies, while concurrent writes are further defined as:
- Common: all processors write the same value; anything else is illegal
- Arbitrary: only one arbitrary attempt succeeds, the others retire
- Priority: processor rank indicates who gets to write
- Reduction: the written values are combined by an array reduction operation such as SUM, logical AND, or MAX
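The concurrent-write rules can be sketched as a single resolver for one memory cell. The function name, the policy strings, and the convention that the lowest rank wins under the priority rule are assumptions made for this example.

```python
def resolve_write(attempts, policy, reduce_op=None):
    """Decide what one memory cell holds after a concurrent write.

    attempts: list of (processor_rank, value) pairs issued in the same step.
    """
    if not attempts:
        raise ValueError("no write attempts")
    values = [value for _, value in attempts]
    if policy == "common":
        if len(set(values)) != 1:
            raise ValueError("illegal: processors wrote different values")
        return values[0]
    if policy == "arbitrary":
        # Any single attempt may succeed; this sketch picks the first.
        return values[0]
    if policy == "priority":
        # Assumption: the lowest rank has the highest priority.
        return min(attempts)[1]
    if policy == "reduce":
        # For example sum, max, or all (logical AND).
        return reduce_op(values)
    raise ValueError(f"unknown policy: {policy}")
```

For the attempts `[(2, 7), (0, 5), (1, 9)]`, the priority rule stores 5, a SUM reduction stores 21, and the common rule rejects the step because the values differ.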
Several simplifying assumptions are made while considering the development of algorithms for PRAM. They are:
- There is no limit on the number of processors in the machine.
- Any memory location is uniformly accessible from any processor.
- There is no limit on the amount of shared memory in the system.
- Resource contention is absent.
- The programs written on these machines are, in general, of type SIMD.
These kinds of algorithms are useful for understanding the exploitation of concurrency, dividing the original problem into similar sub-problems and solving them in parallel.
PRAM algorithms cannot be run in parallel on the combination of a CPU and dynamic random-access memory (DRAM), because DRAM does not allow concurrent access. They can, however, be implemented in hardware: a CRCW algorithm can read and write the internal static random-access memory (SRAM) blocks of a field-programmable gate array (FPGA).
However, the test for practical relevance of PRAM (or RAM) algorithms depends on whether their cost model provides an effective abstraction of some computer; the structure of that computer can be quite different from the abstract model. The knowledge of the layers of software and hardware that need to be inserted is beyond the scope of this article. But articles such as Vishkin (2011) demonstrate how a PRAM-like abstraction can be supported by the explicit multi-threading (XMT) paradigm, and articles such as Caragea & Vishkin (2011) demonstrate that a PRAM algorithm for the maximum flow problem can provide strong speedups relative to the fastest serial program for the same problem.
This is an example of SystemVerilog code which finds the maximum value in the array in only 2 clock cycles. It compares all the combinations of the elements in the array at the first clock, and merges the result at the second clock. It uses CRCW memory; m[i] <= 1 and maxNo <= data[i] are written concurrently. The concurrency causes no conflicts because the algorithm guarantees that the same value is written to the same memory. This code can be run on FPGA hardware.
module FindMax #(parameter int len = 8)
    (input bit clock, resetN, input bit [7:0] data[len], output bit [7:0] maxNo);
    always_ff @(posedge clock, negedge resetN) begin
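The listing above breaks off after its first lines in these notes. The two-step idea it describes can still be sketched as a sequential Python simulation. This is not the original SystemVerilog; the function name is hypothetical, and the loops stand in for comparisons that the hardware performs simultaneously.

```python
def find_max_crcw(data):
    """Sequential simulation of the two-step CRCW maximum search."""
    n = len(data)
    m = [0] * n  # m[i] == 1 marks data[i] as smaller than another element

    # Step 1 (first clock): one virtual processor per pair (i, j).
    # Several processors may write m[i] at once, but all of them write 1,
    # so the concurrent write is conflict free.
    for i in range(n):
        for j in range(n):
            if data[i] < data[j]:
                m[i] = 1

    # Step 2 (second clock): every unmarked element is a maximum.
    # All writers store the same value, so the write is again conflict free.
    max_no = None
    for i in range(n):
        if m[i] == 0:
            max_no = data[i]
    return max_no
```

The simulation performs n × n comparisons, which matches the trade-off in the text: the result arrives in two steps, at the price of one comparator per pair of elements.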
When a program needs memory, it requests it from the operating system. The operating system then decides what physical location to place the memory in. Physical RAM is much faster than hard disks. When the computer spends more time moving memory from RAM to disk and back than it does accomplishing tasks, it is said to be thrashing.
Virtual memory systems usually include protected memory, but this is not always the case.
Translation lookaside buffer
- Virtually addressed: requests are sent directly from the CPU to the cache, and the TLB is accessed only on a cache miss.
- Physically addressed: the CPU does a TLB lookup on every memory operation and the resulting physical address is sent to the cache.
- The TLB is sometimes implemented as content-addressable memory (CAM).
- The CAM search key is the virtual address and the search result is a physical address.
- If the requested address is present in the TLB, the CAM search yields a match quickly and the retrieved physical address can be used to access memory. This is called a TLB hit.
- If the requested address is not in the TLB, it is a miss, and the translation proceeds by looking up the page table in a process called a page walk.
- The page walk requires a lot of time when compared to the processor speed, as it involves reading the contents of multiple memory locations and using them to compute the physical address. After the physical address is determined by the page walk, the virtual address to physical address mapping is entered into the TLB.
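The hit, miss, and page-walk sequence above can be sketched as follows. The page size, the number of slots, the least-recently-used eviction, and the flat dictionary that stands in for a multi-level page table are all assumptions made for this example.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size for this sketch

class TLB:
    def __init__(self, page_table, slots=4):
        self.page_table = page_table  # virtual page number -> frame number
        self.entries = OrderedDict()  # small cache of recent translations
        self.slots = slots
        self.hits = 0
        self.misses = 0

    def translate(self, virtual_address):
        vpn, offset = divmod(virtual_address, PAGE_SIZE)
        if vpn in self.entries:
            # TLB hit: the translation is available immediately.
            self.hits += 1
            self.entries.move_to_end(vpn)
        else:
            # TLB miss: the dictionary lookup stands in for the page walk.
            self.misses += 1
            frame = self.page_table[vpn]
            if len(self.entries) >= self.slots:
                self.entries.popitem(last=False)  # evict the oldest entry
            self.entries[vpn] = frame  # enter the mapping into the TLB
        return self.entries[vpn] * PAGE_SIZE + offset
```

A `KeyError` from the page table corresponds to an unmapped page, which a real system would report as a page fault.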
A translation lookaside buffer (TLB) has a fixed number of slots containing the following entries:
- page table entries, which map virtual addresses to
  - physical addresses
  - intermediate table addresses
- segment table entries, which map virtual addresses to
  - segment addresses
  - intermediate table addresses
  - page table addresses.
The virtual memory is the space seen from a process. This space is often segmented in pages of a fixed size. The page table (generally stored in memory) keeps track of where the virtual pages are stored in the physical memory. The TLB is a cache of the page table; that is, only a subset of page table contents is held in TLB.
In a Harvard architecture or hybrid thereof, a separate virtual address space or memory access hardware may exist for instructions and data. This can lead to distinct TLBs for each access type:
- Instruction Translation Lookaside Buffer (ITLB)
- Data Translation Lookaside Buffer (DTLB)
Most modern desktop and server CPUs have at least three independent caches:
- Instruction cache to speed up executable instruction fetch
- Data cache to speed up data fetch and store
- Translation lookaside buffer (TLB) used to speed up virtual-to-physical address translation for both executable instructions and data
The data cache is usually organized as a hierarchy of cache levels (L1, L2, etc.; see Multi-level caches).
The placement policy decides where in the cache a copy of a particular entry of main memory will go.
- If the placement policy is free to choose any entry in the cache to hold the copy, the cache is called fully associative.
- At the other extreme, if each entry in main memory can go in just one place in the cache, the cache is direct mapped.
- Many caches implement a compromise in which each entry in main memory can go to any one of N places in the cache; these are described as N-way set associative.
For example, the level-1 data cache in an AMD Athlon is two-way set associative, which means that any particular location in main memory can be cached in either of two locations in the level-1 data cache.
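The three placement schemes differ only in how many slots a given memory block may occupy. A minimal sketch, assuming the set is chosen by the block number modulo the number of sets (a common choice, but an assumption here), with a hypothetical function name:

```python
def candidate_slots(block_number, num_slots, ways):
    """Return the cache slots that may hold the given memory block.

    ways == 1 models a direct-mapped cache, ways == num_slots a fully
    associative cache, and anything in between an N-way set-associative one.
    """
    num_sets = num_slots // ways
    set_index = block_number % num_sets  # assumed set-selection function
    first = set_index * ways
    return list(range(first, first + ways))
```

With eight slots, block 13 has one candidate slot when direct mapped, two candidates when two-way set associative, and all eight when fully associative.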
Example: the K8
The K8 has 4 specialized caches: an instruction cache, an instruction TLB, a data TLB, and a data cache. Each of these caches is specialized:
- The instruction cache keeps copies of 64-byte lines of memory, and fetches 16 bytes each cycle. Each byte in this cache is stored in ten bits rather than 8, with the extra bits marking the boundaries of instructions (this is an example of predecoding). The cache has only parity protection rather than ECC, because parity is smaller and any damaged data can be replaced by fresh data fetched from memory (which always has an up-to-date copy of instructions).
- The instruction TLB keeps copies of page table entries (PTEs). Each cycle’s instruction fetch has its virtual address translated through this TLB into a physical address. Each entry is either 4 or 8 bytes in memory. Because the K8 has a variable page size, each of the TLBs is split into two sections, one to keep PTEs that map 4 KB pages, and one to keep PTEs that map 4 MB or 2 MB pages. The split allows the fully associative match circuitry in each section to be simpler. The operating system maps different sections of the virtual address space with different size PTEs.
- The data TLB has two copies which keep identical entries. The two copies allow two data accesses per cycle to translate virtual addresses to physical addresses. Like the instruction TLB, this TLB is split into two kinds of entries.
- The data cache keeps copies of 64-byte lines of memory. It is split into 8 banks (each storing 8 KB of data), and can fetch two 8-byte data each cycle so long as those data are in different banks. There are two copies of the tags, because each 64-byte line is spread among all 8 banks. Each tag copy handles one of the two accesses per cycle.
The K8 also has multiple-level caches. There are second-level instruction and data TLBs, which store only PTEs mapping 4 KB. Both instruction and data caches, and the various TLBs, can fill from the large unified L2 cache. This cache is exclusive to both the L1 instruction and data caches, which means that any 8-byte line can only be in one of the L1 instruction cache, the L1 data cache, or the L2 cache. It is, however, possible for a line in the data cache to have a PTE which is also in one of the TLBs—the operating system is responsible for keeping the TLBs coherent by flushing portions of them when the page tables in memory are updated.
The K8 also caches information that is never stored in memory—prediction information. These caches are not shown in the above diagram. As is usual for this class of CPU, the K8 has fairly complex branch prediction, with tables that help predict whether branches are taken and other tables which predict the targets of branches and jumps. Some of this information is associated with instructions, in both the level 1 instruction cache and the unified secondary cache.
The K8 uses an interesting trick to store prediction information with instructions in the secondary cache. Lines in the secondary cache are protected from accidental data corruption (e.g. by an alpha particle strike) by either ECC or parity, depending on whether those lines were evicted from the data or instruction primary caches. Since the parity code takes fewer bits than the ECC code, lines from the instruction cache have a few spare bits. These bits are used to cache branch prediction information associated with those instructions. The net result is that the branch predictor has a larger effective history table, and so has better accuracy.
Cache reads are the most common CPU operation that takes more than a single cycle. Program execution time tends to be very sensitive to the latency of a level-1 data cache hit.
The simplest cache is a virtually indexed direct-mapped cache.
- The virtual address is calculated with an adder.
- The relevant portion of the address is extracted and used to index an SRAM, which returns the loaded data.
- The data is byte aligned in a byte shifter, and from there is bypassed to the next operation.
There is no need for any tag checking in the inner loop — in fact, the tags need not even be read. Later in the pipeline, but before the load instruction is retired, the tag for the loaded data must be read, and checked against the virtual address to make sure there was a cache hit. On a miss, the cache is updated with the requested cache line and the pipeline is restarted.
An associative cache is more complicated, because some form of tag must be read to determine which entry of the cache to select. An N-way set-associative level-1 cache usually reads all N possible tags and N data in parallel, and then chooses the data associated with the matching tag. Level-2 caches sometimes save power by reading the tags first, so that only one data element is read from the data SRAM.
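The tag selection in an N-way set-associative read can be sketched as below. The class and method names are hypothetical, and the loop over the ways stands in for the parallel tag comparison done in hardware.

```python
class SetAssociativeCache:
    def __init__(self, num_sets, ways, line_size):
        self.num_sets = num_sets
        self.ways = ways
        self.line_size = line_size
        # Each way of a set holds (tag, line data); None marks an empty way.
        self.sets = [[None] * ways for _ in range(num_sets)]

    def split(self, address):
        """Split an address into tag, set index, and byte offset."""
        line_number, offset = divmod(address, self.line_size)
        tag, index = divmod(line_number, self.num_sets)
        return tag, index, offset

    def read(self, address):
        tag, index, offset = self.split(address)
        candidates = self.sets[index]  # all N tags and lines of the set
        for entry in candidates:
            if entry is not None and entry[0] == tag:
                return entry[1][offset]  # hit: data of the matching tag
        return None  # miss

    def fill(self, address, line, way):
        """Install a full cache line in the given way of its set."""
        tag, index, _ = self.split(address)
        self.sets[index][way] = (tag, bytes(line))
```

A direct-mapped cache is the special case `ways=1`, where the data can be used before the tag check completes because there is only one candidate.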
The diagram to the right is intended to clarify the manner in which the various fields of the address are used. Address bit 31 is most significant, bit 0 is least significant.
The diagram shows the SRAMs, indexing, and multiplexing for a 4 KB, 2-way set-associative, virtually indexed and virtually tagged cache with 64 byte (B) lines, a 32-bit read width and 32-bit virtual address.
Because the cache is 4 KB and has 64 B lines, there are just 64 lines in the cache, and we read two at a time from a Tag SRAM which has 32 rows, each with a pair of 21 bit tags.
Although any function of virtual address bits 31 through 6 could be used to index the tag and data SRAMs, it is simplest to use the least significant bits.
Similarly, because the cache is 4 KB and has a 4 B read path, and reads two ways for each access, the Data SRAM is 512 rows by 8 bytes wide.
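The arithmetic behind these figures can be checked with a short sketch. The function names are hypothetical, and the use of the least significant bits above the line offset as the index follows the simplest choice mentioned above.

```python
def cache_geometry(size_bytes, ways, line_bytes, read_bytes, address_bits):
    """Derive SRAM shapes and address field widths (sizes are powers of two)."""
    lines = size_bytes // line_bytes
    sets = lines // ways                      # rows of the tag SRAM
    offset_bits = line_bytes.bit_length() - 1
    index_bits = sets.bit_length() - 1
    tag_bits = address_bits - index_bits - offset_bits
    data_row_bytes = read_bytes * ways        # one read-width word per way
    data_rows = size_bytes // data_row_bytes
    return {
        "lines": lines,
        "tag_rows": sets,
        "offset_bits": offset_bits,
        "index_bits": index_bits,
        "tag_bits": tag_bits,
        "data_rows": data_rows,
        "data_row_bytes": data_row_bytes,
    }

def split_address(address, offset_bits, index_bits):
    """Split an address into (tag, index, offset) using the low bits as index."""
    offset = address & ((1 << offset_bits) - 1)
    index = (address >> offset_bits) & ((1 << index_bits) - 1)
    tag = address >> (offset_bits + index_bits)
    return tag, index, offset
```

Calling `cache_geometry(4096, 2, 64, 4, 32)` reproduces the numbers in the text: 64 lines, a tag SRAM of 32 rows with 21-bit tags, and a data SRAM of 512 rows that are 8 bytes wide.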
A more modern cache might be 16 KB, 4-way set-associative, virtually indexed, virtually hinted, and physically tagged, with 32 B lines, 32-bit read width and 36-bit physical addresses.
The read path recurrence for such a cache looks very similar to the path above.
- Instead of tags, vhints are read and matched against a subset of the virtual address.
- Later in the pipeline, the virtual address is translated into a physical address by the TLB, and the physical tag is read (just one, as the vhint supplies which way of the cache to read).
- Finally, the physical address is compared to the physical tag to determine if a hit has occurred.
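A rough sketch of this read path for the 16 KB, 4-way cache with 32 B lines follows. Several details are assumptions made for illustration only: 4 KB pages, a 4-bit hint taken from the virtual address bits just above the index, a dictionary standing in for the TLB, and the function names.

```python
LINE_BITS = 5    # 32 B lines
SET_BITS = 7     # 16 KB / 4 ways / 32 B = 128 sets
PAGE_BITS = 12   # assumed 4 KB pages
VHINT_BITS = 4   # hypothetical hint width

def vhint_of(vaddr):
    """Hypothetical hint: a few virtual address bits above the index."""
    return (vaddr >> (LINE_BITS + SET_BITS)) & ((1 << VHINT_BITS) - 1)

def read(cache, tlb, vaddr):
    """cache: 128 sets, each a list of 4 ways (vhint, physical_tag, line) or None.
    tlb: dict mapping virtual page number to physical frame number."""
    index = (vaddr >> LINE_BITS) & ((1 << SET_BITS) - 1)
    # 1. Read the vhints of the set and match them against the virtual address.
    way = next(
        (w for w, entry in enumerate(cache[index])
         if entry is not None and entry[0] == vhint_of(vaddr)),
        None,
    )
    if way is None:
        return None
    # 2. Translate through the TLB and read only the hinted way's physical tag.
    frame = tlb[vaddr >> PAGE_BITS]
    paddr = (frame << PAGE_BITS) | (vaddr & ((1 << PAGE_BITS) - 1))
    _, physical_tag, line = cache[index][way]
    # 3. Compare the physical address with the physical tag to confirm the hit.
    if physical_tag != paddr >> PAGE_BITS:
        return None
    return line[vaddr & ((1 << LINE_BITS) - 1)]
```

With these sizes each way covers exactly one 4 KB page, so the index and line offset come entirely from the page offset and the physical tag is simply the frame number.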
Some SPARC designs have improved the speed of their L1 caches by a few gate delays by collapsing the virtual address adder into the SRAM decoders. See Sum addressed decoder.
Protected memory assigns programs their own areas of memory. If the operating system detects that a program has tried to alter memory that does not belong to it, the program is terminated. This way, only the offending program crashes, and other programs are not affected by the error.
Protected memory systems almost always include virtual memory as well.
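The check described above can be sketched as a toy memory model in which every write is validated against the region assigned to the program. The class names and the exception used to model termination are hypothetical.

```python
class ProtectionFault(Exception):
    """Raised when a program touches memory it does not own."""

class ProtectedMemory:
    def __init__(self, size):
        self.cells = [0] * size
        self.regions = {}  # program name -> range of owned addresses

    def assign(self, program, start, length):
        self.regions[program] = range(start, start + length)

    def write(self, program, address, value):
        if address not in self.regions[program]:
            # Only the offending program is stopped; memory stays intact.
            raise ProtectionFault(f"{program} may not write address {address}")
        self.cells[address] = value
```

If program `a` owns addresses 0 to 7 and program `b` owns 8 to 15, a write by `a` to address 9 raises the fault and leaves the data of `b` unchanged.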
Segmentation refers to dividing a computer’s memory into segments. A reference to a memory location includes a value that identifies a segment and an offset within that segment.
can be used to reference segments in the computer’s memory. Pointers to memory segments on x86 processors can also be stored in the processor’s segment registers. Initially x86 processors had 4 segment registers:
- CS (code segment)
- SS (stack segment)
- DS (data segment)
- ES (extra segment)
Later, another two segment registers were added: FS and GS.
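A segment-plus-offset reference can be turned into a linear address in two ways, sketched here. The first function follows 8086 real mode, where the segment register value is shifted left by 4 bits and added to the offset. The second uses a hypothetical segment table of (base, limit) pairs to illustrate bounds checking; its layout is an assumption, not a description of any particular processor.

```python
def real_mode_address(segment, offset):
    """8086 real mode: 16-bit segment shifted left by 4, plus 16-bit offset."""
    return ((segment << 4) + offset) & 0xFFFFF  # 20-bit address wraps around

def translate(segment_table, selector, offset):
    """Look up a segment by selector and check the offset against its limit."""
    base, limit = segment_table[selector]
    if not 0 <= offset < limit:
        raise ValueError("offset outside the segment")
    return base + offset
```

For example, the pair B800:0010 in real mode refers to linear address 0xB8010.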
- Arithmetic Logic Unit
- Processor Registers
- Control Unit
- Instruction Register
- Program Counter
- Memory (data and instructions, external mass storage, etc.)
- Used for arithmetic
- Stored back in main memory, either by the same instruction or by a subsequent machine instruction
- Programs are stored in special file types, different from those used for data.
- Executable files contain programs; all other files are data files.
- However, executable files may also contain data which is “built-in” to the program.
- Some executable files have a data segment, which nominally contains constants and initial values (both data).
The line between program and data can become blurry.
An interpreter, for example, is a program. The input data to an interpreter is itself a program—just not one expressed in native machine language. In many cases, the interpreted program will be a human-readable text file, which is manipulated with a text editor—more normally associated with plain text data.
Metaprogramming similarly involves programs manipulating other programs as data. Likewise, for programs such as compilers, linkers, debuggers, and program updaters, other programs serve as data.
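The blurred line between program and data can be seen in a tiny interpreter. In this sketch the "program" is an ordinary text string in a made-up postfix notation; both the notation and the function name are invented for this example.

```python
def run(program_text):
    """Interpret a postfix arithmetic program given as plain text."""
    stack = []
    for token in program_text.split():
        if token == "+":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif token == "*":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            stack.append(int(token))  # anything else is a number literal
    return stack.pop()
```

To the interpreter, the string `"2 3 + 4 *"` is input data; to the person who wrote it, it is a program that computes (2 + 3) × 4.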
Machine-readable data is data (or metadata) which is in a format that can be understood by a computer.
There are two types:
- Human-readable data that is marked up so that it can also be read by machines (examples: microformats, RDFa)
- Data file formats intended principally for machines (RDF, XML, JSON)
For purposes of implementation of the GPRA Modernization Act (GPRAMA), the Office of Management and Budget (OMB) defines “machine readable” as follows: “Format in a standard computer language (not English text) that can be read automatically by a web browser or computer system. (e.g.; xml). Traditional word processing documents, hypertext markup language (HTML) and portable document format (PDF) files are easily read by humans but typically are difficult for machines to interpret. Other formats such as extensible markup language (XML), (JSON), or spreadsheets with header columns that can be exported as comma separated values (CSV) are machine readable formats. It is possible to make traditional word processing documents and other formats machine readable but the documents must include enhanced structural elements.”
Publishing public data in an open, standard, machine-readable format is a best practice (good operating practice).
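As a small illustration of why JSON and CSV count as machine readable, the sketch below reads the same records from both formats using only the Python standard library. The records, field names, and function names are hypothetical.

```python
import csv
import io
import json

# Hypothetical records, once as JSON and once as CSV with a header row.
RECORDS_JSON = '[{"agency": "A", "budget": 10}, {"agency": "B", "budget": 32}]'
RECORDS_CSV = "agency,budget\nA,10\nB,32\n"

def total_from_json(text):
    """Sum the budget field of every record in a JSON array."""
    return sum(record["budget"] for record in json.loads(text))

def total_from_csv(text):
    """Sum the budget column of a CSV document with a header row."""
    reader = csv.DictReader(io.StringIO(text))
    return sum(int(row["budget"]) for row in reader)
```

Both functions find the fields by name without any guessing about layout, which is what a word-processing document or PDF of the same table would not allow without extra structure.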
Markup is embedded in text and provides instructions for programs that are to process the text.
- The kind of markup used by traditional word-processing systems: binary codes embedded within document text that produce the WYSIWYG effect.
- Usually designed to be hidden from human users, even those who are authors or editors.
- Text with such markup is often edited with the markup visible and directly manipulated by the author
- Includes programming constructs, so macros or subroutines can be defined and invoked by name
- Markup is used to label parts of the document rather than to provide specific instructions as to how they should be processed
- Decouples the inherent structure of the document from any particular treatment or rendition of it
- Such markup is often described as “semantic”
- Example: HTML’s <cite> tag, which is used to label a citation
- Descriptive markup — sometimes called logical markup or conceptual markup
- Encourages authors to write in a way that describes the material conceptually, rather than visually
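Because descriptive markup labels what a piece of text is, a program can pick out those parts without knowing how they are rendered. A minimal sketch using `html.parser` from the Python standard library to collect the text of every cite element; the class and function names are hypothetical.

```python
from html.parser import HTMLParser

class CiteCollector(HTMLParser):
    """Collect the text content of every <cite> element."""

    def __init__(self):
        super().__init__()
        self.in_cite = False
        self.citations = []

    def handle_starttag(self, tag, attrs):
        if tag == "cite":
            self.in_cite = True
            self.citations.append("")

    def handle_endtag(self, tag):
        if tag == "cite":
            self.in_cite = False

    def handle_data(self, data):
        if self.in_cite:
            self.citations[-1] += data

def find_citations(html_text):
    collector = CiteCollector()
    collector.feed(html_text)
    collector.close()
    return collector.citations
```

The same document could be styled in any way; the extraction depends only on the label, not on the presentation.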
- The root element of an XHTML document must be html
- It must contain an xmlns attribute to associate it with the XHTML namespace
- The namespace URI for XHTML is http://www.w3.org/1999/xhtml
- It may carry an xml:lang attribute to identify the document with a natural language
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
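These root-element requirements can be checked mechanically. A minimal sketch with `xml.etree.ElementTree` from the Python standard library; the function name is hypothetical, and the check covers only the root element, not full XHTML validity.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace of the xml: prefix

def check_root(document_text):
    """Return (root is html in the XHTML namespace, value of xml:lang or None)."""
    root = ET.fromstring(document_text)
    in_namespace = root.tag == f"{{{XHTML_NS}}}html"
    language = root.get(f"{{{XML_NS}}}lang")
    return in_namespace, language
```

ElementTree reports namespaced names in the form `{uri}name`, which is why the comparison is made against the namespace URI rather than a prefix.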
The example below shows an XHTML document with a minimum of required tags (http://www.w3schools.com/html/html_xhtml.asp):
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
<title>Title of document</title>