© Copyright Brian Brown, 1992-2000. All rights reserved.
CPU AND MEMORY
The objective of this section is to
At the end of this section, you should be able to
The functional diagram of a typical computer system is shown below,
Fig 4_1: Computer System Block Diagram
The address bus is used by the processor to select a specific memory location within the memory subsystem, or a specific peripheral chip.
The data bus is used to transfer data between the processor and memory subsystem or peripheral devices.
The control bus provides timing signals to synchronise the flow of data between the processor and memory subsystem or peripheral devices.
The central processor (CPU) is the chip which acts as a control centre for all operations. It executes instructions (a program) which are contained in the memory section.
Basic operations involve
The CPU is said to be the brains of any computer system. It provides all the timing and control signals necessary to transfer data from one point to another in the system.
Instructions and Operands
A program consists of a number of CPU instructions. Each instruction consists of
The instruction code specifies to the CPU what to do, where the data is located, and where the output data (if any) will be put.
Instructions are held in the memory section of the computer system. Instructions are transferred one at a time into the CPU, where they are decoded then executed. Instructions follow each other in successive memory locations.
Fig 4_2: Program Instructions
Memory locations are numbered sequentially. The processor unit keeps track of the instruction it is executing by using an internal counter. This counter holds the location in memory of the instruction it is executing. Its name is the program counter (sometimes called the instruction pointer).
Most computer systems today are stored program control (SPC) systems. This means that the processor executes instructions which are stored in a memory subsystem. SPC systems are popular because what the processor does is changed simply by altering the instructions in the memory system. This makes for a general purpose computer system, capable of performing a wide variety of different tasks dependent upon the stored program contents.
Memory contains data or instructions for the processor to execute. All memory has common features.
Each memory location is referred to by an address, which is generally expressed in hexadecimal notation (using base 16 numbers).
The processor selects a specific address in memory by placing the address on a special multi-bit bus called the address bus. The memory system uses the value on this address bus to find the specific location within the chip which the processor requires access to.
The total number of address locations which can be accessed by the processor is known as its physical address space. Its size is determined by the width of the address bus (a bus with n address lines can select 2^n locations), and is often expressed in terms of Kilobytes (x1024) or Megabytes.
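The relationship between address bus width and physical address space can be sketched in a few lines of Python. This is only an illustration: it assumes one byte per address location (the text does not state the width of a location), and the function names and the example bus widths of 16, 20 and 24 lines are chosen here for demonstration.

```python
def address_space_bytes(address_lines: int) -> int:
    """Number of distinct locations selectable with the given number of address lines."""
    return 2 ** address_lines


def describe(address_lines: int) -> str:
    """Summarise the address space in locations, Kilobytes and hexadecimal range."""
    size = address_space_bytes(address_lines)
    kilobytes = size // 1024          # 1 Kilobyte = 1024 bytes (assumes 1 byte per location)
    top = size - 1                    # highest address that can be placed on the bus
    return f"{address_lines} lines: {size} locations ({kilobytes} KB), 0x0 to 0x{top:X}"


for lines in (16, 20, 24):
    print(describe(lines))
```

A 16 line address bus therefore reaches 64 Kilobytes, and each extra address line doubles the address space.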
Types of Computer Memory
System memory consists of two main types.
ROM is non-volatile. This means the contents do not disappear when the power to the system is turned off.
EPROM is a special type of ROM which can be programmed by the user. Its contents can also be erased by exposing it to ultra-violet light.
EEPROM is another special type of ROM which can be programmed by the user. Its contents are erased by applying a specific voltage to one of its input pins whilst providing the appropriate timing signals.
No need to refresh
Consumes More Power
Cache memory is high speed memory which interfaces between the processor and the system memory. Dynamic memory is used to implement large memory systems in modern computers. This is due to features like low power consumption, high chip densities and low cost.
Fig 4_3: Cache Memory
Dynamic memory is, however, slow, and cannot keep up with modern fast processors. When a processor requests data from a memory chip, it expects to receive that data within a specific time. This is expressed as a number of clock cycles.
It is common for processors to run what is called a FOUR STAGE BUS CYCLE (which is four processor clocks long). Essentially, during the first processor clock cycle, the address is placed on the address bus. The second processor clock cycle is used to latch the address internally within the memory chip. The third processor clock cycle is used by the chip to find the data and place it on the data bus. The fourth processor clock cycle is used by the processor to latch the data on the data bus into its own internal hold register.
Dynamic memory is currently too slow to keep up with processors running at clock rates of 50MHz or greater (at 50MHz each cycle is 20ns). Using dynamic memory with fast processors requires extending the third processor clock cycle by one or more extra processor clock cycles. Each extra processor clock cycle is called a wait state. This changes a four stage bus cycle into a five stage bus cycle (or longer), meaning that the fast processor effectively runs no faster than a slower processor whenever it accesses memory (it is being slowed down by the memory subsystem).
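The effect of wait states on the bus cycle can be worked through numerically. The sketch below is a simplified model, not a description of any particular processor: it assumes the memory chip must find its data entirely within the third clock cycle plus any wait states, and the 70ns access time used in the example is a hypothetical figure chosen for illustration.

```python
import math


def clock_period_ns(clock_mhz: float) -> float:
    """Length of one processor clock cycle in nanoseconds."""
    return 1000.0 / clock_mhz


def bus_cycle_ns(clock_mhz: float, wait_states: int = 0, base_clocks: int = 4) -> float:
    """Duration of a four stage bus cycle extended by the given number of wait states."""
    return (base_clocks + wait_states) * clock_period_ns(clock_mhz)


def wait_states_needed(clock_mhz: float, memory_access_ns: float) -> int:
    """Wait states required so the third clock plus wait states covers the access time.

    Assumption: the memory has only the third clock cycle (plus wait states)
    in which to find the data and place it on the data bus.
    """
    clocks_required = math.ceil(memory_access_ns / clock_period_ns(clock_mhz))
    return max(0, clocks_required - 1)


period = clock_period_ns(50)                 # 20.0 ns at 50MHz
extra = wait_states_needed(50, 70)           # hypothetical 70 ns dynamic memory
print(f"clock period: {period} ns")
print(f"bus cycle without wait states: {bus_cycle_ns(50)} ns")
print(f"wait states for 70 ns memory: {extra}")
print(f"bus cycle with wait states: {bus_cycle_ns(50, extra)} ns")
```

Under these assumptions the 70ns memory needs three wait states at 50MHz, stretching each memory access from 80ns to 140ns.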
It is too expensive to use static memory in place of dynamic memory. To use slow dynamic memory with a fast processor requires an extra hardware subsystem (called cache memory) which fits between the processor and the memory subsystem.
All memory accesses by the processor are fed through the cache system. It comprises an address comparator which monitors the address requests by the processor, high speed static ram, and extra hardware chips.
The cache system starts off by trying to read as much data as possible from the dynamic memory subsystem. It stores this data in its own high speed static memory (or cache). When a processor request arrives, the cache system checks whether the requested address is one it has already read from the memory subsystem. If it is, it supplies the data directly from its static cache. If the address is not cached, it lets the processor access the main memory system directly (which is slower). The cache system then sets the address counter it uses to read from system memory to the processor's address, and tries to read as much data as possible before the next processor request arrives.
When the cache system can respond to the processor request, it is called a cache hit. If the cache system cannot service the processor request, it is called a cache miss.
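The read-ahead behaviour and the hit and miss cases described above can be modelled with a short simulation. This is a minimal sketch under stated assumptions: the class name `ReadAheadCache`, the capacity of 8 locations and the memory contents are all hypothetical, and the cache is assumed to finish its read-ahead instantly, whereas a real cache reads only as much as it can before the next request arrives.

```python
class ReadAheadCache:
    """Toy model of a cache that reads ahead from main memory."""

    def __init__(self, memory, capacity=8):
        self.memory = memory          # slow dynamic memory, modelled as a list
        self.capacity = capacity      # number of locations the static cache can hold
        self.lines = {}               # address -> cached value
        self.hits = 0
        self.misses = 0
        self._fill(0)                 # start by reading as much as possible

    def _fill(self, start):
        """Read consecutive locations from main memory, beginning at `start`."""
        end = min(start + self.capacity, len(self.memory))
        self.lines = {addr: self.memory[addr] for addr in range(start, end)}

    def read(self, address):
        """Service a processor read request."""
        if address in self.lines:     # address comparator finds a match: cache hit
            self.hits += 1
            return self.lines[address]
        self.misses += 1              # cache miss: processor reads main memory directly
        value = self.memory[address]
        self._fill(address)           # move the read-ahead counter to the processor's address
        return value


memory = list(range(100, 132))        # 32 hypothetical memory locations
cache = ReadAheadCache(memory, capacity=8)
for address in (0, 1, 2, 3, 20, 21, 22, 0):
    cache.read(address)
print(f"hits={cache.hits} misses={cache.misses}")
```

Sequential requests are served from the cache, while a jump to a distant address causes a miss and a refill starting at the new address.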
The IO bus is the interconnection path between the processor and input/output devices (including memory). The bus is divided into THREE main sections
In more complex systems, the memory subsystem or peripheral devices also provide timing signals to complete data transfers, or initiate requests that the processor responds to (called interrupts).
Peripheral devices allow input and output to occur. Examples of peripheral devices are
The processor is involved in the initialisation and servicing of these peripheral devices.
An input output processor is a special processor dedicated to handling peripheral devices like terminals, tape and disk units, and printers.
Mainframe systems like the IBM 370 use I/O processors to offload work from the system processor. This lets the system processor get more work done executing user programs without having to worry about handling data input and output to terminals or printing documents.
The PC has an I/O processor in the keyboard, which handles the complex operations of scanning the keys.
In addition, it is now becoming common to have I/O processors on graphics cards. The S3 graphics card is a good example of this, providing hardware support for scrolling, sizing and moving windows. This removes these tasks from the system processor, and performs them at a much higher rate (up to 30 times faster).
CHANNEL COPROCESSOR (IBM)
Concurrent operation of the CPU and I/O devices requires the use of a special I/O processor. The main CPU instructs the I/O processor to perform the required data transfer. When the transfer is completed, the I/O processor informs the main processor of the status of the operation.
This method frees the main processor to perform other tasks whilst I/O is being done (tasks requesting I/O are blocked by the OS and thus not scheduled for processor time).
Typical features of an I/O channel processor system are
There are two main types of IO channels
Both channels support a number of devices on a bus called a sub-channel.
The selector channel operates in burst mode only. It handles a single sub-channel at a time, and has very high transfer rates. Typically, it controls high speed disk units.
The multiplexor channel handles more than one sub-channel at a time by interleaving requests. It operates in byte and word mode, but does support burst at a much lower rate than a selector channel. Typically, it handles devices like printers and character terminals.
The processor initiates an I/O transfer by setting up a special IOC program in main memory. It then issues a STARTIO instruction, which identifies the channel and sub-channel.
The channel then accesses and runs the channel program (the address of which is in location 72). When finished, the channel updates the IO flag in the processor's status register to signal command completion. The processor then checks the channel status register for results.
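The sequence of steps in a channel transfer can be sketched as a small simulation. This is a hypothetical model only: the class names, the `WRITE` command, the address 0x200 used for the channel program and the status value `"complete"` are all invented for illustration, and the channel is run in-line here, whereas a real channel runs concurrently with the processor.

```python
CAW_LOCATION = 72  # the text places the channel program address in location 72


class Channel:
    """Hypothetical model of one I/O channel."""

    def __init__(self, memory):
        self.memory = memory
        self.status = None        # channel status register
        self.transferred = []     # (sub_channel, command, data) records

    def run(self, processor, sub_channel):
        """Fetch the channel program via location 72 and carry out each command."""
        program_address = self.memory[CAW_LOCATION]
        for command, data in self.memory[program_address]:
            self.transferred.append((sub_channel, command, data))
        self.status = "complete"
        processor.io_flag = True  # signal command completion to the processor


class Processor:
    """Hypothetical model of the main processor's side of the transfer."""

    def __init__(self, memory, channels):
        self.memory = memory
        self.channels = channels
        self.io_flag = False

    def start_io(self, channel_id, sub_channel, channel_program, program_address=0x200):
        """Set up the channel program in main memory, then start the channel."""
        self.memory[program_address] = channel_program
        self.memory[CAW_LOCATION] = program_address
        self.io_flag = False
        self.channels[channel_id].run(self, sub_channel)


memory = {}
channel = Channel(memory)
cpu = Processor(memory, {0: channel})
cpu.start_io(0, 3, [("WRITE", "H"), ("WRITE", "I")])
if cpu.io_flag:                   # processor checks the channel status for results
    print(channel.status, channel.transferred)
```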
Each channel gets informed of
Fig 4_4: IBM Channel Operation
Central Processor Revisited
We shall now take a closer look at how the processor functions internally.
The Fetch, Decode, Execute Cycle
Most modern processors work on the fetch, decode, execute principle. This is also called the Von Neumann architecture. The execution of an instruction by a processor is split into THREE distinct phases: Fetch, Decode, and Execute.
Fig 4_5: Fetch Cycle, reading the instruction
In the above image, the processor is ready to begin the Fetch cycle. The current contents of the instruction counter is address 0100. This value is placed on the address bus, and a READ signal is activated on the control bus. The memory receives this and finds the contents of the memory location 0100, which happens to be the instruction MOV AX, 0.
The memory places the instruction on the Data Bus, and the processor then copies the instruction from the Data Bus to the Instruction Register.
Fig 4_6: Decode cycle, decoding the instruction
In the above image, the processor transfers the instruction from the instruction register to the Decode Unit. The Decode Unit compares the instruction to an internal table, and when a match is found, the table entry supplies the list of micro-instructions (a number of steps) which are required to perform the instruction. In our case, the instruction means place the value 0 into the AX register. The decode unit now has all the details of how to do this.
Fig 4_7: Execute cycle, executing the instruction
In the above image, the processor executes the series of micro-instructions related to the instruction MOV AX, 0. The final part is to adjust the Instruction Counter to point to the next instruction to be executed, which is found at address 0102.
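The three phases walked through above can be sketched as a short Python simulation of the single instruction MOV AX, 0. This is an illustrative model, not how a real processor is built: the dictionary used for the CPU state, the `DECODE_TABLE` and the helper `load_immediate` are hypothetical, and the instruction length of 2 is simply taken from the example in the text (0100 followed by 0102) rather than from a real instruction encoding.

```python
INSTRUCTION_LENGTH = 2  # assumption from the text: 0100 is followed by 0102


def load_immediate(cpu, operands):
    """One execution step: place a constant value into a register."""
    register, value = operands
    cpu[register] = value


# Internal table used by the decode unit: opcode -> list of steps to perform.
DECODE_TABLE = {"MOV": [load_immediate]}


def run_one_instruction(cpu, memory):
    # Fetch: the instruction counter goes onto the address bus, memory is read,
    # and the instruction arrives on the data bus and is copied to the instruction register.
    address_bus = cpu["IP"]
    data_bus = memory[address_bus]
    cpu["IR"] = data_bus

    # Decode: look the instruction up in the internal table.
    opcode, operands = cpu["IR"]
    steps = DECODE_TABLE[opcode]

    # Execute: perform each step, then point to the next instruction.
    for step in steps:
        step(cpu, operands)
    cpu["IP"] += INSTRUCTION_LENGTH


memory = {0x0100: ("MOV", ("AX", 0))}
cpu = {"IP": 0x0100, "IR": None, "AX": 0x1234}   # AX starts with an arbitrary value
run_one_instruction(cpu, memory)
print(f"AX={cpu['AX']:04X} IP={cpu['IP']:04X}")
```

After the call, AX holds 0 and the instruction counter holds 0102, matching the walk-through.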
Graphical Animation of
The following graphic animation illustrates typical operation of an instruction by the processor. It places the contents of the instruction pointer onto the address bus and fetches the instruction. Once decoded, the instruction is executed and the instruction pointer altered to point to the next instruction.
Fig 4_8: Animation of Instruction Fetch
We shall now look at the internal operation of the CPU, and how it performs the fetch, decode, execute cycle. Internally, the CPU is made up of a number of discrete sections.
The ALU normally works on two numbers at a time. Often, one of the numbers is found in an internal location of the processor, whilst the other is a constant or found in the memory system. Most arithmetic and logic operations use operands located inside the processor for reasons of speed: no fetch cycle is needed to transfer the operand from the memory system to an internal hold point (called a latch) before the instruction can execute.
The purpose of the ALU is to perform arithmetic and logic operations.
During the fetch cycle, the processor places the contents of this counter on the address bus. A read signal is issued on the control bus, then timing signals are generated to transfer (copy) the instruction from the memory location in system memory to an internal hold latch inside the processor (called the instruction register).
During the decode cycle, the instruction counter is adjusted to point to the next instruction to be executed from system memory (calculated from the current instruction).
The purpose of the instruction pointer is to hold the address of the instruction the processor is about to execute from system memory.
The decoded instruction might look like
The purpose of the instruction register is to hold a copy of the instruction which the processor is about to execute.
The reason why internal register banks (a group of registers) are used is speed. Data inside the processor is manipulated significantly faster than data external to the processor (ie, located in system memory). This is because of the time required to fetch the data from system memory and transfer it into an internal hold latch before it can be manipulated.
For instance, to multiply the contents of a memory location by 2, the processor needs to first read the memory location value to an internal register, transfer it to the accumulator, multiply it by 2, then write the ALU contents back to the memory location. The two memory cycles consume time.
In contrast, to multiply an internal register by 2 requires no external system memory access, and the lack of this overhead means that instructions of this type execute faster than those which make external references to system memory.
The purpose of internal processor register banks is to provide temporary storage for variables and calculations.
Programming Model of a CPU
The programming model of a processor defines the registers within the processor which are visible and programmable by the user.
Executing a program: An example
Let's consider the operation of the following program at the processor level.
Assembler          High Level Language
MOV AX, #1         A := 1;
MOV BX, #2         B := 2;
ADD AX, BX         C := A + B;
PUSH AX            Writeln( C );
CALL WRITELN
Assume that the instruction pointer contains the address of the first instruction.
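The register and stack contents as this program runs can be traced with a small interpreter. This is a sketch under simplifying assumptions: the instruction pointer is modelled as an index into a list rather than a memory address, and CALL WRITELN is treated as a built-in routine that removes the value on top of the stack and writes it, without modelling a return address. The function name `run` and the tuple format for instructions are hypothetical.

```python
PROGRAM = [
    ("MOV", "AX", 1),       # A := 1
    ("MOV", "BX", 2),       # B := 2
    ("ADD", "AX", "BX"),    # C := A + B (the result is left in AX)
    ("PUSH", "AX"),         # pass C as the argument to Writeln
    ("CALL", "WRITELN"),    # Writeln( C )
]


def run(program):
    """Execute the program and return the final registers and the written output."""
    registers = {"AX": 0, "BX": 0}
    stack = []
    output = []
    ip = 0                                   # instruction pointer (list index here)
    while ip < len(program):
        instruction = program[ip]
        opcode = instruction[0]
        if opcode == "MOV":
            registers[instruction[1]] = instruction[2]
        elif opcode == "ADD":
            registers[instruction[1]] += registers[instruction[2]]
        elif opcode == "PUSH":
            stack.append(registers[instruction[1]])
        elif opcode == "CALL" and instruction[1] == "WRITELN":
            output.append(str(stack.pop()))  # simplified: no return address is pushed
        else:
            raise ValueError(f"unknown instruction: {instruction}")
        ip += 1                              # point to the next instruction
    return registers, output


registers, output = run(PROGRAM)
print(registers, output)
```

At the end of the run AX holds 3, BX holds 2, and the value written is 3.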
The computer base unit houses the CPU, memory, floppy disk, hard disk drive, power supply unit, and peripheral cards which support printers and modems.
Fig 4_9: Computer Base Unit
The expansion slots are used to plug in additional peripheral cards like sound cards, TV tuner cards, video capture cards etc. The two main types of expansion slots are PCI and ISA.
Most modern processors work on three cycles: fetch, decode and execute.
Processors use internal temporary storage areas, referred to as registers, for holding data.
The set of registers which programmers can alter is referred to as the programming model.
The Arithmetic Logic Unit handles mathematical operations like add, subtract, multiply, divide, shift and rotate.