# Bit-Level Parallelism - A Case Study on a 32-Bit Microprocessor

Mohammed Said bin Mohd Isa<sup>#1</sup>, Mohamed Faidz Mohamed Said<sup>#2</sup>

<sup>#</sup> Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA, 70300 Seremban, Negeri Sembilan, MALAYSIA

<sup>1</sup> mohammedsaidofficial@gmail.com

<sup>2</sup> faidzms@ieee.org

Abstract—Fine-grained parallelism is the ultimate technique for obtaining maximum performance. Bit-level parallelism is the finest grain that can be achieved. It is also easy to exploit and to understand in theory, because it is simply a truth table. Problems associated with bit-level parallelism, such as hardware limitations and verification techniques, are being overcome by advances in very large-scale integration (VLSI) and compiler technologies. With the advent of the Xilinx programmable gate array (PGA), exploiting bit-level parallelism is no longer a dream, yet several issues must still be considered. The evolution of microprocessor architecture depends on the changing aspects of technology. As die density and speed increase, memory and program behavior become increasingly important in defining architecture trade-offs. While technology enables increasingly complex processor implementations, there are physical and program behavior limits to the usefulness of this complexity. Physical limits include device limits and practical limits on power and cost. Program behavior limits result from unpredictable events occurring during execution. Architectures and implementations that span these limits are vital to the advancement of the microprocessor.

Keywords: parallelism, microprocessor, bit-level, 32-bit

## I. INTRODUCTION

The first microprocessors, introduced in the mid-1970s, were generally designed as bulk logic replacements. As technology developed and users became more sophisticated, applications became correspondingly more demanding, in terms of both raw processing power and functionality. By the late 1970s, when the current generation of 16-bit processors such as the MC68000, Intel 8086, and NS16000 came along, system designers' expectations had grown accordingly [1].

With these machines, designers were almost at a stage where they could use a microprocessor in an application where previously, for example, a bit-slice processor would have been required. In terms of performance, the 16-bit machines could, at best, reach 1.0 million instructions per second (MIPS). However, to make an impact in the emerging high-performance market, average rates of at least 2-3 MIPS are required. This is why the first 32-bit microprocessors are now appearing, including the recently announced Motorola MC68020 [2].

### A. Definition

Bit-level parallelism is a form of parallel computing based on increasing the processor word size, which in turn depends on very large-scale integration (VLSI) technology. Improvements in computer designs were achieved by increasing bit-level parallelism [3].

The 32-bit microprocessor was the primary processor used in all PCs until the mid-1990s. Intel Pentium processors and early AMD processors were 32-bit processors. The operating system and software on a PC with a 32-bit processor are also 32-bit based, in that they work with data units that are 32 bits wide. Windows 95, 98, and XP are all 32-bit operating systems that were common on PCs with 32-bit processors [4].

## II. BACKGROUND

Increasing the word size reduces the number of instructions the processor must execute to perform an operation on variables whose sizes are greater than the length of the word. For example, consider a case where an 8-bit processor must add two 16-bit integers. The processor must first add the 8 lower-order bits from each integer, then add the 8 higher-order bits together with the carry from the first addition, requiring two instructions to complete a single operation. A 16-bit processor, however, can complete the operation with a single instruction [5].

Historically, 4-bit microprocessors were replaced with 8-bit, then 16-bit, then 32-bit, and later 64-bit microprocessors. 32-bit processors have been a standard in general-purpose computing for around 20 years, but 64-bit processors are now taking the lead [6].
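The 16-bit addition on an 8-bit processor can be sketched in Python as follows. This is a minimal illustration, not a model of any particular instruction set: the helper names `add8`, `add16_on_8bit`, and `add16_on_16bit` are hypothetical, and each call to `add8` stands in for one add instruction of an assumed 8-bit ALU with a carry flag.

```python
MASK8 = 0xFF
MASK16 = 0xFFFF


def add8(a, b, carry_in=0):
    """One 8-bit add instruction: returns (8-bit sum, carry out)."""
    total = (a & MASK8) + (b & MASK8) + carry_in
    return total & MASK8, total >> 8


def add16_on_8bit(x, y):
    """Add two 16-bit integers with two 8-bit add instructions."""
    # Instruction 1: add the low-order bytes and record the carry.
    lo, carry = add8(x & MASK8, y & MASK8)
    # Instruction 2: add the high-order bytes plus the carry from step 1.
    hi, carry = add8((x >> 8) & MASK8, (y >> 8) & MASK8, carry)
    return (hi << 8) | lo, carry


def add16_on_16bit(x, y):
    """The same addition as a single 16-bit add instruction."""
    total = (x & MASK16) + (y & MASK16)
    return total & MASK16, total >> 16


if __name__ == "__main__":
    x, y = 0x12FF, 0x0001
    print(add16_on_8bit(x, y) == add16_on_16bit(x, y))
```

Both routes produce the same sum and carry; the difference is only in how many add operations the narrower word size forces the processor to issue.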

## III. PAPER REVIEW

There are several papers that can be reviewed in this research.

### A. Design and Simulation of a 32-Bit RISC based MIPS Processor using Verilog

Microprocessors and microcontrollers are generally designed around two major computer architectures: Complex Instruction Set Computing (CISC) and Reduced Instruction Set Computing (RISC). The CISC approach relies on an instruction set architecture (ISA) design that provides many instructions with a variable number of operands and a wide variety of addressing modes. As a result, instructions have differing execution times and lengths, which demands a complex control unit that occupies a large area on the chip. Compared with their CISC counterparts, RISC processors typically support a small set of instructions. The number of instructions in a RISC processor is low, and RISC designs rely on a larger number of general-purpose registers, few addressing modes, a fixed instruction length, and a load-store architecture. This allows instructions to be executed in a short time, thereby achieving higher overall performance [7].

Microprocessor without Interlocked Pipeline Stages (MIPS) is a RISC architecture. Pipelined MIPS has five stages: IF, ID, EX, MEM, and WB. Pipelining means that several operations are carried out in a single data path at the same time. It is used to improve the capabilities of the RISC processor, which is the reason for its use in this kind of computer architecture. A multicycle CPU involves many tasks. If one task is in progress, then rather than waiting for it to complete, another task is started in the same data path at the same time without interfering with the previous one. The processes are therefore divided into different pipeline stages. After each clock, another operation is initiated in the pipeline stage to which the process is being fed. The initiation is done without interrupting the previous process. This makes simultaneous use of all stages in the available data path. The aim is to increase the throughput of MIPS [8].



Figure 1. 5-Stage Pipelined MIPS
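The overlap of instructions across the five stages can be illustrated with a short Python sketch. It assumes an ideal pipeline, with one clock cycle per stage and no hazards or stalls, and it assumes that a non-pipelined multicycle design spends five cycles on every instruction. The function names are hypothetical and are not taken from the reviewed design.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_schedule(num_instructions):
    """Map each clock cycle to {stage: instruction index} for an ideal pipeline."""
    schedule = {}
    for instr in range(num_instructions):
        for offset, stage in enumerate(STAGES):
            # Instruction i enters IF in cycle i + 1 and advances one stage per cycle.
            cycle = instr + offset + 1
            schedule.setdefault(cycle, {})[stage] = instr
    return schedule


def cycles_pipelined(num_instructions):
    """Cycles needed when a new instruction starts every clock."""
    if num_instructions == 0:
        return 0
    return len(STAGES) + num_instructions - 1


def cycles_unpipelined(num_instructions):
    """Cycles needed when each instruction finishes before the next starts."""
    return len(STAGES) * num_instructions


if __name__ == "__main__":
    n = 5
    for cycle, active in sorted(pipeline_schedule(n).items()):
        row = "  ".join(f"{stage}:I{active[stage]}" for stage in STAGES if stage in active)
        print(f"cycle {cycle}: {row}")
    print(cycles_pipelined(n), "cycles pipelined vs", cycles_unpipelined(n), "unpipelined")
```

Under these assumptions, all five stages are busy with different instructions once the pipeline has filled, which is the source of the throughput gain described above.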

### B. Design and Implementation of the 'Tiny RISC' Microprocessor

Tiny RISC has been designed for optimized data path operation. A straightforward 32-bit extension of the data path results in a generic functional unit of a pipelined VLIW (Very Long Instruction Word) processor called VIPER that is currently being developed at UC Irvine. Tiny RISC uses a pipeline structure that is optimized for increased performance in a processor with multiple functional units, such as VIPER. Essentially, Tiny RISC is a 16-bit version of the functional units used, designed as a stand-alone chip that can be used as a macro cell for a variety of applications that require control processors with high performance, low cost, and small die area. An example would be the control processor of multimedia chips [9]. A full-custom approach was taken in the design of Tiny RISC. This approach results in higher performance and smaller die area compared to automated methods of synthesizing data paths. Large control structures were implemented using programmable logic arrays (PLAs) in the first version of Tiny RISC. The control circuitry is currently being modified to use standard cells only. The first version of Tiny RISC achieved a cycle time of 70 ns (14.3 MHz clock speed). This gives it a peak performance of 14 (16-bit) MIPS. The processor was laid out using the MAGIC VLSI layout system. IRSIM was used extensively to simulate and verify the operation of the processor [9].



Figure 2. Pipeline structure of Tiny RISC
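The clock speed and peak MIPS figures quoted for Tiny RISC follow directly from the 70 ns cycle time. The short Python sketch below reproduces the arithmetic; it assumes that peak performance means one instruction completed per clock cycle, and the function names are hypothetical.

```python
def clock_mhz(cycle_time_ns):
    """Clock frequency in MHz for a given cycle time in nanoseconds."""
    return 1_000.0 / cycle_time_ns


def peak_mips(cycle_time_ns, instructions_per_cycle=1.0):
    """Peak MIPS, assuming a fixed number of instructions completes per cycle."""
    return clock_mhz(cycle_time_ns) * instructions_per_cycle


if __name__ == "__main__":
    cycle_ns = 70.0  # cycle time reported for the first version of Tiny RISC
    print(f"clock: {clock_mhz(cycle_ns):.1f} MHz")
    print(f"peak:  {peak_mips(cycle_ns):.1f} MIPS")
```

A 70 ns cycle corresponds to roughly 14.3 MHz, so one instruction per cycle gives a peak of about 14 MIPS; sustained performance would be lower whenever the pipeline stalls.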

### C. Basic Issues in Microprocessor Architecture

Successful microprocessor implementations depend on the processor designer's ability to predict trends and advances in both technology and user behavior. Selecting an approach for a microprocessor implementation depends on the designer's ability to correctly model the effect of new technologies, new applications, new software, and CAD tools. The best microprocessor implementations depend not only on the use of the current state of the art in hardware algorithms but, more importantly, on bringing together the knowledge of these algorithms with expected advances in technology and the user state of the art [10].

Table 1. 1994 SIA roadmap summary

| Year of 1st DRAM Chip       | 1992 | 1995 | 1998 | 2001 | 2004 | 2007  |
|-----------------------------|------|------|------|------|------|-------|
| Feature size (µm)           | 0.50 | 0.35 | 0.25 | 0.18 | 0.13 | 0.10  |
| $V_{\rm DD}$ (V)            | 5.0  | 3.3  | 2.5  | 1.8  | 1.5  | 1.2   |
| Trans/Chip                  | 5M   | 10M  | 20M  | 50M  | 110M | 260M  |
| Die Size (mm <sup>2</sup> ) | 210  | 250  | 300  | 360  | 430  | 520   |
| Freq (MHz)                  | 225  | 300  | 450  | 600  | 800  | 1000  |
| DRAM bits/chip              | 16M  | 64M  | 256M | 1G   | 4G   | 16G   |
| SRAM bits/chip              | 4M   | 16M  | 64M  | 256M | 1G   | 4G    |
| Maximum power/chip (W)      | 60   | 80   | 100  | 120  | 140  | 160   |

Testing of microprocessors has been widely analyzed in the technical literature. Testing techniques can be classified by the level of detail used for modeling the microprocessor and generating test vectors. A low-level technique relies on a gate-level description of the microprocessor. This kind of approach is extremely accurate in detecting faults; however, it suffers from the disadvantage of high test complexity (i.e., a large number of test vectors), since test vectors are generated for each fault in the gate-level description. Consequently, the complexity of test pattern generation and the execution time of the testing process (usually referred to as testing time) may be extremely high [11].

A different approach consists of a high-level technique which functionally tests the microprocessor. Functional testing tests a microprocessor as a collection of logical units, thus ignoring a low-level description. Testing is done by having the microprocessor execute different sequences of instructions which excite different sets of faults. As no gate-level description is used (or made available), testing complexity and time are significantly reduced. However, the probability of detecting faults (usually referred to as fault coverage) is limited to faults which cause functional errors (such as adding instead of subtracting); hence, a low-level fault (such as a stuck-at fault) at a particular gate not involved in major functional operations will probably not be detected. Generally, functional approaches do not have a sufficiently high fault coverage to warrant their exclusive use for testing. Hybrid approaches have been developed by trying to combine the advantages of both functional and low-level approaches, such that a microprocessor is divided into functional units with high (but not excessive) description granularity. Units are described not only by their functions, but also by their interactions and connections with other units [12].
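The notion of fault coverage can be made concrete with a toy example. The Python sketch below is an illustration under stated assumptions and is not the method of the reviewed papers: it models a one-bit full adder at gate level, injects single stuck-at faults on each of its eight nets, and compares the coverage of an exhaustive vector set with that of a small, arbitrarily chosen set of two vectors that stands in for a short functional test. The net names and function names are hypothetical.

```python
from itertools import product

# Nets of a gate-level one-bit full adder (assumed structure: two XOR, two AND, one OR).
NETS = ["a", "b", "cin", "x1", "a1", "a2", "s", "cout"]
ALL_FAULTS = [(net, value) for net in NETS for value in (0, 1)]


def full_adder(a, b, cin, fault=None):
    """Evaluate the adder; fault is (net, stuck_value) or None for a good circuit."""
    def net(name, value):
        # Force the net to its stuck value if this is the faulty net.
        return fault[1] if fault and fault[0] == name else value

    a, b, cin = net("a", a), net("b", b), net("cin", cin)
    x1 = net("x1", a ^ b)
    a1 = net("a1", a & b)
    a2 = net("a2", x1 & cin)
    s = net("s", x1 ^ cin)
    cout = net("cout", a1 | a2)
    return s, cout


def fault_coverage(vectors):
    """Fraction of single stuck-at faults detected by at least one vector."""
    detected = sum(
        any(full_adder(*vec, fault=flt) != full_adder(*vec) for vec in vectors)
        for flt in ALL_FAULTS
    )
    return detected / len(ALL_FAULTS)


if __name__ == "__main__":
    exhaustive = list(product((0, 1), repeat=3))
    short_test = [(0, 0, 0), (1, 1, 1)]
    print("exhaustive coverage:", fault_coverage(exhaustive))
    print("short test coverage:", fault_coverage(short_test))
```

In this toy circuit the short vector set leaves some stuck-at faults undetected while the exhaustive set detects all of them, which mirrors the trade-off between testing time and fault coverage described above. A real microprocessor has far too many inputs and internal states for exhaustive testing, which is why the choice of approach matters.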

### D. 32-bit computer design using the 68020, 68881 and 68851

Designing a circuit using a particular chip for the first time is not a straightforward activity. Although the manufacturer's data sheet should provide the definitive description of the chip, it frequently gives little or no indication of how the chip is used in practice. Particularly difficult problems face the designer who wishes to use one of today's complex 32-bit microprocessors in a circuit with dynamic memory, a coprocessor, and a sophisticated memory management unit. This application note presents the design of a high-performance 68020-based microcomputer using a 68881 floating-point coprocessor and a 68851 paged memory management unit. Full details of a dynamic memory system are included. The note can be used as a guide or even a template for those considering designing their own 68020 systems [13].



Figure 3. Microprocessor (MPU), floating-point coprocessor (FPCP), paged memory management unit (PMMU) and associated circuitry

### E. Characterization and synthesis of a 32-bit asynchronous microprocessor in synchronous reconfigurable devices

Nowadays, the best implementations of asynchronous microprocessors have been developed at the application-specific integrated circuit (ASIC) level. Asynchronous design has been used since the beginning of the computer age, even before VLSI technology was possible. Due to the introduction and advances of integrated circuits, the paradigm of synchronous design became popular and came to be the dominant design style. However, in recent years, asynchronous design has had a comeback in ASIC implementations. Programmable devices are an excellent option for developing cheaper and faster digital circuit prototypes, due to their great integration capability and flexibility. In that context, asynchronous design can be performed using Field-Programmable Gate Array (FPGA) devices. To make this platform practical and useful for asynchronous design, some Self-Timed (ST) control block techniques and enduring/lock delays are required. These make it possible to build the ST synchronization circuits. Most microprocessors are built with a global clock synchronization system, in which the whole or part of the circuit is subject to a single pulse line that distributes and synchronizes data transfer. Moreover, synchronous microprocessors that use a single clock can give rise to various problems due to the high demand of processing. To overcome this problem, asynchronous systems are proposed, since in an ST synchronization system, data transfer between blocks is controlled through local signaling lines that indicate the request and data transfer between adjacent blocks. Since these kinds of systems do not depend on a global clock, they take full advantage in terms of speed and energy consumption when implemented in programmable devices. Asynchronous systems are relatively new, but they show better performance than their synchronous counterparts [5].

In addition, microprocessors with asynchronous systems can be easily implemented in FPGAs. The reviewed paper presents the design, implementation, and test results of an asynchronous 32-bit microprocessor implemented in a Xilinx Virtex-5 FPGA, a platform designed exclusively for synchronous circuits. The FPGA uses synchronous components, such as the DCM (digital clock manager) and DLL (delay-locked loop), which the software tools employ in order to synthesize a design. This implementation can be performed by means of an ST pipeline as an activation signal generator block, as well as the hard macro needed to generate the delay time for the ST asynchronous protocol [14].

Table 2. Asynchronous microprocessors

| Microprocessor | Architecture | Technology | Performance |
|----------------|--------------|------------|-------------|
| Caltech (Martin, Burns, Lee, Borkovic, & Hazewindus, 1989) | 4-phase, dual rail, 5-stage pipeline, 16-bit RISC | 20,000 1.6 µm transistors | 18 MIPS |
| NRS (Brunvand, 1993) | 2-phase, single rail, 5-stage pipeline, 16-bit RISC | FPGA Actel | 1.3 MIPS |
| AMULET1 (Furber, Day, Garside, Paver, & Woods, 1994) | 2-phase, single rail, 5-stage pipeline, based on a 32-bit ARM | 60,000 1.0 µm transistors | 9k Dhrystones |
| TICTAC 1 (Murata, 1989) | 2-phase, dual rail, 2-step non-pipeline, 32-bit RISC | 22,000 1.0 µm transistors | 11.2 MIPS |
| FRED (Richardson & Brunvand, 1996) | 2-phase, single rail, multifunctional pipeline, based on a 16-bit 88100 | Defined in VHDL | 120 MIPS |
| 80C51 (van Gageldonk et al., 1998) | 4-phase, single rail, CPU and peripherals, 8-bit CISC | 27,4820 1.6 µm transistors | 2.10 MIPS |
| AMULET2 (Furber et al., 1999) | 4-phase, single rail, forwarding pipeline, based on a 32-bit ARM | 450,000 0.5 µm transistors | 42 MIPS |
| TICTAC 2 (Takamura et al., 1998) | 2-phase, dual rail, 5-stage pipeline, based on a 32-bit MIPS R3000 | 496,000 0.5 µm transistors | 52.3 VAX MIPS |
| AMULET3 (Furber, Edwards, & Garside, 2000) | 4-phase, single rail, forwarding pipeline, based on a 32-bit ARM | 113,000 0.35 µm transistors | 120 MIPS |
| BitSNAP (Ekanayake, Kelly, & Manohar, 2005) | 4-phase, dual rail, based on 16, 32, and 64-bit SNAP ISAs | 0.18 µm CMOS | 6-54 MIPS |
| NCTUAC18S (Hung-Yue, Wei-Min, Yuan-Teng, Chang-Jiu, & Fu-Chiung, 2011) | 4-phase, dual rail, 5-stage pipeline, based on an 8-bit PIC18 ISA | 0.13 µm TSMC | n/a |

## IV. CONCLUSION

Bit-level parallelism is relatively simple to design compared to coarser-grained parallelism, but it was not widely used because of the difficulty of exploiting the techniques with existing technology and tools. DSP, SCOP, and PGA are attempts at simplifying the use of bit-level design, but full exploitation can only come through combining intelligence and programmability using high-level languages. Even so, they are only an intermediate solution compared to full gate-level and bit-level design implementations. A computer with a 32-bit processor cannot have a 64-bit version of an operating system installed; it can only run a 32-bit version.

## REFERENCES

- [1] C. Gay, "The MC68020, a true 32-bit microprocessor," vol. 8, no. 7, pp. 377-383, September 1984.
- [2] A. V. AnanthaLakshmi and G. F. Sudha, "A novel power efficient 0.64-GFlops fused 32-bit reversible floating point arithmetic unit architecture for digital signal processing applications," *Microprocessors and Microsystems*, vol. 51, pp. 366-385, 2017.
- [3] D. E. Culler, J. P. Singh, and A. Gupta, *Parallel Computer Architecture*. Morgan Kaufmann, 1999, p. 15.
- [4] K. W. L. KS. Low, M.F. Rahman, "A microprocessor based fully digital AC servo drive," *Microprocessors and Microsystems 20*, pp. 429-436, 20 June 1995.
- [5] A. Pedroza de la Crúz, J. R. Reyes Barón, S. Ortega Cisneros, J. J. Raygoza Panduro, M. Á. Carrazco Díaz, and J. R. Loo Yau, "Characterization and synthesis of a 32-bit asynchronous microprocessor in synchronous reconfigurable devices," *Journal of Applied Research and Technology*, vol. 13, no. 5, pp. 483-497, 2015.
- [6] Y. F. W. a. A. H. M. S. ULA, "A High Speed Power System Transmission Line Protection Scheme Using a 32-bit Microprocessor," pp. 195-202, February 21, 1991.
- [7] S. M. Priyavrat Bhardwaj, "Design & Simulation of a 32-Bit RISC Based MIPS Processor Using Verilog," vol. 5, no. 11, pp. 166-172, November 2016.
- [8] J. M. a. C. L. Ian Hay, "TRON-compatible 16/32-bit microprocessor," vol. 13, no. 9, November 1989.
- [9] C. C. Arthur Abnous, Jeffrey Gray, John Lenell, Andrew Naylor and Nader Bagherzadeh, "Design and implementation of the 'Tiny RISC' microprocessor," vol. 16, no. 4, 1992.
- [10] M. J. Flynn, "Basic issues in microprocessor architecture," *Journal of Systems Architecture*, pp. 939-948, 1999.
- [11] N. Margulis, "i860 microprocessor internal architecture," vol. 14, no. 2, pp. 89-96, March 1990.
- [12] J. S. a. F. Lombardi, "A data path approach for testing microprocessors with a fault bound: the MC68000 case," vol. 16, pp. 529-539, 1992.
- [13] "32-bit computer design using the 68020, 68881 and 68851," vol. 6, pp. 345-351, 1988.
- [14] N. Tredennick, "Experiences in commercial VLSI microprocessor design," vol. 12, no. 8, pp. 419-432, October 1988.