# A Double Data Rate 8T-Cell SRAM Architecture for Systems-on-Chip

Saleh M. Abdel-Hafeez, Mohammad Shatnawi Department of Computer Engineering Jordan University of Science and Technology 22110 Irbid, Jordan, P.O.BOX 3030 sabdel@just.edu.jo

Abstract-The substantial increase in market demand for handheld devices drives the need for low-power, high-speed SRAM for systems-on-chip (SoCs). In this paper, we present a novel low-power SRAM architecture that provides double data rate (DDR) read/write accesses with a high noise margin using a conventional 8T-Cell and a partitioned structure consisting of even and odd modules (corresponding to even and odd addresses), which are accessed alternately. Write accesses occur at both clock edges such that the even modules are accessed at the rising edge and the odd modules are accessed at the falling edge. Similarly, read accesses occur at both clock edges such that the even modules are evaluated at the rising clock edge while the odd modules are concurrently pre-charged, and vice versa. We implement a 128-bit X 64-bit SRAM with DDR accesses and an 8T-Cell structure using a standard 0.09µm/1V CMOS TSMC process. Simulation results reveal that our architecture operates with a 1GHz read/write cycle, a data throughput of 2GHz/64-bit, and an average power consumption of 23.4mW.

Keywords: Double-Data-Rate (DDR) Memory, 8T-Cell, SRAM, System-on-Chip (SoC).

#### I. INTRODUCTION

The demand for high data throughput and low power operation SRAM designs for systems-on-chip (SoCs) is ever increasing due to the market demand for hand-held devices and applications. Since many of these applications process a tremendous amount of data, both high throughput and low power consumption are critical for the advancement and success of these applications.

Data processing throughput can be directly increased by accessing data on both the rising and falling clock edges, which yields double data rate (DDR) throughput. Commercial products leverage this increased throughput using a DDR system input/output (I/O) bus that interfaces with off-chip DDR SDRAM. However, this bus is typically only available for accessing off-chip main memory and not for transfers between on-chip memory modules, which are used for

TABLE 1: SYMBOL DEFINITIONS FOR THE ARCHITECTURAL SIGNALS

| SYMBOL     | DEFINITION    | Maximum input capacitance $C_{in}$ (pF) |
|------------|---------------|-----------------------------------------|
| FFI (63:0) | Input data    | 0.01                                    |
| WCLK       | Write clock   | 0.01                                    |
| RCLK       | Read clock    | 0.01                                    |
| RDA (6:0)  | Read address  | 0.01                                    |
| WRA (6:0)  | Write address | 0.01                                    |
| FFO (63:0) | Output data   |                                         |

Ann Gordon-Ross

Department of Electrical and Computer Engineering University of Florida, Gainesville, FL 32611, USA ann@ece.ufl.edu

temporary storage and manipulation of data between arithmetic units. Since these on-chip SRAM circuits typically leverage different geometric variations, the SRAM design can be based on a wide variety of memory cell structures (e.g., [1][3][4][6][8][9]). Although these structures suit the features required by memory designs for SoC products, on-chip DDR SRAMs cannot be used because the on-chip system I/O bus does not support DDR access. In order to support on-chip DDR memory communication, we propose a DDR SRAM architecture using a typical 8T-Cell structure [1][3][4], which is amenable to continued technology scaling [3], in Sections II and III. Section IV presents HSPICE simulation results and a comparison with related work. Section V summarizes and concludes our work.

#### II. DDR SRAM DESIGN SPECIFICATIONS

In this section, we present our proposed DDR SRAM design specifications for a 128-bit (height) X 64-bit (width) SRAM using the 8T-Cell and the I/O signals denoted in TABLE 1 and illustrated in Figure 1. The design's timing constraints depend on a self-timing design methodology [2], in which resizing transistors and inserting gates are the key means of reducing timing skew and ensuring proper operation. For reference, TABLE 2 lists all of the abbreviated timing constraints.

Figure 2 illustrates the write timing constraints, namely the setup and hold times required of the write address and data buses with respect to the rising and falling edges of the write clock. The read timing constraints are similar and are omitted for brevity. Without loss of generality, even data addresses are written/read to even modules after the rising edge of *WCLK/RCLK* and odd data addresses are written/read to odd modules after the falling edge of *WCLK/RCLK*. This method allows for DDR throughput to be fully utilized during read or write operations by toggling between consecutive even and odd addresses, which imposes



Figure 1: 128-bit X 64-bit DDR SRAM and relevant signals

TABLE 2: SYMBOL DEFINITIONS FOR THE TIMING SIGNALS

| SYMBOL                         | DEFINITION                          |
|--------------------------------|-------------------------------------|
| $T_{Eas}/T_{Eah}$              | Even module address setup/hold time |
| $T_{Eds}/T_{Edh}$              | Even module data setup/hold time    |
| $T_{Oas}/T_{Oah}$              | Odd module address setup/hold time  |
| $T_{Ods}/T_{Odh}$              | Odd module data setup/hold time     |
| $T_{Wacc} = T_{Eah} = T_{Oah}$ | Write access                        |
| $T_{Eracc} = T_{Oracc}$        | Read access                         |

essentially no throughput degradation on the design's operation since data, in many applications, are usually processed in blocks of consecutive even-odd addresses.
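The effect of the access pattern on throughput can be illustrated with a small scheduling sketch. The model below is an assumption made for illustration, not part of the design itself: half-cycle 0 begins at a rising clock edge (even module), half-cycle 1 at a falling edge (odd module), and an access simply waits for the next edge that matches its address parity. The function name `half_cycles_needed` is hypothetical.

```python
def half_cycles_needed(addresses):
    """Count the clock half-cycles needed to issue the accesses in order.

    Assumed model: even half-cycles start at a rising edge (even module),
    odd half-cycles start at a falling edge (odd module). An access waits
    for the next half-cycle whose parity matches its address parity.
    """
    t = 0  # index of the next free half-cycle
    for addr in addresses:
        if t % 2 != addr % 2:
            t += 1  # wrong edge for this module: one idle half-cycle
        t += 1      # the access occupies this half-cycle
    return t


block = list(range(8))    # consecutive addresses alternate even/odd
even_only = [0, 2, 4, 6]  # same-parity stream, cannot use both edges

print(half_cycles_needed(block))      # 8 accesses in 8 half-cycles
print(half_cycles_needed(even_only))  # 4 accesses need 7 half-cycles
```

Under this model, a block of consecutive addresses keeps both clock edges busy, whereas a stream that touches only one module falls back to roughly one access per full clock cycle.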

Our proposed DDR SRAM design's core cells are implemented using the 8T-Cell depicted in Figure 3 [1][3][4]. The 8T-Cell structure provides a read mechanism that does not disturb the cell's internal nodes and therefore offers a high read/write noise margin; thus, the 8T-Cell is amenable to continued technology scaling at low supply voltage [4]. In addition, the 8T-Cell permits a low-power sensing I/O circuit that is considered among the lowest-power of comparable memory cells [4]. For brevity, we refer the reader to [1][3][4] for additional details on the 8T-Cell's advantages. The transistor sizes for 90nm technology are depicted in Figure 3; these are sizes derived by several foundries [11][12] for general SRAM SoC products, which we leverage in our design's simulation.

#### III. DDR SRAM ARCHITECTURE

#### A. Modules and Decoder Structure

The DDR SRAM architecture is partitioned into two main modules of size 64-bit X 64-bit each, which are depicted as the even and odd modules in Figure 4 and are constructed using arrays of 8T-Cells. This partitioned approach provides a regular structure with sufficient driving capabilities and reduces the timing skew variations between the cells, and thus minimizes the design iterations necessary for modifying the sizes of the buffers and logic gates.

The least significant address bus bits ($RDA_0$ and $WRA_0$) distinguish between the even and odd modules, while the remainder of the address bus bits ($RDA_6$ to $RDA_1$ and $WRA_6$ to $WRA_1$) are evaluated in the pre-decoder module. This parallelization minimizes the even and odd decoders' switching activities and results in efficient addressing power consumption. We connect $RDA_0$ and $WRA_0$ to the last stage



Figure 2: Write timing constraints between the write clock and write address and data buses

of the even and odd decoders with *WCLK/RCLK* in order to preserve all timing constraints with the addition of a minimum pre-decoder and decoder gate activity.
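The address split described above can be sketched in a few lines. This is a behavioral illustration only: the function names `split_address` and `active_word_line` are hypothetical, and the clock gating shown (even rows fire while the clock is high, odd rows while it is low) is an assumption inferred from the rising-edge/falling-edge convention, not a description of the actual gate-level decoder.

```python
ROWS_PER_MODULE = 64


def split_address(addr):
    """Split a 7-bit address into (module, row)."""
    if not 0 <= addr < 2 * ROWS_PER_MODULE:
        raise ValueError("address must be in 0..127")
    module = "odd" if addr & 1 else "even"  # bit 0 selects the module
    row = addr >> 1                         # bits 6..1 select one of 64 rows
    return module, row


def active_word_line(addr, clk_high):
    """Return (module, row) if the word line may fire in this clock phase.

    Assumed gating: even rows fire while the clock is high (after the
    rising edge), odd rows while it is low (after the falling edge).
    """
    module, row = split_address(addr)
    fires = clk_high if module == "even" else not clk_high
    return (module, row) if fires else None


print(split_address(6))            # address 6 maps to the even module, row 3
print(active_word_line(7, False))  # address 7 maps to the odd module, row 3
```

Because bit 0 is consumed only at the final gating stage, the six upper bits can be pre-decoded once and shared by both modules in this model.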

The decoder structure exhibits balanced delay from all of the decoder's input pins to all of its selected output pins in order to maintain constant setup and hold times with respect to the memory clocks (*RCLK*, *WCLK*). We refer the reader to [2] for the complete self-timing decoder design.

#### B. Write Operation

Data is driven through simple CMOS inverters—input buffers—as depicted in Figure 5 (a). Every input buffer is associated with one data bit as input that generates two write bit lines (*WBL*, *WBLB*) as outputs, which are the complement of each other. The write bit lines for any particular data bit input are associated with a column array of 64 8T-Cells. Consequently, every write bit line (*WBL*, *WBLB*) is connected to 64 diffusion capacitances as shown by the vertical dashed line in Figure 6. The write bit line delay  $T_{WBL}$  can be approximated using [13]:

$$T_{WBL} = 0.35 \times R_{WBL} \times C_{WBL} \times L_{WBL}^2 \tag{1}$$

where  $R_{WBL}$  and  $C_{WBL}$  are the distributed components of the write bit line including the two overlap diffusion capacitances between every two adjacent cells in the column. The length of the write bit lines  $L_{WBL}$  is the length of the



Figure 3: Schematic of the 8T-Cell with size 2.84µm X 0.72µm



Figure 4: Proposed partitioned DDR SRAM architecture



Figure 5: I/O sensing circuit: (a) input buffer; (b) output buffer

memory column, which equals 64 x 0.72 µm = 46.08 µm.

The write word lines (WWL) are associated with a row array of 64 8T-Cells, where each cell contributes two gates, as shown by the horizontal line in Figure 6. The write word line delay $T_{WWL}$ can be approximated using [13]:

$$T_{WWL} = 0.35 \times R_{WWL} \times C_{WWL} \times L_{WWL}^2 \tag{2}$$

where $R_{WWL}$ and $C_{WWL}$ are the distributed parasitic components of the write word lines, including the two gate capacitances per cell for the 64 cells of the row. The length of the write word lines $L_{WWL}$ is the length of the memory row, which equals 64 x 2.84 µm = 181.76 µm.

Each $WBL_k/WBLB_k$ data bit line must preserve a setup and hold time with respect to $WWL_k$ at each cell based on the timing depicted in Figure 2.
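The distributed RC approximation used for the write bit lines and write word lines can be evaluated numerically as a rough sanity check. The sketch below takes the cell dimensions from Figure 3; the per-unit-length resistance and capacitance values are hypothetical placeholders chosen only to make the example run, and they are not extracted from any process data.

```python
CELL_WIDTH_UM = 2.84   # cell pitch along a row (word-line direction), Figure 3
CELL_HEIGHT_UM = 0.72  # cell pitch along a column (bit-line direction), Figure 3
CELLS_PER_LINE = 64


def line_length_um(cell_pitch_um, cells=CELLS_PER_LINE):
    """Length of a line that spans `cells` cells of the given pitch."""
    return cells * cell_pitch_um


def distributed_rc_delay_s(r_per_um, c_per_um, length_um):
    """T = 0.35 * R * C * L^2, with R in ohm/um, C in F/um and L in um."""
    return 0.35 * r_per_um * c_per_um * length_um ** 2


# Hypothetical per-unit-length parasitics (assumed, for illustration only).
R_PER_UM = 0.5      # ohm per um
C_PER_UM = 0.4e-15  # farad per um, wire plus averaged device loading

l_wbl = line_length_um(CELL_HEIGHT_UM)  # write bit line spans a column
l_wwl = line_length_um(CELL_WIDTH_UM)   # write word line spans a row
t_wbl = distributed_rc_delay_s(R_PER_UM, C_PER_UM, l_wbl)
t_wwl = distributed_rc_delay_s(R_PER_UM, C_PER_UM, l_wwl)

print(f"L_WBL = {l_wbl:.2f} um, T_WBL = {t_wbl * 1e12:.3f} ps")
print(f"L_WWL = {l_wwl:.2f} um, T_WWL = {t_wwl * 1e12:.3f} ps")
```

Since the delay grows with the square of the line length, the word line, which is roughly four times longer than the bit line, dominates under equal per-unit-length parasitics. In practice the two lines carry different loading (gate versus diffusion capacitance), so separate values would be needed for each.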

#### C. Read Operation

The 8T-Cell separates the read and write operational logic and the read bit lines (*RBL*) must be pre-charged before evaluation. A stored value of 1 is considered the critical path read delay, which can be approximated using [13]:

$$T_{RACC} = 0.35 \times R_{RWL} \times C_{RWL} \times L_{RWL}^2 + 0.35 \times R_{RBL} \times C_{RBL} \times L_{RBL}^2 \tag{3}$$

where $R_{RWL}$ and $C_{RWL}$ are the distributed parasitic values of the read word lines (RWL), including the gate capacitances of the 64 cells in the row, and the length of the read word lines $L_{RWL}$ equals 64 x 2.84 µm = 181.76 µm. Likewise, $R_{RBL}$ and $C_{RBL}$ are the distributed parasitic values of the read bit lines, including the diffusion capacitance of the read portion of each cell, where the length of the read bit lines $L_{RBL}$ equals 64 x 0.72 µm = 46.08 µm.

The read access time depicted in Equation (3) depends on the address setup and hold times and the rising/falling edge of *RCLK*. In order to ensure proper DDR read operation, the read access time  $T_{RACC}$  must be completed within a quarter of the read clock cycle  $T_{RCLK}$  (i.e.,  $T_{RACC} \leq \frac{1}{4} T_{RCLK}$ ).
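As a worked instance of this constraint, assuming the read clock runs at the 1GHz operating frequency reported for the design:

$$T_{RCLK} = \frac{1}{1\,\text{GHz}} = 1\,\text{ns} \quad\Rightarrow\quad T_{RACC} \le \tfrac{1}{4}\,T_{RCLK} = 250\,\text{ps}$$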

Figure 5 (b) depicts the sensing circuit that multiplexes between the two modules of read bit lines  $(RBLO_k, RBLE_k)$  such that one bit line (i.e., in the even module) is pre-charged



Figure 6: Write critical paths for even and odd corner cells

while the other bit line (i.e., in the odd module) is evaluated. Each sensing circuit is associated with two columns of 64 8T-Cells each, extending in opposite directions. This structure results in a total of 64 sensing circuits where each sensing circuit has two input lines, $RBLE_k$ and $RBLO_k$.

The pre-charge time for  $RBLE_k$  and  $RBLO_k$  depends only on the assertion or de-assertion of RCLK, which activates the sense circuits for pre-charging the even or odd module's read bit lines. The pre-charge time occurs within one half cycle of RCLK and can be approximated using [13]:

$$T_{\text{pre-charge}} = 0.35 \times R_{RBL} \times C_{RBL} \times L_{RBL}^2 \le \frac{1}{2} T_{RCLK} \tag{4}$$
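The alternating evaluate/pre-charge behavior and the two timing budgets can be summarized in a short behavioral sketch. All function names are hypothetical, the multiplexer ignores any output inversion in the real sense circuit, and the delay values passed to `meets_budget` are made-up inputs rather than simulation results.

```python
def read_phases(rclk_high):
    """Which module evaluates and which pre-charges in this half-cycle."""
    if rclk_high:
        return {"evaluate": "even", "precharge": "odd"}
    return {"evaluate": "odd", "precharge": "even"}


def sense_mux(rclk_high, rble_k, rblo_k):
    """Pass on the read bit line of the module currently being evaluated."""
    return rble_k if rclk_high else rblo_k


def meets_budget(t_racc_s, t_precharge_s, f_rclk_hz):
    """Check T_RACC <= T_RCLK / 4 and T_pre-charge <= T_RCLK / 2."""
    t_rclk = 1.0 / f_rclk_hz
    return t_racc_s <= t_rclk / 4 and t_precharge_s <= t_rclk / 2


print(read_phases(True))
print(sense_mux(False, rble_k=0, rblo_k=1))
print(meets_budget(200e-12, 400e-12, 1e9))  # illustrative delays at 1 GHz
```

The point of the sketch is that the pre-charge of one module is hidden behind the evaluation of the other, so neither phase adds to the externally visible access time as long as both budgets hold.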

#### IV. HSPICE SIMULATIONS AND COMPARISONS

This section shows the HSPICE simulations for the write operation (for brevity, we omit similar read operation results) for our proposed DDR SRAM constructed with 8T-Cells at a size of 128-bit X 64-bit (8,192 bits of total storage) with the following specifications: a supply voltage of 1V, a temperature of 25°C, a write operating frequency of 1GHz, and 90nm TSMC CMOS technology. To fully verify and demonstrate the DDR operation, we simulated writing to the even and odd modules. We show a sample portion of the simulation consisting of the upper right corner cell for an even module and the lower right corner cell for an odd module, since these corner cells present the worst-case delay.

Figure 7 shows the write timing simulation with respect to the toggling write address WRA63, gated clock WCLK, and input data *FFI63*. The address is propagated through the pre-decoder and is gated with  $WRA_0$  and WCLK in the decoder, thereby generating the WWLE63/WWLO63 signal, which propagates horizontally as shown by the dashed line in Figure 6 and approximated by Equation (2). The data bit line *FFI63* arrives at the upper cell from WBLE63/WBLO63 with enough setup and hold time with respect to



Figure 7: Write timing simulation @ 1 GHz at 90 nm TSMC technology

*WWLE63/WWLO63*, which ensures a valid write data operation on the even and odd modules' cells. In this case, the hold time is considered the write access time. Finally, the content of the even/odd cell is realized by the *DEVEN/DODD* signal, which shows correct DDR write operations on the rising and falling edges of *WCLK*.

TABLE 3 depicts a comparison with analogous designs [5][7][10], where the previous works' data are reported directly from the literature without any scaling. Since, to the best of our knowledge, there is no reported DDR SRAM for on-chip communication with arithmetic units, the compared designs are single data rate (SDR). Although our design's circuit structure is implemented to support DDR throughput in contrast with the compared designs' SDR throughput, our design uses the same 8T-Cell and the same decoder logic with the same timing constraints, the only difference being the multiplexed I/O sense circuit. Therefore, the majority of the 8T-Cell's advantages with respect to SDR SRAM are applicable to our DDR SRAM design, such as low power consumption, competitive silicon area, fast access, and a large noise margin between read and write that supports continued technology scaling. Furthermore, the proposed design provides twice the throughput at a competitive memory clock speed (1GHz) and, in addition, competitive power consumption relative to its dynamic throughput.
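The throughput claim can be made concrete with a short calculation from the figures reported for our design (1GHz clock, 64-bit bus, 23.4mW average power). The energy-per-bit figure is an illustrative derivation that assumes the average power was measured under sustained full-rate DDR access, which is an assumption of this sketch; the function names are hypothetical.

```python
CLOCK_HZ = 1e9        # memory clock (1 ns cycle)
WORD_BITS = 64        # I/O bus width
AVG_POWER_W = 23.4e-3 # reported average power


def throughput_bits_per_s(clock_hz, edges_per_cycle, word_bits):
    """Peak data throughput for a bus that transfers on the given edges."""
    return clock_hz * edges_per_cycle * word_bits


def energy_per_bit_j(power_w, bits_per_s):
    """Average energy per transferred bit at the given power and throughput."""
    return power_w / bits_per_s


ddr = throughput_bits_per_s(CLOCK_HZ, 2, WORD_BITS)  # both clock edges
sdr = throughput_bits_per_s(CLOCK_HZ, 1, WORD_BITS)  # one clock edge

print(f"DDR peak: {ddr / 1e9:.0f} Gbit/s, SDR peak at same clock: {sdr / 1e9:.0f} Gbit/s")
print(f"Energy per bit at DDR peak: {energy_per_bit_j(AVG_POWER_W, ddr) * 1e12:.3f} pJ")
```

A like-for-like energy comparison with the SDR designs in TABLE 3 would additionally require their bus widths and measurement conditions, which are not listed there.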

#### V. CONCLUSIONS

In this paper, we presented a double data rate (DDR) SRAM design for communication between the memory modules on a system-on-chip (SoC) that is independent of the DDR input-output (I/O) system bus. Our design leverages DDR throughput for read/write access by

TABLE 3: COMPARISON WITH SIMILAR FEATURES DESIGNS

| Design | Size<br>(K-bit) | Tech       | Power<br>(mW) | Clock<br>access | Type |
|--------|-----------------|------------|---------------|-----------------|------|
| Ours   | 8               | 90nm/1V    | 23.4          | 1ns             | DDR  |
| [10]   | 64              | 90nm/1V    | 12.9          | 1.2ns           | SDR  |
| [7]    | 8               | 180nm/1.8V | 20.5          | 2ns             | SDR  |
| [5]    | 8               | 65nm/1V    | 10.7          | 2ns             | SDR  |

using a partitioned architecture wherein the memory is divided into two modules, an even module and an odd module, which alternately operate on the data at the rising and falling edges of the memory clock. Additionally, we designed a low-power multiplexed I/O sense circuit to facilitate the DDR read operation. Simulations verified our design's correctness with a 64-bit I/O bus at a read/write operating frequency of 1GHz with DDR throughput of 2GHz/64-bit.

#### VI. REFERENCES

- [1] S. Abdel-hafeez and S. P. Sribhashyam, "System and Method for Efficiently Implementing a Double Data Rate Memory Architecture", US patent No. 6,356,509 B1; March 12, 2002.
- [2] S. M. Abdel-hafeez and A. S. Matalkah, "CMOS Eight-Transistors Memory Cell for Low-Dynamic-Power High-Speed Embedded SRAM," Journal of Circuits, Systems, and Computers, Vol. 17, No. 5, World Scientific Publishing Company, Jan. 22, 2009, pp. 845-863
- [3] Anandtech (Intel I7): http://www.anandtech.com/show/2594/10
- [4] L. Chang, R. K. Montoye, Y. Nakamura, K. A. Batson, R. J. Eickemeyer, R. H. Dennard, W. Haensch, and D. Jamsek, "An 8T-SRAM for Variability Tolerance and Low-Voltage Operation in High-Performance Caches," IEEE Journal of Solid-State Circuits, Vol. 43, Issue 4, April 2008, pp. 956-963.
- [5] A. T. Do, K. S. Yeo, J. Y. S. Low, J. Y. L. Low, and Z. H. Kong, "An 8T SRAM Cell with Column-based Dynamic Supply Voltage for Bit-interleaving," Conference on Circuits and Systems (APCCAS) IEEE Asia Pacific, 2010, pp. 704-707
- [6] K. Nii, Y. Masuda, M. Yabuuchi, Y. Tsukamoto, S. Ohbayashi, S. Imaoka, M. Igarashi, K. Tomita, N. Tsuboi, H. Makino, K. Ishibashi, and H. Shinohara, "A 65nm Ultra-High-Density Dual-port SRAM with 0.71μm<sup>2</sup> 8T-cell for SoC," Symposium on VLSI Circuits Digest of Technical Papers, 2006, pp.130-131
- [7] S. Reddy G M and P. C. Reddy, "Design and Implementation of 8Kbits Low power SRAM in 180nm technology," Proceedings of the International Conference of Engineers and Computer Scientists 2009, Vol. III, IMECS 2009, March 18-20, pp. 100-105
- [8] T. Suzuki, S. Moriwaki, A. Kawasumi, S. Miyano, and H. Shinohara, "0.5-V, 150-MHz, Bulk-CMOS SRAM with Suspended Bit-Line Read Scheme," Proceedings of the ESSCIRC, 2010, pp. 354-357
- [9] T. Suzuki, H. Yamauchi, Y. Yamagami, K. Satomi, and H. Akamatsu, "A Stable 2-Port SRAM Cell Design Against Simultaneously Read/Write-Disturbed Accesses," IEEE Journal of Solid-State Circuits, Vol. 43, No. 9, Sept. 2008, pp. 2109-2119
- [10] K. Takeda, Y. Hagihara, Y. Aimoto, M. Nomura, Y. Nakazawa, T. Ishii, and H. Kobatake, "A Read-Static-Noise Margin-Free SRAM Cell for Low-VDD and High-Speed Applications," IEEE Journal of Solid-State Circuits, Vol. 41, Issue 1, January 2006, pp. 113-121
- [11] Taiwan Semiconductor Manufacturing Corp., "0.09 µm CMOS ASIC Process Digests," 2005.
- [12] United Microelectronics Corporation (UMC), "0.09 µm CMOS ASIC Process Digests," 2005
- [13] J. P. Uyemura, CMOS Logic Circuit Design, Kluwer, 1999