Mindset for Design Verification and Post-Silicon Lab Validation

NASSCOM Insights

Verification and validation of an Application-Specific Integrated Circuit (ASIC) are closely related activities, but they differ fundamentally in their objectives and, more importantly, in how they are approached.

"We do not see things as they are; we see them as we are," wrote the Franco-American novelist Anaïs Nin, and her words hold true here. We must therefore change our approach when we verify the design versus when we validate it in the lab, even though we aim to run the same or similar test cases on the design. The process is different, and so are the tools. Both activities have their limitations and challenges, and they complement each other harmoniously.

Functional verification focuses on flushing all logical bugs out of the Register Transfer Level (RTL) design by verifying exhaustive combinations of scenarios using test-case simulations, checkers, and coverage at a pre-fabrication abstraction level. The intent is to verify all possible combinations of scenarios and test cases exhaustively at the block, subsystem, and system level to ensure a robust logical implementation of the chip, so that no re-spin is needed later. Exhaustive checks at each timestamp are expected, as this is the stage where an engineer has full visibility into the state of the design.

Post-silicon validation focuses on catching any defects that escaped the verification phase or were introduced after it, during silicon fabrication. Defects can go unnoticed during verification because of limitations of the verification process, discussed later in this article. A defect can also be introduced during gate-level implementation of the design or while the chip is being fabricated. Even a fully functional chip needs to be tested for reliability when integrated into a real-world system. The scenarios and test cases run at the system level of the Design Verification (DV) phase are ported to the test board for post-silicon validation. This minimizes test development effort and leverages the existing stimulus generation setup. However, the checking mechanism can change: there are no waveforms to debug a failure, and data checking can only be done end to end, i.e., from the I/O pins.

Design Verification

Intent

The verification mindset is about breaking a design down into smaller pieces: checking the functioning of small logical blocks, exercising basic paths, covering boundary cases, covering exhaustive combinations, gaining architectural clarity, and finding both architectural and implementation bugs. DV is the first test of the design and its first savior, because it keeps the window open for the design to adapt during the DV phase itself. At the block or subsystem level, DV's intent is to catch and expose basic design bugs early, bugs that a full-chip simulation might expose only on the n*100th iteration, or perhaps not at all. Since block-level DV can be done by creating multiple low-level testbenches in parallel, one per block, it can catch design defects faster: system initialization is not required while verifying individual blocks, so a lot of time and iterations are saved. This flexibility not only saves time but also builds confidence in the functional behavior of each individual block. This activity is necessary, yet contrary to the way the design will be used in the real world: verification is about verifying the design for what it is, without thinking much about its use case.

Approach

  • The DV approach starts with bringing up test cases for each functionality at the block level, with Universal Verification Components (UVCs) hooked up at each abstraction level, giving full control over driving and monitoring the design like a white box, as shown in the diagram below.
  • This is accompanied by writing checks, implementing a scoreboard, and having assertions in place for every feature.
  • Once a robust and smart testbench is in place, the design is exercised with multi-feature, constrained-random, stress, and jumbo test cases.
  • The design is then advanced to higher-level subsystem and system-level test cases, which take longer to execute but more closely reflect real-world traffic conditions.
  • This is followed by code coverage and functional coverage closure for every feature.
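The scoreboard-and-checker flow in the steps above can be sketched as a minimal model. This is an illustrative sketch in plain Python rather than UVM; the `Scoreboard`, `ref_model`, and `dut` names are hypothetical stand-ins, and the DUT behaviour (add one to the input) is invented purely for the example.

```python
import random

class Scoreboard:
    """Minimal scoreboard sketch: expected items from a reference model
    are queued and compared against items observed by a monitor."""
    def __init__(self):
        self.expected = []
        self.mismatches = 0

    def push_expected(self, item):
        self.expected.append(item)

    def check_actual(self, item):
        want = self.expected.pop(0)
        if item != want:
            self.mismatches += 1

def ref_model(stimulus):
    # Stand-in reference model: assumes the DUT adds 1 to its input.
    return stimulus + 1

def dut(stimulus):
    # Stand-in DUT with the same behaviour (no bug injected here).
    return stimulus + 1

sb = Scoreboard()
random.seed(0)
# Constrained-random stimulus: values constrained to an 8-bit range.
for _ in range(100):
    stim = random.randint(0, 255)
    sb.push_expected(ref_model(stim))
    sb.check_actual(dut(stim))

print("mismatches:", sb.mismatches)  # 0 while the DUT matches the model
```

A real testbench would drive the DUT through interface UVCs and collect actuals through monitors; the pattern of queueing expectations and checking them in order is the same.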

Fig 1 Verification System Diagram

Challenges and Limitations

  • The amount of time required to run system-level simulations
  • Simulation is a model, not real silicon

It may raise some eyebrows to suggest that lab validation helps catch bugs that escaped the verification cycle, because, ideally, none should. Verification engineers apply rigorous due diligence and strict sign-off metrics before approving DV, as any lapse can result in a costly re-spin or, worse, a total product failure. But we must factor in that as the design and testbench environment get bulkier, the system's simulation time shoots up too. For a typical ASIC, executing 5000 write/read accesses at the system level over serial interfaces such as Serial Peripheral Interface (SPI) may consume only about 5 ms of simulated time, yet can take more than two days to run in simulation. In contrast, the actual ASIC would have completed billions of write/read operations in that same period. These longer iterations are vital to perform before sending the ASIC to market, because some bugs only reveal themselves after a high number of back-to-back iterations. An ASIC design, even if rigorously stress-tested at the block level, cannot be considered free of bugs until it is validated as a complete system using a volume of stimulus vectors that reflects real-world application conditions. Modern ASICs are too complex to accommodate such enormous iteration counts at the simulation level; performing those regressions in the lab is important, and also far easier and less time-consuming.
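A back-of-envelope calculation makes the slowdown concrete, using only the figures quoted above (5000 accesses in about 5 ms of simulated time, roughly two days of wall-clock time) and assuming the simulation rate stays constant as the run length grows:

```python
# Figures from the text: 5000 write/read accesses ~= 5 ms of simulated
# time, taking about 2 days of wall-clock time to simulate.
sim_time_s  = 5e-3            # simulated time for 5000 accesses
wall_time_s = 2 * 24 * 3600   # wall-clock time for the same run

slowdown = wall_time_s / sim_time_s        # wall seconds per simulated second
accesses_per_sim_s = 5000 / sim_time_s     # 1e6 accesses per simulated second

# Wall-clock time to simulate one billion accesses at the same rate:
wall_for_1e9 = (1e9 / accesses_per_sim_s) * slowdown

print(f"slowdown: {slowdown:.2e}x")
print(f"1e9 accesses: {wall_for_1e9 / 3.15e7:.0f} years of wall-clock time")
# roughly a millennium -- while real silicon does it in minutes
```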

As an ASIC incorporates more features, it requires more Verification IPs (VIPs), multiple verification tools, and increasingly complex toolchains, often sourced from different vendors, to verify it effectively. Though this is not bad practice, it brings more infrastructure and licensing limitations (e.g., VIP licenses, analog simulation licenses). For example, an ASIC design with analog mixed-signal components alongside digital modules requires separate Analog Mixed Signal (AMS) simulation licenses for full-chip simulations, even though 70% of the chip's components might be digital. These limitations do not prevent verifying a design with full-chip simulations, but they limit the iterations run on them because of the time and infrastructure constraints discussed above.

Ultimately, there is a fundamental limitation that comes with simulation: it is not real. No matter how robust the system modelling is, and no matter how proven an IP or a VIP is, until the design is proven on hardware, it remains unproven. In an ever-evolving market, IPs and VIPs mature over time, both logically and in terms of timing parameters.

Lab Validation

Intent

Lab validation is performed after the ASIC is back from the foundry. Now the only way to drive stimulus and monitor the system's state is through the top-level I/O pins, and the only way to test the design is to first bring it to a known state. The intent of lab validation is to understand how reliable the chip's functionalities are, and how harmonious its behavior is when integrated into a real system.

Approach

  • DV system-level tests are ported to Hardware Abstraction Layer (HAL) compliant tests and kept ready while the ASIC is in production.
  • Software/HAL infrastructure is developed to interact with the ASIC's top-level I/O (as a black box) once it is ready to be soldered onto the board/assembly. Once the ASIC is available, connectivity checks and Design for Testability (DFT) screening are performed to confirm that the device and its corresponding part number are alive and functioning correctly.
  • This is followed by running functional bring-up test cases to verify that each feature is alive and operating as expected.
  • Finally, multi-functionality test cases, actual use cases, and stress cases are run.

Fig 2 Lab Validation System Diagram
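One way to read the porting steps above is that the same test body is written once against a thin register-access interface, with a simulator backend during DV and a board backend in the lab. The `RegIO`, `SimBackend`, and `BoardBackend` names below are hypothetical, and the board backend is mocked so the sketch stays self-contained; a real lab backend would drive SPI or another serial interface on the test board.

```python
from abc import ABC, abstractmethod

class RegIO(ABC):
    """Hypothetical HAL interface: the only way in or out of the ASIC is
    register access over the top-level I/O, so tests target this alone."""
    @abstractmethod
    def write32(self, addr, data): ...
    @abstractmethod
    def read32(self, addr): ...

class SimBackend(RegIO):
    """DV-side backend: would talk to the simulator (modelled as a dict)."""
    def __init__(self):
        self.mem = {}
    def write32(self, addr, data):
        self.mem[addr] = data & 0xFFFFFFFF
    def read32(self, addr):
        return self.mem.get(addr, 0xDEADBEEF)  # garbage until initialised

class BoardBackend(SimBackend):
    """Lab-side backend: in reality this would drive the test board;
    here it only inherits the model so the example stays runnable."""

def smoke_test(io: RegIO) -> bool:
    """The same ported test body runs unchanged on either backend."""
    io.write32(0x1000, 0xA5A5A5A5)
    return io.read32(0x1000) == 0xA5A5A5A5

print(smoke_test(SimBackend()), smoke_test(BoardBackend()))
```

Keeping the test body backend-agnostic is what makes the porting effort small: only the backend changes between simulation and the lab.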

Challenges and Limitations

Traditionally, lab validation comes with more challenges than DV at the simulation stage:

  • Integration
  • Debugging
  • Reference model
  • Known initial state
  • Development or test porting

ASIC initialization is the first roadblock and the first real test of the chip, giving a sense that it is alive and breathing. It is also the first checkpoint to confirm that there is no basic requirement-level discrepancy in the design. To begin lab validation, the system must be in a known state and primed to accept stimulus. For example, since real silicon has no modelled 'hX (unknown) values, memories and FIFOs power up filled with garbage values. If the system is not brought into a known state before launching use cases, it may produce random results, with no way to understand what is causing them.
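The known-state problem can be illustrated with a toy model, assuming (purely for illustration) a byte-wide memory that powers up with arbitrary contents where a simulator would have shown 'hX:

```python
import random

class SiliconMemory:
    """Sketch of the known-state problem: real memory powers up with
    arbitrary garbage, not the 'hX unknowns a simulator would flag."""
    def __init__(self, depth, seed=None):
        rng = random.Random(seed)
        # Power-up contents are unpredictable; a simulator would show 'hX.
        self.cells = [rng.randint(0, 0xFF) for _ in range(depth)]
        self.initialised = False

    def init_to_known_state(self, fill=0x00):
        # Bring the memory to a known state before any use case runs.
        self.cells = [fill] * len(self.cells)
        self.initialised = True

    def read(self, addr):
        return self.cells[addr]

mem = SiliconMemory(depth=16, seed=1)
before = mem.read(0)           # unpredictable garbage -> random test results
mem.init_to_known_state()
after = mem.read(0)            # now deterministic: 0x00
print(hex(before), hex(after))
```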

Example 1: Integration issue. For a recent pin electronics ASIC validation, the Field Programmable Gate Array (FPGA) specifications had certain guidelines around the Serdes protocol, but no strict rules were laid out. The interface was clean at the simulation level, but during validation the Serdes PLL lock was so unstable that the link would always produce UNCORR errors when a packet was sent. This led to many iterations of Serdes bring-up between the ASIC and the FPGA, even though there were no logical issues.

Another typical integration issue was that the physical Serdes lane mapping was different (jumbled) for the FPGA driving the ASIC on the actual test board. When we verify the RTL design of the FPGA connected to the ASIC, the FPGA's physical lane IDs are not decided from the beginning; during the RTL design phase, only logical lane numbering is implemented. Physical lane IDs are finalized much later, after the ASIC design is finalized and synthesized. There can be various reasons for this; one is that the board layout can vary depending on the application. Since physical lane numbering has no functional impact during the design verification phase, we rarely care what the actual connections and pin-out on the test assembly will be, and we verify the system with logical lane mappings only. During validation, however, it becomes essential to map the logical lane IDs to the physical lane IDs correctly. With many lanes to map, it takes several debug iterations to finalize the correct mapping between the FPGA and the ASIC: the status registers of each ASIC lane receiving traffic must be compared against the FPGA lane driving that traffic to figure out the mapping. Beyond the correct logical-to-physical lane ID mapping, it also takes correct FPGA programming to run the expected traffic on each physical lane of the ASIC.
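The lane-mapping debug loop described above can be sketched as follows. All names here (`discover_lane_map`, `fpga_drive`, `asic_lane_counters`) are hypothetical, and the board is mocked with a jumbled lane map; on real hardware the counters would come from the ASIC's per-lane status registers.

```python
def discover_lane_map(fpga_drive, asic_lane_counters, n_lanes):
    """Drive traffic on one FPGA logical lane at a time, then see which
    physical ASIC lane's status counter moved."""
    mapping = {}
    for logical in range(n_lanes):
        before = asic_lane_counters()
        fpga_drive(logical)                    # traffic on one logical lane
        after = asic_lane_counters()
        # The physical lane whose packet counter incremented is the match.
        moved = [p for p in range(n_lanes) if after[p] > before[p]]
        if len(moved) == 1:
            mapping[logical] = moved[0]
    return mapping

# Mock board: logical lanes land jumbled on physical lanes 2, 0, 3, 1.
physical_of = {0: 2, 1: 0, 2: 3, 3: 1}
counters = [0, 0, 0, 0]

def fpga_drive(logical):
    counters[physical_of[logical]] += 1        # packet arrives on a physical lane

def asic_lane_counters():
    return list(counters)                      # snapshot of per-lane counters

print(discover_lane_map(fpga_drive, asic_lane_counters, 4))
# -> {0: 2, 1: 0, 2: 3, 3: 1}
```

In the lab this loop is manual and slow, one lane at a time through register reads, which is why the mapping takes many debug iterations to pin down.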

Another challenge is how to compare results during lab validation. One possible way is to port the test cases with the same stimulus generation used at the simulation level and compare the lab results against the simulation results; the simulation results then serve as the expected system values. This approach trusts and leverages the existing testbench-generated values, saving development effort. But it has a limitation when tests are run with a large amount of unique, random stimulus: launching millions or billions of random stimuli on the ASIC takes little time, but the corresponding simulation time needed to generate the expected values for comparison and data checking grows enormously. One compromise is to launch repeated patterns of pseudo-random stimulus for which the system status can be calculated manually and compared at the end of the test, rather than relying on simulation-generated expected values.
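The compromise described above, repeated pseudo-random patterns with a hand-computable expectation, might look like this sketch. It assumes a PRBS-7 stream and a simple modular-sum signature; real designs would pick their own pattern and signature scheme.

```python
def lfsr_prbs7(seed=0x7F, n=127):
    """PRBS-7 generator (x^7 + x^6 + 1): a short, repeating pseudo-random
    sequence whose end-of-test signature can be worked out by hand."""
    state, out = seed & 0x7F, []
    for _ in range(n):
        bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | bit) & 0x7F
        out.append(state)
    return out

period = lfsr_prbs7()          # one full 127-word period
repeats = 3                    # a real run might repeat this millions of times

# Hand-computable expectation: per-period sum times repeat count, mod 32 bits.
# No simulation run is needed to produce this golden value.
expected = (sum(period) * repeats) & 0xFFFFFFFF

# What a checker reading the streamed data back would accumulate:
observed = 0
for _ in range(repeats):
    for word in period:
        observed = (observed + word) & 0xFFFFFFFF

print(observed == expected)    # True: end-to-end check without golden vectors
```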

The intrinsic nature of lab validation makes debugging failures its biggest challenge. An issue can stem from hardware initialization, software initialization, connections, incorrect stimulus driven in the lab, or correct stimulus driven in an incorrect order. Software interface latency comes into play as well. There are no waveforms to help us understand the state of the system at any timestamp; validation relies on status registers, counters, error flags, and redundancy-check registers to expose the state of the system at any point in time. Some designs do have Dynamic Random-Access Memories (DRAMs) or buffers implemented that continuously latch the state of certain system variables at regular timestamps. The design may also incorporate signature-generator registers that update their values at each timestamp according to an algorithmic relationship with the system's inputs and outputs. These provisions make the design bulkier and are redundant in terms of end-user functional value, but they are instrumental in debugging failures during lab validation.
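A signature-generator register of the kind described above can be modelled minimally. The rotate-and-xor mixing function here is an illustrative choice, not from any particular design; real hardware might use a CRC or a MISR.

```python
class SignatureRegister:
    """Sketch of the debug aid described above: a register that folds the
    system's I/O into a running signature every timestamp, so a lab failure
    can be localised by comparing against a known-good run's signature."""
    MASK = 0xFFFFFFFF

    def __init__(self):
        self.value = 0

    def update(self, io_sample):
        # Rotate-left-by-one then xor in the sample: order-sensitive, so
        # the same data in a different order yields a different signature.
        rotated = ((self.value << 1) | (self.value >> 31)) & self.MASK
        self.value = rotated ^ (io_sample & self.MASK)

good, bad = SignatureRegister(), SignatureRegister()
stream = [0x11, 0x22, 0x33, 0x44]
for s in stream:
    good.update(s)
for i, s in enumerate(stream):
    bad.update(s ^ (1 if i == 2 else 0))   # single-bit corruption at step 2
print(good.value != bad.value)             # True: signature exposes the corruption
```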

A typical ASIC may require only 5-10% of the testing effort in the lab compared to design verification; however, post-silicon tests provide significantly deeper insight into the stability and robustness of the ASIC.

Conclusion

It is notable that even when the same test cases, that is, the same stimulus, are used at both the simulation and lab validation stages, the approaches to the two activities are fundamentally different. During verification, the design is still evolving in terms of specifications and architecture, whereas after silicon fabrication the use cases are well defined and the reliability of the design and the chip is under scrutiny. Both activities, design verification and lab validation, sit at the heart of the ASIC production cycle. It is not practical to get a well-functioning ASIC fabricated without thorough verification of the design, and the verification itself can only be authenticated when the ASIC proves its reliability during post-silicon lab validation.

Author Details:

Krushna Rajani is an ASIC Verification Technical Lead at eInfochips, an Arrow company. He specializes in the functional verification of ASIC, FPGA, and SoC designs, ensuring bug-free RTL delivery through efficient testbench development techniques and comprehensive test scenario creation for DV simulations. He has nearly nine years of experience in the functional design verification of various RTL designs.

He has worked on VIP development for protocols such as PCIe and OCP. He also has hands-on experience with gate-level simulations and the lab validation of a pin electronics ASIC. In his spare time, he mentors junior engineers and works on testbench optimization techniques using AI.

LinkedIn: https://www.linkedin.com/in/krushna-rajani-40350ba9/







eInfochips, an Arrow company, is a leading global provider of product engineering and semiconductor design services. With more than 500 products developed and 40 million deployments in 140 countries, eInfochips continues to fuel technological innovation across multiple verticals. The company's service offerings include digital transformation and connected IoT solutions across various cloud platforms, including AWS and Azure. Visit https://www.einfochips.com/
