Move VM Vulnerability: Network Shutdown and Potential Hard Fork in Sui, Aptos, and Other Public Blockchains

BEOSIN

Jun 23, 2023

[Xangle Digest]

※ This article contains content originally published by a third party. Please refer to the bottom of the article for the copyright notice regarding this content.

Background

Move is a new blockchain programming language used by platforms such as Aptos and Sui. Recently, the Beosin security research team discovered a stack overflow vulnerability in the Move virtual machine caused by recursive calls. This vulnerability can shut down the entire network, prevent new validators from joining the network, and potentially result in a hard fork.

Upon discovering and verifying this vulnerability, we immediately contacted the Sui team by email (on May 30, 2023). Following their advice, we submitted the vulnerability to the Immunefi bug bounty platform on June 2, 2023. However, the team responded that they had identified the issue internally a month earlier and had been working on a private security fix. They released the fix on June 2, 2023, the same day we submitted our report to Immunefi. We understand and respect their response.

The vulnerability has been fixed in the current version, so we are now publicly disclosing our research findings.

Knowledge Basics

The Move virtual machine is implemented in Rust. The main unit for organizing and distributing Move code is the package. A package consists of a set of modules, which are defined in separate files with the .move extension. These files contain Move function and type definitions.

The minimal package directory structure, shown below, includes a manifest file (Move.toml), a lock file (Move.lock), and a sources subdirectory containing one or more module files.

Packages can be published on the blockchain. A Package can contain multiple Modules, and a Module can contain multiple functions and structs.

Function parameters can be structs, and structs can be nested within other structs, as shown below:

In Rust, recursive function calls without a limit on call depth can overflow the stack or exhaust CPU and memory resources. This matters here because the Move virtual machine is itself implemented in Rust.
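As a minimal illustration, the Rust sketch below uses a hypothetical, simplified Ty type (not the Move VM's real type representation) in which a struct either holds a u64 or wraps another struct. The nesting_depth function recurses once per nesting level with no limit, so its stack usage grows with whatever depth the input happens to have.

```rust
// Hypothetical, simplified type model (not the Move VM's real types):
// a struct type either holds a primitive or wraps another struct type.
enum Ty {
    U64,
    Struct(Box<Ty>),
}

// Unbounded recursion: each nesting level adds one stack frame, so stack
// usage is proportional to a depth chosen by whoever built the input.
fn nesting_depth(ty: &Ty) -> usize {
    match ty {
        Ty::U64 => 0,
        Ty::Struct(inner) => 1 + nesting_depth(inner),
    }
}

// Builds a chain of `n` nested structs iteratively (no recursion here).
fn build_chain(n: usize) -> Ty {
    let mut ty = Ty::U64;
    for _ in 0..n {
        ty = Ty::Struct(Box::new(ty));
    }
    ty
}

fn main() {
    // A modest depth is handled fine.
    let chain = build_chain(64);
    println!("depth = {}", nesting_depth(&chain)); // depth = 64
    // With an attacker-chosen depth in the millions, nesting_depth would
    // overflow the thread's stack, and Rust aborts the whole process on a
    // stack overflow instead of returning an error.
}
```

Dropping a very deep Box chain is also recursive in Rust, so even freeing such a value can overflow the stack, which is one more reason to bound the depth when a type is first accepted.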

Vulnerability Description

Within the Move virtual machine, recursive functions are frequently used to handle structured data such as serialized data, nested structs, nested vectors, and nested generics. To prevent stack overflow, the depth of these recursive calls must be checked.
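A depth check of this kind can be sketched as follows. This is a minimal Rust sketch under stated assumptions: the Ty model (u64, vectors, and structs or generic instantiations described by their field or type-argument types), the MAX_DEPTH value of 128, and the check_depth name are all hypothetical and do not reproduce the Move VM's actual checks.

```rust
// Hypothetical type model covering the nesting forms named above.
enum Ty {
    U64,
    Vector(Box<Ty>),
    // A struct (or generic instantiation) given by its field/type-argument types.
    Struct(Vec<Ty>),
}

// Hypothetical limit for this sketch.
const MAX_DEPTH: usize = 128;

#[derive(Debug, PartialEq)]
enum DepthError {
    TooDeep,
}

// Recursive walk that carries the current depth and stops as soon as it
// exceeds MAX_DEPTH, bounding the recursion regardless of the input.
fn check_depth(ty: &Ty, depth: usize) -> Result<(), DepthError> {
    if depth > MAX_DEPTH {
        return Err(DepthError::TooDeep);
    }
    match ty {
        Ty::U64 => Ok(()),
        Ty::Vector(inner) => check_depth(inner, depth + 1),
        Ty::Struct(fields) => {
            for field in fields {
                check_depth(field, depth + 1)?;
            }
            Ok(())
        }
    }
}

// Wraps U64 in `n` nested vectors, e.g. n = 2 gives vector<vector<u64>>.
fn nested_vectors(n: usize) -> Ty {
    let mut ty = Ty::U64;
    for _ in 0..n {
        ty = Ty::Vector(Box::new(ty));
    }
    ty
}

fn main() {
    // A struct holding a u64 and a vector<vector<u64>> is shallow and passes.
    let shallow = Ty::Struct(vec![Ty::U64, nested_vectors(2)]);
    println!("{:?}", check_depth(&shallow, 0)); // Ok(())
    // One level past the limit is rejected.
    println!("{:?}", check_depth(&nested_vectors(MAX_DEPTH + 1), 0)); // Err(TooDeep)
}
```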

The image above shows how the Move virtual machine limits the parsing depth of both simple and complex type structures.

The image above shows the depth limit on SIGNATURE_TOKEN in Move virtual machine bytecode.
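Deserialization needs the same protection, because the nesting depth of a type encoded in bytecode is chosen by whoever wrote the bytecode. The Rust sketch below decodes a hypothetical signature-token encoding with a recursive-descent decoder that checks the depth before each descent. The tag bytes 0x01 and 0x02 and the limit of 256 are assumptions, not the real Move binary format.

```rust
// Hypothetical signature-token model; tag bytes are illustrative only.
#[derive(Debug, PartialEq)]
enum Token {
    U64,
    Vector(Box<Token>),
}

#[derive(Debug, PartialEq)]
enum DecodeError {
    UnexpectedEnd,
    UnknownTag(u8),
    TooDeep,
}

const TAG_U64: u8 = 0x01;
const TAG_VECTOR: u8 = 0x02;
// Hypothetical limit for this sketch.
const MAX_TOKEN_DEPTH: usize = 256;

// Recursive-descent decoder: the depth check runs before reading each token,
// so a malicious byte string cannot drive recursion past the limit.
fn decode(bytes: &[u8], pos: &mut usize, depth: usize) -> Result<Token, DecodeError> {
    if depth > MAX_TOKEN_DEPTH {
        return Err(DecodeError::TooDeep);
    }
    let tag = *bytes.get(*pos).ok_or(DecodeError::UnexpectedEnd)?;
    *pos += 1;
    match tag {
        TAG_U64 => Ok(Token::U64),
        TAG_VECTOR => Ok(Token::Vector(Box::new(decode(bytes, pos, depth + 1)?))),
        other => Err(DecodeError::UnknownTag(other)),
    }
}

// Decodes one token starting at offset 0 (trailing bytes are ignored here).
fn decode_token(bytes: &[u8]) -> Result<Token, DecodeError> {
    let mut pos = 0;
    decode(bytes, &mut pos, 0)
}

fn main() {
    // vector<vector<u64>>
    println!("{:?}", decode_token(&[TAG_VECTOR, TAG_VECTOR, TAG_U64])); // Ok(Vector(Vector(U64)))
    // 257 nested vectors exceed the limit and are rejected.
    let mut deep = vec![TAG_VECTOR; MAX_TOKEN_DEPTH + 1];
    deep.push(TAG_U64);
    println!("{:?}", decode_token(&deep)); // Err(TooDeep)
}
```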

Although the Move virtual machine has recursive call depth checks in many places, there are still certain cases that have not been taken into account.

Consider the following attack scenario: define a struct A, nest struct B within A, nest struct C within B, and so on, continuing the nesting indefinitely. If the Move virtual machine handles this nesting relationship with a recursive function, it will crash due to stack overflow or resource exhaustion. Move does limit the number of structs that can be defined in each module, but we can create an unlimited number of modules, so the chain can continue from one module to the next.

This gives us an attack strategy:

Generate 25 packages (or more), each containing one module.
Each module defines 64 structs (more than 64 are possible on Aptos) that form a nesting chain. The first struct in each module nests the last struct of the previous module.
Each module includes a callable entry function. This function takes a parameter whose type is the last (64th) struct of the previous module, and it creates and returns an instance of the last struct in the current module.
Publish the packages in order.
Call the entry function of each module in order.
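The wiring of these steps can be sketched in Rust. Everything here is hypothetical: the m{i}/S{k} naming scheme, the EntryPlan type, and the per-module cap of 64 (taken from the PoC's struct count, not from a documented limit). The sketch shows that each module on its own stays within the per-module cap, while the nesting depth of the final struct grows with the number of modules.

```rust
// Hypothetical naming scheme: module m{i} defines structs S0..S{n-1} in a
// chain (S{k+1} nests S{k}), and m{i}::S0 nests the last struct of m{i-1}.
#[derive(Debug)]
struct EntryPlan {
    module: String,
    param_type: Option<String>, // last struct of the previous module, if any
    return_type: String,        // last struct of this module
}

// Hypothetical per-module cap; the PoC uses 64 structs per module.
const MAX_STRUCTS_PER_MODULE: usize = 64;

fn attack_plan(modules: usize, structs_per_module: usize) -> Vec<EntryPlan> {
    assert!(
        structs_per_module > 0 && structs_per_module <= MAX_STRUCTS_PER_MODULE,
        "every module must pass the per-module check on its own"
    );
    let last = structs_per_module - 1;
    (0..modules)
        .map(|i| EntryPlan {
            module: format!("m{}", i),
            param_type: if i == 0 { None } else { Some(format!("m{}::S{}", i - 1, last)) },
            return_type: format!("m{}::S{}", i, last),
        })
        .collect()
}

// Nesting depth of the last struct in the last module, counting one level per
// struct: the chain spans all modules, so only the module count bounds it.
fn total_depth(modules: usize, structs_per_module: usize) -> usize {
    modules * structs_per_module
}

fn main() {
    let plan = attack_plan(25, 64);
    // Publish the packages in order and call the entry functions in order,
    // feeding the value returned by module i-1 into module i.
    for step in plan.iter().take(3) {
        println!("{}: {:?} -> {}", step.module, step.param_type, step.return_type);
    }
    println!("final nesting depth: {}", total_depth(25, 64)); // 1600
}
```

The first three lines printed are m0: None -> m0::S63, then m1: Some("m0::S63") -> m1::S63, then m2: Some("m1::S63") -> m2::S63.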

During our testing of Sui mainnet_v1.1.1, we observed the following in our test environment with 4 validators:

After the PoC runs once, all 4 validators immediately crash due to stack overflow.
After at least 3 validators crash and restart, all full nodes crash.
After at least 3 validators crash and restart, new validators joining the network crash at least once.
After at least 3 validators crash and restart, new full nodes joining the network sometimes crash once.
In some cases, certain validators or full nodes cannot be restarted after a crash unless all of their local databases are deleted.

Regarding Sui mainnet_v1.2.0, we observed the following phenomena in our test environment with 4 validators:

After the PoC runs once, at least 1 validator crashes due to stack overflow or running out of memory.
Running the PoC again can crash a second validator, after which the entire network can no longer accept new transactions.
Crashed validators may be unable to restart. If all local databases of a crashed validator are deleted and it is started again, it crashes again after some time and can no longer be restarted.
New validators crash when they join the network.

We also ran a simple test on Aptos and found that Aptos nodes crash as well.

PoC

Sui PoC

Each created module is published to the Sui chain, and its "mint" function is called to obtain the created "object." That object is then passed as a parameter to the "mint" function of the next module, and this repeats until the Sui node crashes.
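The Sui PoC's control flow can be sketched as a loop that threads each returned object into the next call. The PocClient trait below is a hypothetical interface, not a real Sui SDK API, and MockNode is a stand-in that, purely as an assumption for illustration, reports a failure once the accumulated nesting depth exceeds a fixed budget.

```rust
// Hypothetical client interface for this sketch; not a real Sui SDK API.
trait PocClient {
    // Publishes the package for module `index` and returns the module name.
    fn publish(&mut self, index: usize) -> Result<String, String>;
    // Calls "mint" in `module`, passing the object returned by the previous
    // module (None for the first one), and returns the new object's id.
    fn mint(&mut self, module: &str, previous: Option<&str>) -> Result<String, String>;
}

// Publishes modules and calls "mint" in order, threading each returned object
// into the next call. Returns the number of successful mint calls before the
// first error (for example, the node going down), or `max_modules`.
fn run_chain<C: PocClient>(client: &mut C, max_modules: usize) -> usize {
    let mut previous: Option<String> = None;
    for i in 0..max_modules {
        let module = match client.publish(i) {
            Ok(module) => module,
            Err(_) => return i,
        };
        match client.mint(&module, previous.as_deref()) {
            Ok(object) => previous = Some(object),
            Err(_) => return i,
        }
    }
    max_modules
}

// Stand-in node: each mint adds one module's worth of nesting, and the node
// "fails" once the total exceeds a hypothetical depth budget.
struct MockNode {
    depth: usize,
    structs_per_module: usize,
    depth_budget: usize,
}

impl PocClient for MockNode {
    fn publish(&mut self, index: usize) -> Result<String, String> {
        Ok(format!("m{}", index))
    }

    fn mint(&mut self, module: &str, _previous: Option<&str>) -> Result<String, String> {
        self.depth += self.structs_per_module;
        if self.depth > self.depth_budget {
            return Err(format!("node failed at depth {}", self.depth));
        }
        Ok(format!("{}::object", module))
    }
}

fn main() {
    let mut node = MockNode { depth: 0, structs_per_module: 64, depth_budget: 1000 };
    let succeeded = run_chain(&mut node, 25);
    println!("successful mint calls before failure: {}", succeeded); // 15
}
```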

Aptos PoC

Each created module is published to the Aptos chain and its "mint" function is called; this repeats until the Aptos node crashes.

Vulnerability Fix

Sui mainnet_v1.2.1 (June 2, 2023), Aptos mainnet_v1.4.3 (June 3, 2023), and Move-language versions released after June 10, 2023 have addressed this vulnerability.

Sui patch:

https://github.com/MystenLabs/sui/commit/8b681515c0cf435df2a54198a28ab4ef574d202b

The patch code imposes limitations on the depth of type references in the creation of structs, vectors, and generics. The key function added is "check_depth_of_type."
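The Rust sketch below is not the patch code. It shows one hypothetical way a check in the spirit of check_depth_of_type can stay cheap: a DepthChecker computes each struct's depth when the struct is registered (for example, at publish time) and caches it, so a struct that nests an earlier struct reuses the cached value instead of recursively re-walking the whole chain. The type model, the limit of 128, and all names are assumptions.

```rust
use std::collections::HashMap;

// Hypothetical field-type model: struct fields refer to other structs by name.
enum FieldTy {
    U64,
    Vector(Box<FieldTy>),
    Struct(String),
}

// Hypothetical limit for this sketch.
const MAX_TYPE_DEPTH: usize = 128;

#[derive(Debug, PartialEq)]
enum CheckError {
    UnknownStruct(String),
    TooDeep(usize),
}

// Caches the depth of every registered struct (struct name -> depth).
#[derive(Default)]
struct DepthChecker {
    depths: HashMap<String, usize>,
}

impl DepthChecker {
    // Primitives count 0 and each vector layer adds 1. A struct field reuses
    // the cached depth of an already-registered struct instead of re-walking it.
    fn field_depth(&self, ty: &FieldTy) -> Result<usize, CheckError> {
        match ty {
            FieldTy::U64 => Ok(0),
            FieldTy::Vector(inner) => Ok(1 + self.field_depth(inner)?),
            FieldTy::Struct(name) => self
                .depths
                .get(name)
                .copied()
                .ok_or_else(|| CheckError::UnknownStruct(name.clone())),
        }
    }

    // Registers a struct: its depth is 1 plus its deepest field. Structs deeper
    // than MAX_TYPE_DEPTH are rejected and not cached.
    fn register(&mut self, name: &str, fields: &[FieldTy]) -> Result<usize, CheckError> {
        let mut deepest_field: usize = 0;
        for field in fields {
            deepest_field = deepest_field.max(self.field_depth(field)?);
        }
        let depth = 1 + deepest_field;
        if depth > MAX_TYPE_DEPTH {
            return Err(CheckError::TooDeep(depth));
        }
        self.depths.insert(name.to_string(), depth);
        Ok(depth)
    }
}

fn main() {
    // Replay the PoC's layout: 25 modules of 64 chained structs each.
    let mut checker = DepthChecker::default();
    let mut prev: Option<String> = None;
    'modules: for i in 0..25 {
        for j in 0..64 {
            let name = format!("m{}::S{}", i, j);
            let field = match &prev {
                Some(p) => FieldTy::Struct(p.clone()),
                None => FieldTy::U64,
            };
            match checker.register(&name, &[field]) {
                Ok(_) => prev = Some(name),
                Err(e) => {
                    println!("rejected {}: {:?}", name, e); // rejected m2::S0: TooDeep(129)
                    break 'modules;
                }
            }
        }
    }
}
```

Run against the PoC's 25-module chain, this sketch accepts m0 and m1 (128 chained structs) and rejects m2::S0 with TooDeep(129).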

Aptos patch:

https://github.com/aptos-labs/aptos-core/commit/47a0391c612407fe0b1051ef658a29e35d986963

Similar to Sui, the patch code also imposes limitations on the depth of type references in the creation of structs, vectors, and generics. The key function added is "check_depth_of_type."

Move-language patch:

https://github.com/move-language/move/commit/8f5303a365cf9da7554f8f18c393b3d6eb4867f2

Similar to Sui and Aptos, the patch code also imposes limitations on the depth of type references in the creation of structs, vectors, and generics. The key function added is "check_depth_of_type."

Vulnerability Impact

The exploit is very simple, and each attack consumes very little gas. However, its impact is significant: it can shut down the entire network, prevent new validator nodes from joining the network, and potentially cause a hard fork. The vulnerability affects Sui versions prior to mainnet_v1.2.1, Aptos versions prior to mainnet_v1.4.3, and Move-language versions released before June 10, 2023.

Why can this vulnerability potentially cause a hard fork?

Malicious attackers can create struct nesting relationships of arbitrary depth and deploy these malicious structs on the blockchain. They can then send malicious transactions targeting these structs, and those transactions become immutable once committed. Although this process may crash parts of the network, some of the malicious transactions will still be committed on chain.
To patch this vulnerability, the depth of recursive calls can be limited. However, once such a limit is in place, the malicious structs already deployed on the blockchain can no longer be referenced, and historical transactions involving these structs can no longer be verified within the virtual machine. Only a hard fork can resolve this issue.
Because testing a hard fork would severely affect the live network, we did not perform that test. In theory, however, we believe it is feasible.
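The replay argument can be illustrated with a small Rust sketch. It assumes a hypothetical history in which each committed transaction is summarized only by the deepest type nesting it uses, and a hypothetical new limit of 128. Under the old rules (no limit) the whole history verifies; under the new rule, a node replaying history rejects a transaction the network has already committed.

```rust
// Hypothetical record of a committed transaction: only the maximum type
// nesting depth it touches matters for this sketch.
struct CommittedTx {
    id: u64,
    max_type_depth: usize,
}

// Replays history under a verifier rule. Returns the id of the first committed
// transaction the rule rejects, or None if all of history still verifies.
fn first_rejected(history: &[CommittedTx], depth_limit: Option<usize>) -> Option<u64> {
    history
        .iter()
        .find(|tx| depth_limit.map_or(false, |limit| tx.max_type_depth > limit))
        .map(|tx| tx.id)
}

fn main() {
    let history = vec![
        CommittedTx { id: 1, max_type_depth: 3 },
        CommittedTx { id: 2, max_type_depth: 1600 }, // the attacker's deep chain
        CommittedTx { id: 3, max_type_depth: 2 },
    ];
    // Old rules (no depth limit): history replays cleanly.
    println!("{:?}", first_rejected(&history, None)); // None
    // New rules (hypothetical limit of 128): a committed transaction fails.
    println!("{:?}", first_rejected(&history, Some(128))); // Some(2)
}
```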

Summary

A simple recursive function call that leads to a stack overflow can cause a total network shutdown, and with additional manipulation, it may even result in a hard fork. Blockchain security should therefore always be the top priority. We recommend that project teams pay close attention to such vulnerabilities and consider engaging professional blockchain security firms for comprehensive audits.


Disclaimer

The information contained in this article is strictly the opinions of the author(s). This article was authored free from any form of coercion or undue influence. The content represents the author's own views and does not represent the official position or opinions of CrossAngle. This article is intended for informational purposes only and should not be construed as investment advice or solicitation. Unless otherwise specified, all users are solely responsible and liable for their own decisions about investments, investment strategies, or the use of products or services. Investment decisions should be made based on the user’s personal investment objectives, circumstances, and financial situation. Please consult a professional financial advisor for more information and guidance. Past returns or projections do not guarantee future results.

Xangle or its affiliated partners own all copyrights of the written or otherwise produced materials and content provided on the platform. Any illegal reproduction of such content, including, but not limited to, unauthorized editing, copying, reprinting, or redistribution will result in immediate legal actions without prior notice.