First published: Thu Jan 18 2024(Updated: )
### Summary `concat` built-in can write over the bounds of the memory buffer that was allocated for it and thus overwrite existing valid data. The root cause is that the `build_IR` for `concat` doesn't properly adhere to the API of copy functions (for `>=0.3.2` the `copy_bytes` function). A contract search was performed and no vulnerable contracts were found in production. Tracked in issue https://github.com/vyperlang/vyper/issues/3737 ### Details The `build_IR` allocates a new internal variable for the concatenation: https://github.com/vyperlang/vyper/blob/3b310d5292c4d1448e673d7b3adb223f9353260e/vyper/builtins/functions.py#L534-L550 Notice that the buffer is allocated for the `maxlen` + 1 word to actually hold the length of the array. Later the `copy_bytes` function is used to copy the actual source arguments to the destination: https://github.com/vyperlang/vyper/blob/3b310d5292c4d1448e673d7b3adb223f9353260e/vyper/builtins/functions.py#L569-L572 The `dst_data` is defined via: - `data ptr` - to skip the 1 word that holds the length - `offset` - to skip the source arguments that were already written to the buffer - the `offset` is increased via: `["set", ofst, ["add", ofst, arglen]]`, ie it is increased by the length of the source argument Now, the `copy_bytes` function has multiple control flow paths, the following ones are of interest: 1) https://github.com/vyperlang/vyper/blob/3b310d5292c4d1448e673d7b3adb223f9353260e/vyper/codegen/core.py#L270-L273 2) https://github.com/vyperlang/vyper/blob/3b310d5292c4d1448e673d7b3adb223f9353260e/vyper/codegen/core.py#L301-L320 Note that the function itself contains the following note: https://github.com/vyperlang/vyper/blob/3b310d5292c4d1448e673d7b3adb223f9353260e/vyper/codegen/core.py#L245-L247 That is we can ask for a copy of `1B` yet a whole word is copied. Consider the first interesting path - if the `dst_data`'s distance to the end of the concat data buffer is `< 32B`, the `copy_op = STORE(dst, LOAD(src))` from `copy_bytes` will result in buffer overflow as it essentially will `mstore` to `dst_data` the `mload` of the source (mload will load whole word and the distance of the `dst_data` to the word boundary is `<32B`). From the two mentioned paths in `copy_bytes` it can be seen that both sources from memory and storage can cause the corruption. ### PoC The main attack vector that was found was when the `concat` is inside an `internal` function. Suppose we have an `external` function that calls `internal` one. In such case the address space is divided such that the memory for the internal function is in _lower_ portion of the adr space. As such the buffer overflow can overwrite _valid_ data of the caller. Here is a simple example: ```python #@version ^0.3.9 @internal def bar() -> uint256: sss: String[2] = concat("a", "b") return 1 @external def foo() -> int256: a: int256 = -1 b: uint256 = self.bar() return a ``` `foo` should clearly return `-1`, but it returns `452312848583266388373324160190187140051835877600158453279131187530910662655` `-1` was used intentionally due to its bit structure but the value here is fairly irelevant. In this example during the second iteration of the for loop in the `build_IR` `mload` to `dst+1` will be executed (because len('a') == 1), thus the function will write `1B` over the bounds of the buffer. The string 'b' is stored such that its right-most byte is a zero byte. So a zero byte will be written over the bounds. So when `-1` is considered it's left-most B will be overwritten to all 0. Therefore it can be seen: `452312848583266388373324160190187140051835877600158453279131187530910662655 == (2**248-1)` will output `True`. #### IR If we look at the contract's IR (vyper --no optimize -f ir), we see: ``` # Line 30 /* a: int256 = -1 */ [mstore, 320, -1 <-1>], ``` And for the second iteration of the loop in concat: ``` len, [mload, arg], [seq, [with, src, [add, arg, 32], [with, dst, [add, [add, 256 <concat destination>, 32], concat_ofst], [mstore, dst, [mload, src]]]], [set, concat_ofst, [add, concat_ofst, len]]]]], [mstore, 256 <concat destination>, concat_ofst], 256 <concat destination>]], ``` So the address of the `int` is 320. The `dst` is defined as: `[add, [add, 256 <concat destination>, 32], concat_ofst],`. In the second iteration the `concat_ofst` will be 1 because `len('a)==1` so `256+32+1 = 289`. Now this address will be `mstored` to - so the last mstored B will have the address `289+32=320` which clearly overlaps with the address of the `int a`. #### PoC 2 Due to how `immutables` are handled, they can be corrupted too: ```python #@version ^0.3.9 i: immutable(int256) @external def __init__(): i = -1 s: String[2] = concat("a", "b") @external def foo() -> int256: return i ``` Output of calling `foo()` = `452312848583266388373324160190187140051835877600158453279131187530910662655`. ### Impact The buffer overflow can result in the change of semantics of the contract. The overflow is length-dependent and thus it might go unnoticed during contract testing. However, certainly not all usages of `concat` will result in overwritten valid data as we require it to be in an `internal` function and close to the `return` statement where other memory allocations don't occur. ### Concluding remarks The bug based on the fast path in `copy_bytes` was likely introduced in: `548d35d720fb6fd8efbdc0ce525bed259a73f0b9`. `git bisect` was used between v0.3.1 and v0.3.2, `forge test` was run and the test asserted that the function indeed returns -1. For the general case, `0.3.0` and `0.3.1` are also affected.
Credit: security-advisories@github.com security-advisories@github.com
Affected Software | Affected Version | How to fix |
---|---|---|
Vyperlang Vyper | <=0.3.10 | |
pip/vyper | >=0.3.0<=0.3.10 | 0.4.0 |
Sign up to SecAlerts for real-time vulnerability data matched to your software, aggregated from hundreds of sources.