It was a challenging week working on storage access checks for slots, but it is over, and I'm quite happy with how things are looking right now. Some extra refactoring also made it possible to run tests under Miri, which spotted a few things that violated Rust's safety rules.
The work from the previous week continued with reworking the way slots are managed by the native execution environment to correctly handle recursive method calls and potential access violations. It finally concluded in PR 61, with some follow-up fixes in later PRs.
There were several challenges with it that stem from the desire to achieve high performance while retaining efficiency and maintainability. In the end, the following rules were established: a single recursive call can modify storage, but multiple calls dispatched at once (meant to be parallel, though they aren't right now) get only a read-only view. This should fit the expected use cases nicely and help constrain code complexity. There are a few paragraphs in PR 61 that explain the goals and results in more detail if you're interested in learning more.
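A minimal sketch of that rule using plain Rust borrow semantics is below; Env, Slots, call_single and dispatch_many are hypothetical names that only illustrate the exclusive-write vs shared-read split, not the actual API from PR 61:
// Hypothetical types and methods, for illustration only.
struct Slots {
    value: u64,
}

struct Env {
    slots: Slots,
}

impl Env {
    // A single (possibly recursive) call gets exclusive access and may modify
    // storage.
    fn call_single(&mut self, method: impl FnOnce(&mut Slots)) {
        method(&mut self.slots);
    }

    // Multiple calls dispatched at once (meant to become parallel later) only
    // get a shared, read-only view of storage.
    fn dispatch_many(&self, methods: &[&dyn Fn(&Slots)]) {
        for method in methods {
            method(&self.slots);
        }
    }
}

fn main() {
    let mut env = Env { slots: Slots { value: 0 } };
    env.call_single(|slots| slots.value += 1);
    env.dispatch_many(&[
        &|slots| assert_eq!(slots.value, 1),
        &|slots| assert_eq!(slots.value, 1),
    ]);
}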
I've been thinking about address formats some more and decided that for a global system 44 bits for addresses is really not enough; it should be way more than that. Addresses were also stored as [u8; 8] instead of u64 to reduce alignment requirements for data structures that might contain them, so the question became what a bigger address should look like and how much bigger it should really be. I then looked at RISC-V (planned to be used for a VM) assembly for different operations on byte arrays. Comparing two addresses is the most common operation here, and it turned out that byte array comparison generates way more assembly instructions to do the same job. This is due both to the RISC nature of the ISA and to the fact that the alignment of a byte array is 1. x86-64 has powerful instructions to read unaligned byte ranges into XMM registers and compare all bytes at once, while RISC-V assembly (at least the way it is generated by rustc for riscv64imac-unknown-none-elf) was comparing bytes one pair at a time.
As a result, I decided that u128 will be the address format, which might be relaxed to a pair of u64s to reduce the alignment requirement from 16 bytes to 8 (RISC-V assembly compares 64-bit halves separately rather than the full 128-bit value at once anyway). This landed in PR 63, which also included some refactoring of slots management, given how large a pair of addresses (owner + contract are used to identify a slot) has become.
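To illustrate the trade-off, here is a small sketch with a purely hypothetical Address type: two u64 halves keep the alignment at 8 bytes instead of 16 (a single u128), while still compiling to a couple of wide comparisons rather than the byte-at-a-time code that a 1-aligned [u8; 16] tends to produce on riscv64imac-unknown-none-elf:
#[derive(Copy, Clone, Eq, PartialEq)]
#[repr(C)]
struct Address {
    // Hypothetical layout: two aligned 64-bit halves instead of one u128.
    hi: u64,
    lo: u64,
}

// Comparing two aligned u64 pairs compiles down to a couple of wide loads and
// comparisons, unlike a 1-aligned byte array compared byte by byte.
fn same_address(a: &Address, b: &Address) -> bool {
    a.hi == b.hi && a.lo == b.lo
}

fn main() {
    let a = Address { hi: 1, lo: 2 };
    let b = Address { hi: 1, lo: 2 };
    assert!(same_address(&a, &b));
    assert_eq!(std::mem::align_of::<Address>(), 8);
}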
Based on a developer interview with Shamil, I clarified and expanded the documentation in PR 64, which I hope will make it easier to understand.
I did some initial benchmarks with PR 61, and it turned out it is possible to create an environment instance and call Flipper::flip on it about four million times per second on a single CPU core, which gives you a good perspective of how much overhead there is in typical blockchain environments that can only do orders of magnitude fewer simple transactions per second. After slot optimizations in PR 64 I got curious whether it is possible to do better, and squeezed out another million calls per second in PR 65.
Five million calls per second on a single CPU core, ~200 ns per call! I'm sure it is possible to get even lower while preserving the necessary logic and overall architecture. That is basically the baseline: any cost above it is waste and should be minimized. perf stats look something like this:
1 122,47 msec task-clock:u # 1,000 CPUs utilized
0 context-switches:u # 0,000 /sec
0 cpu-migrations:u # 0,000 /sec
167 page-faults:u # 148,780 /sec
5 459 682 279 cycles:u # 4,864 GHz
74 201 852 stalled-cycles-frontend:u # 1,36% frontend cycles idle
14 406 036 797 instructions:u # 2,64 insn per cycle
# 0,01 stalled cycles per insn
2 470 197 650 branches:u # 2,201 G/sec
14 684 branch-misses:u # 0,00% of all branches
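For context, the relationship between the two numbers is simply 1 s / 5,000,000 calls = 200 ns per call. A rough way to measure that kind of throughput looks like the sketch below; this is not the benchmark harness from the repository, and call_flip_once is a hypothetical stand-in for creating the environment and calling Flipper::flip:
use std::time::Instant;

// Hypothetical stand-in for "create an environment instance and call
// Flipper::flip on it"; the real setup uses NativeExecutor as shown later.
fn call_flip_once() {
    std::hint::black_box(());
}

fn main() {
    let iterations: u32 = 5_000_000;
    let start = Instant::now();
    for _ in 0..iterations {
        call_flip_once();
    }
    let elapsed = start.elapsed();
    println!(
        "{iterations} calls in {elapsed:?} (~{} ns/call)",
        elapsed.as_nanos() / u128::from(iterations)
    );
}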
With storage taken care of for now, there was a small problem that had bothered me for a while: the inability to run tests under Miri. Writing unsafe code in Rust is more challenging than in languages like C, and there is quite a bit of unsafe code in the native execution environment right now due to FFI and performance reasons. So running under Miri was very desirable, but unfortunately not possible with the inventory crate that was used to make the execution environment aware of all the available contracts, so the implicit use of inventory had to go away.
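For readers unfamiliar with it, this is roughly the pattern the inventory crate provides: each contract crate submits a record at link time and the environment later iterates over all of them, with no explicit registration call. The ContractRegistration type below is hypothetical (the real record in the project looks different), and, as far as I understand, it is exactly this life-before-main/linker-section machinery that Miri does not execute:
// Hypothetical registration record; the actual type in the project differs.
struct ContractRegistration {
    crate_name: &'static str,
}

inventory::collect!(ContractRegistration);

// Each contract crate would submit its record implicitly, at link time:
inventory::submit! {
    ContractRegistration { crate_name: "flipper" }
}

fn main() {
    // The execution environment can then discover every contract without any
    // explicit registration call from the user.
    for registration in inventory::iter::<ContractRegistration> {
        println!("found contract: {}", registration.crate_name);
    }
}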
I still wanted to have an ergonomic API though, and that proved to be its own challenge due to the need to register both the contracts themselves and the traits they implement, but traits as such aren't types. The best thing I came up with was to use dyn ContractTrait as a type instead, but then I discovered that associated constants (much like generic methods) make traits not object safe. I found several discussions and summarized the conclusions with some links on the Rust forum. In the next post I shared an unstable (and incomplete!) feature that allows object-safe traits to have associated constants, but it doesn't look likely that it will be stabilized any time soon. Ultimately, I had to split the associated constants into a separate trait (implemented on dyn ContractTrait) and remove the : Contract bound on ContractTrait itself, but it seemed like a price worth paying.
In the end, PR 66 landed a decent API that explicitly registers contracts to be used in the native execution environment (system contracts are registered internally automatically); it looks something like this:
#[test]
fn basic() {
    let shard_index = ShardIndex::from_u32(1).unwrap();
    let mut executor = NativeExecutor::in_memory_empty(shard_index)
        .with_contract::<Flipper>()
        .build()
        .unwrap();
    // ...
}
That also meant tests are finally running under Miri 😱
Yeah, Miri wasn’t too happy initially 😅. It took a lot more reading and some help from the Rust community to figure out why, but eventually I was able to make it work in PR 67, which also added Miri tests to CI 😊.
That was the bulk of the things I got done, with some random research and WIP stuff in a local branch that I will talk about next time. Unfortunately, there were no interviews this week, but hopefully next time!
Upcoming plans
The next step related to the execution environment will be to add the notion of a transaction. So far it was just calling methods on contracts, but the actual blockchain will have inputs serialized into a transaction. While serialization/deserialization already happens when doing calls from contract methods, the API that developers can use externally wasn't shaped that way. With transaction support and more explicit slots handling (the ability to provide them as input and extract them afterward for persistence), the workflow will be partially complete and sufficient for further integration into a bigger system with things like a transaction pool. The transaction pool, of course, doesn't exist yet (just like most other things), but that can be fixed 😉.
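Purely as an illustration of that workflow (none of these types or signatures come from the actual codebase, they are hypothetical), a transaction-shaped input with explicit slots in and out might look roughly like this:
// Hypothetical transaction shape: which contract/method to call plus the
// serialized inputs, reusing the serialization that already exists for
// contract-to-contract calls.
struct Transaction {
    contract: u128,
    method: u16,
    payload: Vec<u8>,
}

// Hypothetical slot representation: identified by owner + contract addresses,
// provided as input and returned afterward for persistence.
struct Slot {
    owner: u128,
    contract: u128,
    contents: Vec<u8>,
}

// Placeholder for executing a transaction against the environment: slots go
// in, (possibly modified) slots come out.
fn execute(tx: &Transaction, slots: Vec<Slot>) -> Vec<Slot> {
    let _ = (tx.contract, tx.method, &tx.payload);
    slots
}

fn main() {
    let tx = Transaction { contract: 1, method: 0, payload: Vec::new() };
    let updated_slots = execute(&tx, Vec::new());
    assert!(updated_slots.is_empty());
}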
Based on developer feedback, I would also like to simplify the contract API a bit, specifically remove #[result] and make it a special case of #[output], which will remove some code duplication in the execution environment and the procedural macro and will be easier to explain.
Once those are done, I will probably conduct more developer interviews (I'll try to hunt down some ink! maintainers or users initially). If there is someone I should definitely talk to, let me know.
Also, hopefully more hiring interviews this time.
See you in about a week with more updates!