I am a big believer in Thorchain and tell anyone who will listen that it is the biggest and best thing to happen in crypto since Ethereum. My personal favourite use case is earning bitcoin on bitcoin (and rune on rune) returns in a decentralized and censorship resistant system as a liquidity provider. I also think Thorchain’s ability to make the majority of all crypto trades decentralized and censorship resistant is world changing.
I assigned 2 of our Engineers to write a feasibility report covering the following areas as they relate to operating a thornode:
- Infrastructure costs.
- Could Software Engineers complete all technical work or would a blockchain specialist need to be hired?
- How many billable hours over what time frame are required to set up a node and have it running stably?
- How many billable hours per month are required for ongoing maintenance?
- Could the node be set up and run by people other than the owner without the owner’s bond and rewards being at risk of theft?
- What bond is needed, now and future predictions.
- What rewards are generated, now and future predictions.
The result below is a report by Thorchain technical outsiders whose job it was to cast a critical eye on the project. If you find anything noteworthy please let us know in the comments below.
Running a thornode requires running a kubernetes cluster in a cloud provider such as AWS or DigitalOcean. The thornode maintainers provide scripts for provisioning the kubernetes cluster and other resources. The following estimates are based on the default resources provisioned by these scripts.
In Table 1 we give the costs, in USD, of running a node for the production network, aka chaosnet, and for the test network. Running a node on the test network would have the advantage that one would be able to test out operations before applying them on the production node. However, given the infrastructure and labour costs of maintaining a testnet node, it is probably not worth it. Instead one could provision a testnet node temporarily when needed and tear it down afterwards.
Recommended cloud provider
Despite AWS being more expensive, we recommend using it over DigitalOcean for the following reasons:
- AWS provides much better monitoring and alerting services, which are important for ensuring high availability of the node.
- As shown in Table 2 below, the majority of node operators use AWS so the infrastructure configuration scripts are likely to be better maintained for AWS. We encountered issues deploying to DigitalOcean because the scripts hadn’t been updated recently.
- AWS is likely to deliver greater uptime and be more responsive to issues because it is a bigger company with more resources.
- Running a node in the same AWS region as the other nodes will reduce the chance of networking issues.
Human resources and expertise
Q: Could Software Engineers complete all technical work or would a blockchain specialist need to be hired?
Deploying and maintaining a thornode requires expertise in the following areas:
- AWS or the chosen cloud provider.
Terraform is similar to other configuration-as-code systems such as AWS CloudFormation and Ansible, so it could be easily learnt by Engineers comfortable with those.
Ideally node operators should also review the changes being made to the thornode code to try and prevent bugs and security vulnerabilities from being introduced to the software. This would require a level of understanding of:
- Go programming language.
- Blockchain technology.
- Consensus algorithms.
- Thornode protocol.
If one were to consider this aspect of running a node to be critical one would either need to devote the resources for Engineers to become skilled in these areas or hire someone with this expertise already.
A thornode must be available 100% of the time for the node to avoid being penalised by the network and potentially losing rewards and bonded RUNE. Kubernetes clusters are designed to be highly available but there are still scenarios where node operators need to respond quickly, such as in response to network attacks. Ideally this would mean Engineers are on-call to respond to emergencies.
Billable hours – set up
Q: How many billable hours over what time frame are required to set up a node and have it running stably?
Since the thornode developers provide scripts for deploying a node, it would only take a few days to actually deploy a node to chaosnet. However it would be extremely prudent to spend time practicing various operations in testnet such as:
- Deploying a thornode.
- Bonding and leaving the network.
- Upgrading the thornode.
- Setting up and becoming familiar with monitoring services.
- Becoming familiar with the logs produced by the various components of a thornode.
- Interacting with nodes of the different chains that comprise a thornode (bitcoin, ethereum, etc).
- Performing a manual return of yggdrasil funds in case of emergency.
The thornode testnet is extremely unreliable at the moment, so some of these operations, such as bonding and leaving, could potentially take weeks to succeed. The actual number of billable hours for 2 Engineers would be on the order of 76 hours, but the elapsed time before they could put a thornode into production could be a month or more given the instability of the Thorchain testnet.
Billable hours – Ongoing maintenance
Q: How many billable hours per month are required for ongoing maintenance?
Ongoing maintenance of the thornode is likely to be dominated by deploying updates to the thornode applications. Based upon the number of updates in the last 2 months, we can anticipate there will be around 10 updates a month. We assume each update requires 3 hours to deploy and monitor.
Engineers will also need to apply security updates to the kubernetes worker nodes around once a month and upgrade kubernetes itself once a year. Engineers maintaining the node will need to stay abreast of plans and developments that will impact Thorchain by monitoring the Discord.
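The update figures above can be turned into a rough monthly estimate. The following is a minimal sketch, not a definitive costing: the two constants come from this report, while `other_hours` is a hypothetical allowance for worker-node patching, kubernetes upgrades and following Discord, for which the report gives no figure.

```python
# Rough monthly maintenance estimate based on the figures quoted in this report.
UPDATES_PER_MONTH = 10  # from the report: around 10 thornode updates a month
HOURS_PER_UPDATE = 3    # from the report: hours to deploy and monitor one update


def monthly_maintenance_hours(updates_per_month=UPDATES_PER_MONTH,
                              hours_per_update=HOURS_PER_UPDATE,
                              other_hours=0):
    """Return estimated billable hours per month.

    other_hours is a hypothetical allowance for work the report does not
    quantify (security patching, kubernetes upgrades, following Discord).
    """
    return updates_per_month * hours_per_update + other_hours


print(monthly_maintenance_hours())  # 30
```

Updates alone therefore account for about 30 billable hours a month under these assumptions; any allowance passed as `other_hours` is added on top.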
Security risk for node owners
Q: Could the node be set up and run by people other than the owner without the owner’s bond and rewards being at risk of theft?
No. If someone other than the owner runs the node they could steal the yggdrasil vault funds resulting in up to a 37% bond slash, currently around $2M. The incentive for the yggdrasil vault thief is currently around $1M worth of crypto.
The other people in this case would be Engineers either hired directly by the node owner to run the node, or through node-as-a-service providers like Allnodes who run nodes on the owner’s behalf for a fee.
The thornode protocol only allows withdrawing the bond and the reward to the wallet that provided the bond. This means that Engineers maintaining the node cannot directly steal the bond or rewards.
However the Engineers do have access to the private keys of the yggdrasil vaults which are hot wallets holding bitcoin, ethereum, etc. This gives the Engineers the ability to steal from these wallets. If this were to happen the network would punish the node owner by deducting 1.5x the amount stolen from the bond.
The yggdrasil vaults currently hold around 13% of the amount bonded, so if an Engineer were to steal from the yggdrasil vaults it would result in around 20% being lost from the node owner’s bond. However the yggdrasil vaults can hold up to 25% of the bond, which if stolen would result in the owner losing 37% of their bond, currently around $2M.
Thorchain advocates espouse the network’s security incentives, saying things like:
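The slash arithmetic can be checked with a short calculation. This is a sketch using only the 1.5x multiplier and the vault percentages quoted in this report; the cap at 100% of the bond is our own assumption and does not come from the thornode code.

```python
SLASH_MULTIPLIER = 1.5  # from the report: the bond is docked 1.5x the amount stolen


def bond_slash_fraction(vault_fraction_of_bond, multiplier=SLASH_MULTIPLIER):
    """Fraction of the bond lost if the whole yggdrasil vault is stolen.

    vault_fraction_of_bond is the vault value as a fraction of the bond.
    The result is capped at 1.0 (assumption: no more than the bond can be lost).
    """
    return min(1.0, vault_fraction_of_bond * multiplier)


for label, fraction in [("current vault size (13%)", 0.13),
                        ("maximum vault size (25%)", 0.25)]:
    print(f"{label}: {bond_slash_fraction(fraction):.1%} of the bond slashed")
```

This gives 19.5% and 37.5% respectively, which the report quotes as around 20% and 37%.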
“The node operator would not steal from the yggdrasil vaults because their bond would be slashed by more than the amount they stole.”
From a node’s point of view this breaks down when someone other than the bond owner has access to the yggdrasil vaults. However the security incentive holds up for the network as a whole; the network funds aren’t at risk because the stolen yggdrasil vaults funds will be replaced from the rogue node’s bond. It is only the bond owner who loses, which correctly gives the bond owner strong incentive to secure the yggdrasil vaults and the node as a whole.
Other staked/bonded reward earning nodes such as Dash Masternodes and Ethereum Validators do not have this problem. They can be run by node-as-a-service providers such as Allnodes in a trustless way where the Engineers cannot steal the node owner’s funds.
Since Thorchain plans to continue adding more features and asset pools it will soon be very difficult for a single individual to be the sole owner and maintainer of a thornode. However due to this vulnerability the owner cannot hire Engineers directly or via node-as-a-service providers to help run the node without risking massive loss.
In the current system the only options are to:
- Bond and operate the node yourself as an individual.
- Go into partnership with others whereby you all share the bonding and the running of the node and have equal incentives. This could be done manually or via a new custom smart contract that would have to be made, reviewed and tested.
- Trust Engineers and accept the risk of up to a $2M bond slash with a $1M incentive for the thief.
Any protocol level solution to address this vulnerability would require very large changes in the Thorchain protocol that are not on the roadmap.
A potential improvement floated by Shapeshift Chief Information Security Officer Michael Perklin here is what he calls “sharding the validator set”. We would refine that proposal to instead have pool specific thornodes: for example, BTC/RUNE nodes that only have to run a bitcoin thornode, and ETH/RUNE nodes that only have to run an ethereum thornode. This contrasts with the current situation, where a thornode has to run a full node for every supported asset, currently 6 and growing.
This is a good idea for a lot of reasons including improved scalability and decentralization. It would mitigate this vulnerability by making running a thornode as an individual, without other Engineers, much more feasible.
Q: What bond would be needed, now and future predictions?
The network currently makes 2 new nodes active every 3 days in a process known as “churning in”. The nodes that become active are the 2 standby nodes with the highest bonds. The effective minimum bond is therefore the current highest bond of all the standby nodes. This is currently around 1M RUNE ($5.8M) and it is not expected to go below 500k RUNE.
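The selection rule described above can be sketched as a simple ranking. This is an illustration only, not the thornode implementation: the function name, node names and bond values below are all hypothetical.

```python
def next_to_churn_in(standby_bonds, slots=2):
    """Return the names of the `slots` standby nodes with the highest bonds.

    standby_bonds maps a node name to its bond in RUNE.
    """
    ranked = sorted(standby_bonds, key=standby_bonds.get, reverse=True)
    return ranked[:slots]


# Hypothetical standby set: names and bonds (in RUNE) are made up for illustration.
standby = {
    "node-a": 1_000_000,
    "node-b": 650_000,
    "node-c": 820_000,
    "node-d": 500_000,
}
print(next_to_churn_in(standby))  # ['node-a', 'node-c']
```

A prospective operator can compare their intended bond against such a ranking of the current standby nodes to judge how soon they would be likely to churn in.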
Once the network reaches the cap of 100 active nodes, which could be as soon as January 2022, there will be increased competition amongst node operators which is likely to increase the bond requirements. How high the bond requirements will get depends on the profitability of operating a node but it is anticipated to stabilize between 2M and 2.5M RUNE.
Q: What rewards would be generated, now and future predictions?
The income in RUNE for a node operator is given by:
income = (1 - poolShareFactor) × (blockRewards + liquidityFees) / numberOfActiveNodes
The poolShareFactor will equilibrate to ⅓ so in the medium and long term the only factors that matter are blockRewards and liquidityFees. The blockRewards per day are given by:
blockRewards = reserve / (emissionCurve × 365)
The reserve is the amount of RUNE the Thorchain maintainers have made available to reward node operators and liquidity providers. There is currently 30M RUNE in the reserve and this will be steadily increased up to 180M as the maintainers gain confidence there are no bugs or vulnerabilities that could cause funds in the reserve to be lost. The emissionCurve is a parameter that controls the size of block rewards currently set to 4.
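The two formulas above can be combined into a small calculator. This is a sketch under stated assumptions: the reserve (30M RUNE), emissionCurve (4) and equilibrium poolShareFactor (⅓) come from this report, whereas the node count of 100 (the eventual cap) and the zero liquidity fees in the example call are placeholder inputs, and liquidityFees is assumed to be a daily figure like blockRewards.

```python
DAYS_PER_YEAR = 365


def daily_block_rewards(reserve, emission_curve):
    """Block rewards per day in RUNE: reserve / (emissionCurve x 365)."""
    return reserve / (emission_curve * DAYS_PER_YEAR)


def daily_node_income(reserve, emission_curve, liquidity_fees,
                      pool_share_factor, active_nodes):
    """Income per node per day in RUNE, following the report's formula.

    liquidity_fees is assumed to be a daily total in RUNE.
    """
    total = daily_block_rewards(reserve, emission_curve) + liquidity_fees
    return (1 - pool_share_factor) * total / active_nodes


# Reserve and emissionCurve as stated in the report.
print(round(daily_block_rewards(30_000_000, 4), 1))

# Placeholder inputs: 100 active nodes and no liquidity fees (block rewards only).
print(round(daily_node_income(30_000_000, 4, 0, 1 / 3, 100), 1))
```

With these inputs the block rewards come to roughly 20,500 RUNE per day in total. The per-node figure in the second call is a lower bound for that hypothetical node count, because liquidity fees are set to zero.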
Using the midgard API we can get historical data on rewards to nodes; this data is shown in Table 4. Each row represents the average per node per day for that week.
Based on these historical figures, in the near term a node operator can expect to receive between 300 and 800 RUNE per day they are active.
We can attempt to predict future rewards by extrapolating from these numbers with a few assumptions:
- The number of nodes will grow linearly up to 100 and then stay stable at 100.
- The reserve will be increased linearly up to 180M RUNE and then decrease according to the emission schedule.
- Liquidity fees will increase linearly.
Under this model we see growth in node rewards depicted in Figure 1. We see that the rewards initially increase but at a decreasing rate due to new nodes joining the network. In December 2021 the node cap of 100 is reached so the number of nodes that each get a share of the total rewards becomes fixed. The rewards continue to increase until April 2022 when all 180M RUNE designated for the reserve are finally allocated. From this point the rewards gradually decrease, eventually stabilizing at around 750 RUNE per node per day in 2026.
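The three assumptions above can be expressed as a simple month-by-month projection. The sketch below is not the model used to produce Figure 1: it only illustrates the same assumptions, and every starting value and growth rate passed in the example call is a hypothetical placeholder rather than measured data. During the growth phase the reserve is topped up linearly and emissions are ignored; once the 180M cap has been allocated, the reserve shrinks by one month of block rewards per step.

```python
def project_node_rewards(months, nodes0, nodes_per_month, reserve0,
                         reserve_per_month, fees0, fees_per_month,
                         node_cap=100, reserve_cap=180_000_000,
                         emission_curve=4, pool_share_factor=1 / 3):
    """Yield (month, active_nodes, reserve, rune_per_node_per_day)."""
    reserve = reserve0
    funded = reserve0  # total RUNE allocated to the reserve so far
    for month in range(months):
        # Assumption 1: nodes grow linearly up to the cap, then stay flat.
        nodes = min(node_cap, nodes0 + nodes_per_month * month)
        # Assumption 3: daily liquidity fees grow linearly.
        fees = fees0 + fees_per_month * month
        block_rewards = reserve / (emission_curve * 365)  # RUNE per day
        income = (1 - pool_share_factor) * (block_rewards + fees) / nodes
        yield month, nodes, reserve, income
        # Assumption 2: linear top-ups until the cap, then emission decay.
        if funded < reserve_cap:
            top_up = min(reserve_per_month, reserve_cap - funded)
            funded += top_up
            reserve += top_up
        else:
            reserve -= block_rewards * 365 / 12  # one month of emissions


# All inputs below are hypothetical placeholders, not data from the report.
rows = list(project_node_rewards(
    months=24, nodes0=40, nodes_per_month=10,
    reserve0=30_000_000, reserve_per_month=15_000_000,
    fees0=10_000, fees_per_month=1_000))

for month, nodes, reserve, income in rows[::6]:
    print(f"month {month:2d}: {nodes:3d} nodes, "
          f"reserve {reserve / 1e6:6.1f}M, {income:7.1f} RUNE/node/day")
```

Changing the placeholder inputs shows how sensitive the projected rewards are to the pace of reserve top-ups and the growth in liquidity fees.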
This model is imprecise because it makes a number of incorrect assumptions:
- We assume the reserve grows linearly when in reality the Thorchain maintainers add to it in batches (this accounts for the spikes in the observed rewards plot). There is also no way to tell how quickly or slowly they will increment the reserve because it depends on external factors such as growth of the network.
- The reserve also receives funds from other sources such as network fees, thorname registration fees etc. We don’t have any data on these fees so they are omitted from the model.
- Liquidity fees are unlikely to grow linearly; they will be driven by the uptake of Thorchain which is difficult to predict.
- We assume the emissionCurve parameter remains constant whereas the Thorchain maintainers may adjust it depending on the growth of the network.
- We don’t factor in the node being churned out. This can happen if the node misbehaves, has the lowest bond or has been active longer than all other nodes.
Despite the incorrect assumptions of this model, we can be confident that node rewards are likely to increase and will probably stabilise between 500 and 1000 RUNE per day.
It is important to note that these estimates do not factor in the possibility of loss of funds or failing to receive rewards due to:
- Bugs in the thornode software or the external chains.
- Hacks against the thornode network.
- Human error maintaining the node.
- Engineers not responding fast enough to issues with the node leading to the node being penalised for misbehaving.
- Loss of keys to the wallet that bonded the node.
- Hacks against the thornode cluster.
- Theft by Engineers maintaining the node.
Over the last few months several Thorchain bugs and vulnerabilities have been found, so the first two items in this list should not be underestimated. The Thorchain project appears to be in its early stages and does not have a large number of maintainers, so we expect more issues to be encountered in the future.
Sloppy engineering practices
We have seen signs of sloppy engineering practices by the Thorchain developers that impact node operators and also raise concerns about the long term future of the project. Here are some of the issues we’ve observed:
- The reasons for changes to the code are often not explained in the GitLab issue, merge request or commit messages. They can only be found in Discord, but it is usually impossible to locate the relevant Discord conversation from the info in GitLab/git. This was pointed out here.
- Bug fixes are sometimes rushed out and either add new bugs or fail to resolve all issues. There then need to be additional rounds of bug fix releases. In the 5 days between June 29th and July 3rd there were 5 bug fix releases pushed out, each of which required node operators to immediately deploy the upgrade to their node to avoid incurring slashes.
- The commit messages are uninformative which makes it difficult for someone reading the git history to figure out why changes were made.
- Builds are frequently failing, even on master and chaosnet-multichain production branches.
- Unit tests only cover 64% of the code.
- Documentation is not kept up to date with actual behaviour of the node.
- Nearly all changes are made by either Heimdall, Son of Odin or Fandral.
- It only takes a single approval for merge requests to be merged.
- The majority of merge requests are approved without any feedback from reviewers.
- When developers make a mistake in one commit and fix it in a subsequent commit it is best practice to “squash” the commits to avoid cluttering the git history with pointless commits that make it harder to read. Thornode developers sometimes forget to do this, leading to commits with messages like “WIP” (work in progress), “fix typo” and “add a comment”.
We have no doubt the Thornode maintainers are extremely capable Engineers but we suspect they are under enormous pressure and don’t have enough resources so they often cut corners. In many projects this wouldn’t be a huge deal but it is concerning here since Thorchain has hundreds of millions of dollars of assets at stake.
Deploying and maintaining a thornode is not to be taken lightly. It requires a large investment in capital and, although the number of hours spent maintaining the node isn’t excessive, the importance of responding quickly to node issues would put considerable pressure on the Engineers maintaining the node. The number of bugs and vulnerabilities that have been discovered in the Thorchain software, the small number of contributors and the sloppy engineering practices employed by the maintainers should also raise red flags about the reliability and long term future of this project.
By design the return from being a liquidity provider is comparable with that of being a node operator and comes with much less risk. If the liquidity cap continues to increase or is removed, being a liquidity provider may be a better option than taking on the risks and burdens of being a node operator. This is not a perfect comparison, however, since at the moment in order to provide liquidity one must split one’s investment into equal parts RUNE and another asset, while the financial investment of running a node is all RUNE.