Release 0.13.x issues postmortem



The 0.13.x release we made in June introduced a few issues, which the team has already resolved. To keep our community aware of the current situation and our plans, here are the issues, their resolutions, and our conclusions:

Old PoS compatibility. The new version includes a new PoS algorithm, Fair PoS, but to stay compatible with old node versions it also contains the old PoS algorithm, which is used until Fair PoS is activated. The issue was that the implementation of the old PoS algorithm in the new node was not 100% compatible with the original one. In very rare cases, blocks produced by updated miners were not valid for old nodes. A few days after the release, an updated miner node forged such a block. As a result, old-version nodes rejected this block and generated their own block, so the chain forked.

Resolution: Since most mining nodes had already been updated to the new version and supported the new chain, we decided to ask owners of old nodes to update to the new version as soon as possible and start accepting blocks produced by the majority of miners.
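
A mismatch of this kind, where a rewritten rule agrees with the original almost always, is the sort of thing a differential test can surface. The sketch below is only an illustration: `legacy_is_valid` and `reimplemented_is_valid` are hypothetical stand-ins with a deliberately planted boundary difference, not the actual PoS code, and the `hit`/`target` inputs are assumed names chosen for the example.

```python
import random
from typing import Callable, List, Tuple

Case = Tuple[int, int]  # (hit, target): illustrative inputs, not real block fields


def legacy_is_valid(hit: int, target: int) -> bool:
    # Hypothetical stand-in for the original rule.
    return hit < target


def reimplemented_is_valid(hit: int, target: int) -> bool:
    # Hypothetical stand-in for the rewrite, with a planted boundary mismatch.
    return hit <= target


def generate_cases(seed: int, count: int) -> List[Case]:
    """Build reproducible random inputs, biased toward the hit/target boundary."""
    rng = random.Random(seed)
    cases: List[Case] = []
    for _ in range(count):
        target = rng.randrange(1, 1000)
        if rng.random() < 0.1:
            # Rare disagreements tend to hide next to the boundary.
            hit = target + rng.choice([-1, 0, 1])
        else:
            hit = rng.randrange(0, 2000)
        cases.append((hit, target))
    return cases


def find_disagreements(
    old: Callable[[int, int], bool],
    new: Callable[[int, int], bool],
    cases: List[Case],
) -> List[Case]:
    """Return every input on which the two implementations disagree."""
    return [case for case in cases if old(*case) != new(*case)]
```

Running `find_disagreements` over the output of `generate_cases` in a test suite would fail the build as soon as the list is non-empty, before a release reaches miners.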

Burn and reissuability. Due to database optimization and code refactoring in the new version, the following bug appeared: if you burned some amount of a token that was marked as non-reissuable, nodes started to consider reissue transactions for this token valid.

Resolution: The issue was fixed in the 0.13.4 release, but reissue transactions that had already been accepted remained in the blockchain. A few assets were reissued improperly.
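
The invariant that was violated can be stated as a small regression check: burning must never change whether an asset is reissuable. The following is a minimal sketch under assumed names (`Asset`, `burn`, `reissue`, `ValidationError` are hypothetical and do not mirror the node's actual state model).

```python
from dataclasses import dataclass


class ValidationError(Exception):
    """Raised when a transaction must be rejected."""


@dataclass(frozen=True)
class Asset:
    quantity: int
    reissuable: bool


def burn(asset: Asset, amount: int) -> Asset:
    if amount <= 0 or amount > asset.quantity:
        raise ValidationError("invalid burn amount")
    # Burning only reduces the quantity; the reissuable flag is carried over unchanged.
    return Asset(quantity=asset.quantity - amount, reissuable=asset.reissuable)


def reissue(asset: Asset, amount: int, reissuable: bool) -> Asset:
    if not asset.reissuable:
        raise ValidationError("asset is not reissuable")
    if amount <= 0:
        raise ValidationError("invalid reissue amount")
    return Asset(quantity=asset.quantity + amount, reissuable=reissuable)


def reissue_is_rejected(asset: Asset, amount: int) -> bool:
    """Helper for tests: True if reissue fails validation."""
    try:
        reissue(asset, amount, reissuable=False)
    except ValidationError:
        return True
    return False
```

A regression test would burn part of a non-reissuable asset and then assert that `reissue_is_rejected` still returns `True` for the resulting state.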

Alias re-claim. Due to database optimization and code refactoring, under certain conditions it was possible to re-claim an already claimed alias. One could re-claim an alias only if the previous claim had been made more than 1.5 hours earlier; because of this condition, our automated tests did not catch the issue.

Resolution: The issue was fixed in the 0.13.4 release. Aliases that had already been re-claimed were blocked: they cannot be used by either the first or the second owner. Several dozen aliases are now blocked, but they could be restored in the future.
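
Time-dependent bugs like this one can be covered without waiting in real time if the test passes timestamps in explicitly. The sketch below assumes a hypothetical `AliasRegistry` that takes the current time as an argument; it is not the node's actual API, and it says nothing about the real root cause. It only shows how a test can probe claims made well past the 1.5 hour mark.

```python
from typing import Dict, Optional


class AliasAlreadyClaimed(Exception):
    """Raised when someone tries to claim an alias that already has an owner."""


class AliasRegistry:
    def __init__(self) -> None:
        self._owners: Dict[str, str] = {}
        self._claimed_at: Dict[str, float] = {}

    def claim(self, alias: str, owner: str, now: float) -> None:
        # Ownership is permanent: the age of the first claim must never matter.
        if alias in self._owners:
            raise AliasAlreadyClaimed(alias)
        self._owners[alias] = owner
        self._claimed_at[alias] = now

    def owner_of(self, alias: str) -> Optional[str]:
        return self._owners.get(alias)


def reclaim_is_rejected(registry: AliasRegistry, alias: str, owner: str, now: float) -> bool:
    """Helper for tests: True if a second claim fails."""
    try:
        registry.claim(alias, owner, now)
    except AliasAlreadyClaimed:
        return True
    return False


# Offsets in seconds after the first claim, on both sides of 1.5 hours (5400 s).
RECLAIM_DELAYS = [1.0, 3600.0, 5399.0, 5400.0, 5401.0, 7 * 24 * 3600.0]
```

Looping over `RECLAIM_DELAYS` in a test and asserting that `reclaim_is_rejected` holds for each offset covers the window that short-running tests would otherwise miss.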

Conclusions

  1. As the network grows, we should move away from the goal of rolling out more features as fast as possible. Now it’s time to put release quality and safety first. So if we aren’t sure and have a choice between releasing on an expected date and testing more, we’ll test more.

  2. We changed our internal development process to pay more attention to covering functionality with tests and to share code knowledge more widely within the team.

  3. The release preparation process will be improved. We’ll roll out release candidate builds to TestNet in advance (several weeks before release). If something goes wrong, the MainNet release can be postponed.

  4. We are going to run a public bug bounty with considerable rewards for upcoming releases.

  5. We improved our monitoring tools to react to issues faster.

  6. Release candidates will be tested on MainNet as well (including on some mining nodes) before an official release. In fact, we already have a standalone node that is deployed automatically to MainNet every time developers commit new code to the repository.

  7. We’ve implemented a synchronization tool that synchronizes unconfirmed transactions between networks in case of forks, to minimize the number of lost transactions.

  8. Once a MainNet release is done, we’ll ask major miners not to update their nodes too quickly. If the new build contains bugs, the damage will be lower if the majority has not yet updated.
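
The idea behind the synchronization tool in item 7 can be illustrated with plain set operations. This is a sketch under stated assumptions, not a description of the actual tool: the function name, the use of string transaction IDs, and the rule of skipping transactions a network has already confirmed are all hypothetical.

```python
from typing import Dict, Set


def transactions_to_rebroadcast(
    pools: Dict[str, Set[str]],
    confirmed: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """For each network, list unconfirmed transaction IDs it has not seen yet.

    pools:     network name -> IDs in that network's unconfirmed pool
    confirmed: network name -> IDs already included in that network's blocks
    """
    # Every unconfirmed transaction known to at least one side of the fork.
    all_unconfirmed: Set[str] = set().union(*pools.values())
    return {
        network: all_unconfirmed - pool - confirmed.get(network, set())
        for network, pool in pools.items()
    }
```

With two forked networks, each side then receives only the transactions it is missing, so a transaction submitted to the losing fork still has a chance to be included on the surviving one.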

Thanks to everyone for your understanding and help in addressing these difficulties.