In this post, we analyze the spamming attacks on Harmony and how our engineering team deployed solutions to fully resolve the issues and make the network even more resilient.

Description

Starting on June 5th, the Harmony network began encountering spamming attacks of several different kinds.

One of the attack approaches was sending a large number of spam transactions. Facing this challenge, the engineering team investigated the exposed vulnerabilities and shipped several urgent releases to fix the problems.

By June 21st, most of the vulnerabilities exposed by this attack had been mitigated and all network metrics had returned to normal.

In this report, we discuss the following technical details of the spamming transaction attack:

  1. What is the spam pattern?
  2. What is the impact of large block size and how did we fix the problem?
  3. What problem did the increased transaction volume expose in the explorer DB?

1. Spamming patterns

Starting on June 5th, the Harmony beacon chain received a large number of transactions with the following characteristics:

  1. All of these transactions are smart contract calls with some nested static calls in their execution.
  2. Though the transactions come from quite a number of different sender addresses, the receivers are limited to four contract addresses.
  3. None of the transactions carries a token transfer record, so they most likely just modify some values within contract storage.
  4. The volume is huge: it peaked at ~150 transactions per block, or roughly 40 transactions per second, more than 5 times Ethereum’s TPS.

Thus, we categorized these transactions as spam, part of the spamming attacks against our network.

For reference, here is some of the spam information from our explorer:

  1. A spamming transaction: 0xa8ff711b52d2486e0cfab6e9cc13df029971243fd2743034d1db78e69bee8619
  2. A list of spamming addresses

As of 2:30 p.m. PST on June 21st, 2021, these four addresses had received 18,311,525 transactions.

The large transaction volume results in two problems:

  1. It makes blocks larger than ever, which affects the RPC sync service.
  2. The transaction history of a single account can become very large, which makes it hard for the explorer DB to keep up.

2. Impact: Large block size

As the number of spamming transactions increased, the block size at one point reached 1.5MB. Though large blocks do not affect our FBFT consensus algorithm, they make it difficult for nodes to catch up via the Harmony RPC sync service, the centralized service currently provided by the Harmony team to help nodes catch up with the latest block. Furthermore, since nodes on the shard chains had difficulty syncing beacon chain data, shard 3 broke consensus and suffered ~2 hours of downtime because its nodes could not agree on the latest epoch of the block being produced.

The problem lies in the sync service’s ProtoBuf-based implementation:

  1. The default ProtoBuf server-client config sets 4MiB as the maximum message size.
  2. In our sync protocol, the client will request 30 blocks of data per message.

Thus, once the block size reaches 1.5MB, the response message returned by the server exceeds this maximum and is rejected by the client. The following fixes were implemented to mitigate the issue:

  1. An initial attempt to halve the number of blocks per request: Link.
  2. An optimization of the sync protocol’s logic on the server’s end: Link.
  3. A sync module parameter update for large blocks: Link. This PR also included some parameter tuning for low-bandwidth machines.
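
To make the failure mode concrete, here is a minimal sketch in Go of the two knobs involved, assuming a gRPC-style client on top of ProtoBuf (the package, function name, batch-size constant, and dial options are illustrative, not Harmony’s actual sync code):

package syncclient

import (
    "google.golang.org/grpc"
)

const (
    // grpc-go rejects any received message larger than 4 MiB by default.
    defaultMaxRecvSize = 4 * 1024 * 1024

    // Illustrative batch size: with ~1.5MB blocks, a 30-block batch
    // yields a ~45MB response, far beyond the 4 MiB cap, so the client
    // drops it.
    blocksPerRequest = 30
)

// dialSyncServer dials the sync server with an explicit receive cap.
// Requesting fewer blocks per message (or raising the cap) keeps
// responses under the limit when blocks grow unusually large.
func dialSyncServer(addr string) (*grpc.ClientConn, error) {
    return grpc.Dial(addr,
        grpc.WithInsecure(),
        grpc.WithDefaultCallOptions(grpc.MaxCallRecvMsgSize(defaultMaxRecvSize)),
    )
}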

With these three fixes deployed in version 4.1.6, all problems in the sync module were resolved.

3. Impact: explorer

Another impact of the large number of transactions is that it exposed the lack of scalability in the explorer DB schema design.

The explorer DB is extra local storage kept only on explorer nodes. It stores transaction information for the explorer, and several RPC endpoints depend on its data:

  • hmyv2_getTransactionsCount
  • hmyv2_getTransactionsHistory
  • hmyv2_getStakingTransactionsCount
  • hmyv2_getStakingTransactionsHistory
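
For reference, these are ordinary JSON-RPC methods. A minimal Go call against the public endpoint might look like the sketch below (the address is a placeholder; the [address, txType] parameter shape follows Harmony’s public API docs):

package main

import (
    "bytes"
    "encoding/json"
    "fmt"
    "net/http"
)

func main() {
    // Build a JSON-RPC 2.0 request; params are [address, txType].
    body, _ := json.Marshal(map[string]interface{}{
        "jsonrpc": "2.0",
        "id":      1,
        "method":  "hmyv2_getTransactionsCount",
        "params":  []interface{}{"one1exampleaddress", "ALL"},
    })

    resp, err := http.Post("https://api.s0.t.hmny.io", "application/json", bytes.NewReader(body))
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    // result holds the total number of transactions for the address.
    var out map[string]interface{}
    if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
        panic(err)
    }
    fmt.Println(out["result"])
}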

Because the explorer feature was urgently requested in the initial sprint, the schema of the explorer DB was not carefully designed; as a result, the explorer nodes could not handle the increased volume of transactions per account.

Let’s take a deeper look at the explorer DB schema. The explorer stores the following data structure:

type AddressInfo struct {
    ID         string       `json:"id"`
    Balance    *big.Int     `json:"balance"` // Deprecated
    TXs        LegTxRecords `json:"txs"`
    StakingTXs LegTxRecords `json:"staking_txs"`
}

Each address’s info is RLP-encoded and written to the key-value database under a single key. With this schema, adding even one new transaction to an account requires rewriting the address’s entire transaction history to storage. During debugging, we found that this schema required about 500MB of DB writes and 12s of processing time per block, choking the explorer node on CPU, memory, and disk IO.
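
As a rough sketch of that write path (the KeyValueDB interface, helper names, and key prefix here are hypothetical; rlp is go-ethereum’s encoding package):

package explorer

import (
    "github.com/ethereum/go-ethereum/rlp"
)

// KeyValueDB is a minimal stand-in for the node's key-value store.
type KeyValueDB interface {
    Get(key []byte) ([]byte, error)
    Put(key, value []byte) error
}

// appendTx adds one transaction under the old schema: decode the full
// AddressInfo, append one record, then re-encode and rewrite everything.
// The cost of a single insert grows linearly with the account's history.
func appendTx(db KeyValueDB, addr string, tx *LegTxRecord) error {
    key := []byte("ai_" + addr) // hypothetical key prefix

    raw, err := db.Get(key)
    if err != nil {
        return err
    }
    var info AddressInfo
    if err := rlp.DecodeBytes(raw, &info); err != nil { // O(n) decode
        return err
    }

    info.TXs = append(info.TXs, tx) // add a single record

    encoded, err := rlp.EncodeToBytes(&info) // O(n) re-encode
    if err != nil {
        return err
    }
    return db.Put(key, encoded) // O(n) rewrite of the whole history
}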

Since all nodes behind Harmony’s public RPC endpoints are explorer nodes with this DB schema, the public endpoints experienced some instability and data inaccuracy related to the explorer DB.

Facing this problem, we took action on both the operations and development sides.

OPS actions (serving as temporary solutions):

  1. Upgrade all non-archival explorer nodes behind the RPC endpoints to AWS m5d.2xlarge instances, whose local NVMe drives address the IOPS bottleneck.
  2. Upgrade all archival explorer nodes behind the RPC endpoints to AWS i3en.6xlarge instances, which are optimized for local NVMe storage.
  3. Add logic to the load balancer at the public RPC endpoints to automatically remove explorer nodes that fall behind in sync because of CPU, memory, or disk IO stress (a sketch of this health check follows the list).
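
The health logic in the third item can be as simple as a lag check; a minimal sketch, with a hypothetical threshold rather than the production value:

// isHealthy reports whether an explorer node should stay in the load
// balancer pool: it must be within maxLag blocks of the network head.
func isHealthy(nodeHeight, networkHeight uint64) bool {
    const maxLag = 30 // illustrative tolerance, not the production value
    return networkHeight <= nodeHeight+maxLag
}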

Dev fixes (three main ones, rolled out as network upgrades):

  1. Increase the minimum gas price to 1 gWei to raise the cost of spamming transactions: Link.
  2. A fix adding an in-memory cache for the explorer database (DB): Link. This serves as a temporary fix to ease the pressure on machine resources.
  3. An explorer database schema change, including a database migration and the new schema logic: Link. The idea behind this fix is to shred the bulky per-address record into small entries, each holding one address and one transaction (see the sketch after this list).
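
For contrast with the old write path sketched earlier, here is a rough sketch of the per-entry idea behind the new schema, reusing the KeyValueDB stand-in from above (the key layout is illustrative, not the exact migrated format):

import "encoding/binary"

// writeTxEntry stores one small entry per (address, transaction) pair.
// Inserting a transaction is now a single constant-size write, and an
// account's history is read back with a prefix scan over "tx_<addr>_".
func writeTxEntry(db KeyValueDB, addr string, blockNum uint64, txHash []byte) error {
    // Illustrative key: prefix + address + big-endian block number + tx hash.
    // Big-endian block numbers keep entries sorted in chain order.
    key := append([]byte("tx_"+addr+"_"), make([]byte, 8)...)
    binary.BigEndian.PutUint64(key[len(key)-8:], blockNum)
    key = append(key, txHash...)

    return db.Put(key, txHash) // constant-size write per transaction
}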

With all machines upgraded and the first and second dev fixes released and deployed on June 16th, the external RPC endpoints became more stable and were able to serve accurate address-based data.

With the third dev fix deployed in v4.1.7 on June 18th, the explorer DB was migrated successfully. After the deployment, all node metrics returned to normal, occupying far fewer system resources. At this point, the RPC endpoint issues had been fully resolved.

4. Conclusion

The spamming transactions generated by the attacker caused trouble on Harmony’s network starting June 5th. The major impacts were twofold:

  1. The increased block size made it hard for nodes to catch up via the RPC sync service.
  2. The explorer nodes were choked on system resources because of the DB schema design, making the public RPC endpoints unstable and inaccurate.

Facing these challenges, the engineering team actively worked on these issues and mitigated them with a few rapid fixes and releases. As of release v4.1.7, which came out on June 18th, the problems caused by the spamming transactions were fully resolved. The spamming transactions on the Harmony beacon chain have since stopped, and all node metrics have returned to normal.

We are planning further exploration of the P2P/RPC layer, and based on the results, we will launch a spamming test on testnet shard 3 to surface any remaining issues. This attack-defense game will help us find more potential vulnerabilities in the system and make it more robust against all attack scenarios.

Follow our GitHub repository for more technical information about Harmony’s network.