Harmony Services and Ops: Q3 Recap, Q4 Roadmap
A Dive into Q3 Activities, Protofire Partnership, and the Q4 Roadmap for Enhanced DevOps Services.
Summary
With Max's departure and in light of our Q3 activities, Diego and Soph from the DevOps team had to reallocate their time to protocol-related tasks. This shift led us to reprioritize some of the DevOps and cost optimization work originally scheduled for Q3. Nonetheless, we remained committed to enhancing our DevOps services, focusing on the LayerZero bridge, the snapshot service, and the monitoring stack.
During Q3, we engaged Protofire to work exclusively on our DevOps activities. They joined our team in mid-July, and their primary goal over the past three months has been to become familiar with our public services (staking, explorer, bridge), networks (devnet, testnet, mainnet), and internal services (watchdog, snapshot, ansible OPS, and runbooks). They have already contributed pull requests (PRs) across various Harmony repositories. In Q4, they will take on a more active role, working on cost reduction and implementing the service improvements identified during Q3.
The longer-term aim of the collaboration with Protofire is the potential externalization of our services to them.
Metrics and cloud costs
In Q2 2023, cloud expenses amounted to $65,000, with network transactions (TXs) totaling 6.64 million. The mainnet maintained an uptime of 99.9160%.
Moving to Q3 2023, cloud expenses decreased to $41,000, while network transactions saw an increase, reaching 7.94 million. The mainnet's uptime remained impressive at 99.9381%.
In short, cloud expenses fell by roughly 37% quarter over quarter while network transactions grew by about 20%, and mainnet uptime stayed above 99.9%.
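The quarter-over-quarter comparison can be reproduced with a few lines of arithmetic. The sketch below uses only the figures quoted above. The helper names are illustrative, and the downtime estimate assumes uptime was measured over the full calendar quarter (91 days for Q2, 92 for Q3), which the report does not state.

```python
# Figures quoted in the report. "days" is the calendar length of each quarter
# and is only used for the (assumed) downtime estimate.
QUARTERS = {
    "Q2 2023": {"cost_usd": 65_000, "txs_millions": 6.64, "uptime_pct": 99.9160, "days": 91},
    "Q3 2023": {"cost_usd": 41_000, "txs_millions": 7.94, "uptime_pct": 99.9381, "days": 92},
}


def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


def cost_per_million_txs(quarter: dict) -> float:
    """Cloud spend in USD per million network transactions."""
    return quarter["cost_usd"] / quarter["txs_millions"]


def downtime_minutes(quarter: dict) -> float:
    """Downtime implied by the uptime percentage over the whole quarter."""
    total_minutes = quarter["days"] * 24 * 60
    return total_minutes * (1 - quarter["uptime_pct"] / 100.0)


if __name__ == "__main__":
    q2, q3 = QUARTERS["Q2 2023"], QUARTERS["Q3 2023"]
    print(f"Cost change: {pct_change(q2['cost_usd'], q3['cost_usd']):.1f}%")
    print(f"TX change:   {pct_change(q2['txs_millions'], q3['txs_millions']):.1f}%")
    for name, q in QUARTERS.items():
        print(f"{name}: ${cost_per_million_txs(q):,.0f} per million TXs, "
              f"~{downtime_minutes(q):.0f} min implied downtime")
```

Under these assumptions, cloud spend per million transactions drops from roughly $9,800 in Q2 to roughly $5,200 in Q3, and the implied downtime falls from about 110 minutes to about 82 minutes.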
Monthly transactions peaked in July 2023. This surge was likely influenced by the anticipated LayerZero airdrop.
HIP28 (fee collection) was activated on July 20, 2023, at 05:51:07 UTC, and the collected fees have since been credited to 0xa0c395A83503ad89613E43397e9fE1f8E93B6384. As of October 14, at 4:45 AM UTC, a total of 62,891 ONE tokens had accumulated.
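To put the fee total in perspective, the average daily collection rate can be estimated from the two timestamps above. This is a rough sketch: the year of the October 14 snapshot is assumed to be 2023 (the report gives only the day and time), and the function name is illustrative.

```python
from datetime import datetime, timezone

# Activation time as stated in the report.
ACTIVATION = datetime(2023, 7, 20, 5, 51, 7, tzinfo=timezone.utc)
# Assumption: the snapshot year is 2023; the report says only "October 14".
SNAPSHOT = datetime(2023, 10, 14, 4, 45, 0, tzinfo=timezone.utc)
COLLECTED_ONE = 62_891


def average_daily_fees(collected: float, start: datetime, end: datetime) -> float:
    """Average amount collected per 24 hours between start and end."""
    elapsed_days = (end - start).total_seconds() / 86_400
    return collected / elapsed_days


if __name__ == "__main__":
    rate = average_daily_fees(COLLECTED_ONE, ACTIVATION, SNAPSHOT)
    print(f"Average collection rate: {rate:.1f} ONE per day")
```

With these inputs, the average works out to a little over 730 ONE per day across roughly 86 days.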
Q3 Service report
Bridge service
In the Harmony LayerZero bridge service, we made significant progress:
- Implemented updates to the LayerZero bridge that improve its performance and scalability and are critical to its smooth operation. API CPU usage was reduced by 60%-70%.
- Collaborated with team members to debug and resolve LayerZero issues, contributing to the overall robustness of the service.
- Introduced new metrics for the backend APIs, providing insight into the causes of API failures and aiding troubleshooting.
Secure server console access (goteleport)
We have decided not to renew our goteleport enterprise contract, which was costing us $7,000 per year. We are now exploring alternative solutions such as Cloudflare Zero Trust. In the interim, we are also considering GitHub Single Sign-On (SSO) for authenticating our users.
Snapshot service
Over the past year, we successfully transitioned our service storage from AWS S3 to storj.io, which brought significant cost savings, reducing expenses from thousands to just a few hundred dollars per month. Recently, however, our monthly costs have risen to $1,500.
To address this, we started a project to migrate the service to use our dedicated nodes as storage space, accessed through WebDAV. During Q3, we completed the migration of our services to this new storage solution. In Q4, our focus will be on efficiently removing the data still stored in storj to further reduce costs.
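For readers unfamiliar with WebDAV, storing a file on a WebDAV server is an HTTP PUT to the target path. The sketch below illustrates that idea with the Python standard library only. It is not the snapshot service's actual implementation: the host name, path, and function names are hypothetical, and authentication is omitted.

```python
import urllib.request


def build_webdav_put(base_url: str, remote_path: str, payload: bytes,
                     content_type: str = "application/octet-stream") -> urllib.request.Request:
    """Build an HTTP PUT request that stores payload at remote_path on a WebDAV server."""
    url = base_url.rstrip("/") + "/" + remote_path.lstrip("/")
    return urllib.request.Request(
        url,
        data=payload,
        method="PUT",
        headers={"Content-Type": content_type},
    )


def upload(request: urllib.request.Request, timeout: float = 30.0) -> int:
    """Send the request and return the HTTP status code.

    WebDAV servers typically answer 201 (created) or 204 (overwritten).
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Hypothetical endpoint and file name, for illustration only.
    request = build_webdav_put(
        "https://storage.example.invalid/dav",
        "snapshots/example-snapshot.tar",
        b"snapshot bytes would go here",
    )
    print(request.get_method(), request.full_url)
    # upload(request) would perform the actual transfer against a real server.
```

A production setup would also need credentials or TLS client authentication and streaming for large snapshot archives, neither of which is shown here.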
Q4 roadmap
1. Review OPS work management process and usage of Jira:
- Objective: To evaluate and optimize the workflow and efficiency of OPS work management using Jira.
- Benefits: Improved project tracking, task prioritization, and streamlined collaboration among OPS team members.
2. Goteleport authentication update and Cloudflare Zero Trust study:
- Objective: To enhance SSH authentication methods and explore Cloudflare Zero Trust as a more cost-effective and secure alternative.
- Benefits: Strengthened security for server access while potentially reducing costs and maintaining robust access control.
3. Perform annual secret rotation and review internal credential security for Protofire access:
- Objective: Ensure security by rotating access credentials, while evaluating and adding the credentials for Protofire’s access.
- Benefits: Minimize security risks and maintain robust access control to protect sensitive information and systems.
4. New Harmony explorer R&D work with Protofire and SocialScan:
- Objective: 1) Collaborate with Protofire on research and development for the Blockscout Harmony explorer. 2) Evaluate the SocialScan proposal for a new Harmony block explorer.
- Benefits: Retire the current Harmony explorer in favor of a third-party managed explorer with enhanced functionality and features, potentially improving user experience.
5. Mainnet node network work and Node contract renewal with Latitude:
- Objective: 1) Retire the s2/s3 nodes with the help of Protofire. 2) Review and renew the Latitude contract.
- Benefits: Ensured network integrity, improved cost and efficiency, and potentially more favorable contract terms.
6. Cost opportunities (VPN termination, S3 data review, Node migration, snapshot service):
- Objective: Implement cost-saving opportunities by retiring VPN usage, reviewing and optimizing data storage in S3, migrating nodes to more cost-effective providers with the help of Protofire, and evaluating the cost of historical backup storage for the snapshot service.
- Benefits: Reduced operational costs, efficient resource allocation, and potential improvements in data management.
7. Protofire - Internal service improvement (Watchdog, Snapshot, Grafana stack):
- Objective: Enhance the performance and reliability of internal services, including Watchdog, Snapshot, and the Grafana stack, by implementing automation and working on bug fixes.
- Benefits: Improved monitoring, data backup, and data visualization for more effective service management and issue resolution.
8. Protofire - Internal service learning (Prometheus gateway with intention to migrate to another provider, Testnet faucet):
- Objective: 1) Explore the Prometheus gateway for internal learning purposes and evaluate a potential migration to another service provider. 2) Assess and improve the efficiency of the Testnet faucet.
- Benefits: Greater familiarity with new tools, potential cost savings, and an improved user experience with the Testnet faucet. Aligned with the DevOps service decentralization strategy.
9. Protofire - General network ops and first-line (L1) support/troubleshooting:
- Objective: Provide general network operations support and Level 1 troubleshooting for various aspects, including the explorer dashboard, staking, and networks.
- Benefits: Ensured network stability, rapid issue resolution, and user support for seamless operations.