Ever pondered the viability of operating a company predominantly on self-hosted and Open Source software?
Have you wondered if it's scalable, say from a tight-knit team of 3 to a bustling group of 40?
Or just how much efficiency you can squeeze out of a modest virtualization cluster?
Flashback to 2012: Vates began its journey with just the three of us—its founders. We started off by renting a single machine in a datacenter, solely to host our virtual machines. Fast forward to the present, and we've grown to a team of 40. However, if you count all who interact with our IT infrastructure in some capacity, that number jumps closer to 60. And the most fascinating part? Our entire infrastructure hums away on just three main machines, with an additional one each for storage and backup.
It might be easy to imagine a large, specialized IT team behind the scenes, ensuring this system functions seamlessly. But here's the twist: despite being the CEO of a 40-person company, I still dedicate just 2 to 3 hours monthly to ensure everything is updated and runs smoothly.
To many, self-hosting might seem like a mere passion project. However, it offers an array of advantages over the conventional SaaS model. Here are some compelling reasons why we've chosen this route:
- Competitive TCO: From small setups to expansive infrastructures, self-hosting scales efficiently. You can condense an immense amount of capability into just a few Us in a rack, optimizing space.
- Predictable Costs: Owning the infrastructure means there are no hidden cost surprises. While hosting or colocation provider fees might fluctuate, the majority of the costs are consistent and can be mapped out for years.
- No Vendor Lock-in: You aren’t tethered to the whims, pricing, or future plans of a SaaS provider. This autonomy can lead to greater tech flexibility down the line.
- Data Sovereignty: With self-hosting, visitor tracking and data storage remain entirely in-house. In contrast, SaaS models often involve relinquishing your data to third parties. This is especially vital for businesses in regions with stringent data residency regulations.
- Supporting the Open Source Community: We’re not just consumers; we actively engage with and contribute to various Open Source projects. We do our bit!
- Flexibility: Given the compute potential a rack offers, scaling infrastructure becomes a breeze. Most businesses don’t actually need the elasticity of public clouds. By self-hosting, you craft an infrastructure that aligns perfectly with your current and anticipated requirements.
- Retaining Expertise: Self-hosting cultivates a repository of specialized knowledge that proves invaluable when assisting our clientele.
- Dedicated Resources: Unlike multi-tenant SaaS setups where resources could be split among different users, self-hosted systems ensure that all resources are reserved for your needs.
Certainly, this path necessitates a degree of technical acumen. But we believe in retaining control and nurturing the expertise vital for our IT operations. Contrary to the narrative some SaaS or Cloud vendors might push, the process isn't as daunting as it's often portrayed.
Our Infrastructure Explained
Before diving deep into the nitty-gritty, let's set the foundation by discussing the core components that drive our self-hosting: the physical infrastructure.
Options Aplenty: You're presented with an array of choices when considering where to base your IT infrastructure:
- Renting VPS (Virtual Private Servers): While economical on a monthly basis, there's an element of unpredictability. You're often at the mercy of inconsistent performance due to over-provisioning on the provider's side. The security landscape in 2023 also presents its challenges. Issues like Spectre, Meltdown, and other speculative execution attacks on a shared CPU core can breach isolation, especially if the hypervisor isn't configured adeptly. Let's do some quick math: at $40/month for 4 vCPUs and 8GiB RAM, matching the power of our physical servers would cost approximately $2000/month.
- Renting Physical Machines: Here, your dollars stretch further. However, the onus is on you to lean on a virtualization platform (and we’d cheekily suggest XCP-ng!). For a few hundred dollars monthly, you can access machines boasting 16+ cores and a robust 128GiB memory. While this investment seems significant, the performance and isolation benefits, given the dedicated hardware, are substantial. The caveat? The cost never truly amortizes since you're perpetually renting, and customizing the infrastructure can be cumbersome due to provider-imposed restrictions.
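To make the VPS "quick math" above concrete, here is a minimal sketch. It assumes the comparison is made on RAM alone, using the three 128GiB compute nodes described further down; that basis is our assumption, and the function name is purely illustrative.

```python
import math

# Figures taken from the text
VPS_PRICE_USD = 40   # monthly price of one VPS
VPS_RAM_GIB = 8      # RAM of one VPS
HOST_RAM_GIB = 128   # RAM of one physical compute node
HOST_COUNT = 3       # number of compute nodes


def monthly_vps_cost(total_ram_gib, vps_ram_gib=VPS_RAM_GIB, price=VPS_PRICE_USD):
    """Return (instance count, monthly cost) needed to match a RAM total.

    Assumption: RAM is the only sizing criterion (CPU and disk are ignored).
    """
    instances = math.ceil(total_ram_gib / vps_ram_gib)
    return instances, instances * price


instances, cost = monthly_vps_cost(HOST_RAM_GIB * HOST_COUNT)
print(f"{instances} VPS instances, ${cost}/month")
```

Under that assumption the result lands close to the $2000/month figure quoted above.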
After weighing these options, in 2019, we gravitated towards a unique solution.
Our Chosen Path - Renting a Rack: After an initial period of renting hardware, we moved our infrastructure to a dedicated rack in a data center. To summarize our setup:
- Three dedicated compute nodes: Each one is a Dell R6515 with an AMD EPYC 7302P and 128GiB of RAM.
- A refurbished Dell R730 storage node: This connects via NFS, fortified with a few high-speed NVMe drives.
- A Dell R6515 in our recovery site (another DC) 500km away, with local drives, able to run our critical VMs if we lose the main DC.
Powering all this is XCP-ng coupled with Xen Orchestra. This setup affords us multi-user management, efficient backups (delta-based, refreshed every 6 hours in a separate room within the DC), and even replication to a secondary site situated 500km away (conducted nightly) – a contingency for any catastrophic data center events.
Thus far, the journey of anchoring our infrastructure on Open Source software has been seamless (more on our service uptime in the conclusion). Next, let’s delve into the specific software that we've housed within this rack.
If you want more details, you can read our previous blog post on this:
The software stack
To give you a sense of scale, our assortment of self-hosted services spans approximately 30 VMs, utilizing just under 200GiB of RAM and 2TiB of disk space. In tangible terms, this is a mere 5Us and less than 1kW of power consumption, forming the backbone of our entire organization.
📛 Directory and SSO
Like many businesses, we depend on a unified directory to provide a seamless login experience across our array of services. Our journey began with OpenLDAP back in 2012, a robust choice we still utilize today.
Over time, though, it's evolved to act primarily as a backup for our SSO portal, facilitated through Keycloak. This portal draws upon the foundational LDAP directory, housing our existing users. With Keycloak's integration, we now have a full-fledged SSO portal supporting both SAML and OpenID Connect, fortified with two-factor authentication options, be it TOTP or physical devices like Yubikey.
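For readers curious about what a TOTP second factor involves, here is a minimal sketch of the standard algorithm (RFC 6238, built on the HOTP construction from RFC 4226). It is not Keycloak's code, only an illustration of the published standard; the function names and the `window` parameter are our own choices.

```python
import hashlib
import hmac
import struct


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) using HMAC-SHA1."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation offset
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238): HOTP over a time counter."""
    return hotp(secret, unix_time // step, digits)


def verify(secret: bytes, candidate: str, unix_time: int,
           step: int = 30, window: int = 1) -> bool:
    """Accept codes from adjacent time steps to tolerate small clock drift."""
    return any(
        hmac.compare_digest(totp(secret, unix_time + i * step, step), candidate)
        for i in range(-window, window + 1)
    )


# Shared secret from the RFC 6238 test vectors (never use it in production)
secret = b"12345678901234567890"
print(totp(secret, 59, digits=8))
```

The server and the authenticator app share the secret once at enrollment; after that, both sides derive the same short-lived code from the current time.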
And for added security, none of our machines are openly accessible via their public IPs; access is granted solely through a dedicated management network, accessible via a VPN powered by Wireguard.
When it comes to email, Bluemind is our platform of choice. For those familiar with Zimbra, Bluemind bears similarities but stands out due to its meticulous maintenance.
It packs a unique punch: compatibility with the Exchange protocol, facilitating seamless syncing for devices and even satisfying hardcore Outlook enthusiasts. Naturally, calendaring is also a part of its toolkit.
📅 Web meetings planning
In our quest to make external meetings hassle-free, we've integrated a self-hosted Cal.com instance, which flawlessly syncs with our Bluemind calendars. Despite a few minor imperfections, its capability cannot be overstated.
🐕‍🦺 Support and contact emails
Zammad shines as an indispensable tool for our inbound sales teams, streamlining interactions and ensuring nothing slips through the cracks. A quick example: an email sent to "firstname.lastname@example.org" doesn’t just land in an inbox. Instead, it's automatically converted into a ticket, available for the entire team to view. This dynamic approach enables effortless assignment, ensuring inquiries receive prompt attention, and distributes the workload evenly among team members. It also facilitates fluid communication between different departments, making it a breeze to transition a conversation from our technical team to our sales representatives, and vice versa.
Additionally, Zammad isn’t limited to email. It powers our front-facing chat system, providing potential clients and users with a platform for real-time interactions with our sales and tech teams. This immediate communication often serves as a precursor to more formalized email exchanges.
In essence, Zammad consolidates multiple communication avenues into one cohesive system, fostering efficiency, transparency, and collaboration.
📂 File sharing & editing
Enter Nextcloud, our chosen platform for both internal file sharing and the occasional document editing, reminiscent of the Google Docs experience, thanks to Collabora. Admittedly, we're not heavy on paperwork, so our use of this feature is relatively light. Yet, whenever the need arises, Nextcloud fits the bill perfectly!
It's also handy to quickly share some Markdown content or to draw mind maps with the relevant plugins.
💬 Live chat
For our live internal chat needs, Mattermost is the name of the game. With roughly 100 active accounts, it's proven to be robust and reliable. Although we operate a single "team" instance, it's inclusive of individuals outside of Vates, particularly for collaborative projects. And worry not; confidentiality is maintained through the use of private channels.
While the core chatting functionality serves us well, we're also big fans of some of its additional features. The integrated "Boards" function, akin to Kanban or Notion, greatly assists our project management endeavors. Moreover, the "Call" plugin comes in handy for voice-only internal calls, adding another layer of convenience to our communication toolkit.
At the heart of our operations is a suite of tools that emphasizes not just efficiency but also the ability to collaborate seamlessly across teams and projects.
📹 Video calls
For scheduled or external meetings, Jitsi has become our trusted ally. Its seamless performance, particularly in screen sharing, combined with its user-friendly interface, has made our virtual interactions smoother than ever. Occasionally, we do encounter a few hiccups when engaging with individuals from companies with strict network limitations, where the WebRTC stream is unfortunately blocked. It's a stark contrast to more commonly used platforms like MS Teams or Google, which often bypass such restrictions.
What makes our choice of Jitsi even more compelling is its robust security. Hosted on our own platform, it ensures that our discussions remain private, free from any external eavesdropping.
When it comes to blogging, Ghost Blog is our platform of choice, not just for this blog but also for our other outlets like XCP-ng and Xen Orchestra. Our decision to commit to Ghost Blog has stood the test of time. Its user-friendly design caters seamlessly to both our tech-savvy members and those less acquainted with the technicalities, ensuring that everyone has a voice in our shared journey.
For our community forums, we've invested in NodeBB. It offers an engaging platform for our community to connect and share insights. It's pretty fast and simple to maintain. We're happy with it, and we've now reached almost 10,000 unique accounts!
🍵 Code repository
Our current go-to for code repositories is Gitea. With its impressive speed and efficiency, it's a refreshing shift from the more complex and often sluggish GitLab. It's easy to install and update, which is perfect for us! Its solid SSO integration is another plus.
💻 Infrastructure management
To effectively manage our server and IT equipment, Netbox has proven indispensable. It seamlessly syncs with our Xen Orchestra plugin to keep track of IP usage.
For managing our desktop and laptop inventory, we trust SnipeIT.
Both tools are really powerful and deliver everything needed to manage a fleet of servers, desktops, and laptops in full detail!
✅ Service status
Uptime Kuma is our preferred tool for monitoring service uptime, with a dedicated status page at https://status.vates.tech. It also doubles as an alerting tool, notifying us through platforms like Mattermost and Telegram during any infrastructure issues.
🌡️ Real-time monitoring
For a granular, real-time understanding of our virtual machines' performance, we have deployed Netdata on each VM. This ensures we're continually aware of how our resources are used and can pinpoint any potential inefficiencies or issues immediately. All the individual VM metrics stream to a centralized Netdata instance. This acts as an aggregation point, ensuring that we have a holistic view of all our resources in one place. The centralized approach ensures that our team doesn't have to jump between individual VMs to understand the broader performance context, making it a lot easier to identify and respond to trends or anomalies.
Our central Netdata instance writes its data into a Prometheus database. Known for its reliability and scalability, Prometheus gives us the assurance that our metric data is stored efficiently and can be queried with speed when needed.
In short: Netdata runs in each VM and streams to a central Netdata instance, which in turn writes into a Prometheus database. We then explore all those evolving counters with Grafana, using various alerting rules.
Finally, to make sense of these metrics visually, we use Grafana. Grafana allows us to create custom dashboards that can visually represent our VMs' health, performance, and other key indicators. More than just viewing, Grafana's power lies in its ability to alert us based on the predefined conditions we set. Through various alerting rules, we can be instantly notified if any metric goes beyond our acceptable thresholds, allowing for immediate action.
By leveraging Netdata, Prometheus, and Grafana, we've established a robust monitoring and alerting mechanism. This ensures not only the high availability and performance of our services but also proactively guards against potential disruptions.
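To illustrate the idea behind threshold-based alerting on such metrics, here is a small sketch. It parses a few lines in the Prometheus text exposition format and flags samples above a limit. The metric names, VM labels, and thresholds are hypothetical, and this is not how Grafana evaluates its rules internally; it only shows the principle.

```python
import re

# Hypothetical sample in Prometheus text exposition format
SAMPLE = """\
# HELP vm_cpu_usage_percent Hypothetical CPU usage gauge
# TYPE vm_cpu_usage_percent gauge
vm_cpu_usage_percent{vm="mail"} 12.5
vm_cpu_usage_percent{vm="chat"} 93.0
vm_ram_usage_percent{vm="mail"} 71.0
"""

# Simplified grammar: name, optional {labels}, value, optional timestamp.
# Label values containing "}" are not handled by this sketch.
LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>[^}]*)\})?'
    r'\s+(?P<value>\S+)(?:\s+\d+)?$'
)
LABEL = re.compile(r'(\w+)="([^"]*)"')

# Hypothetical alerting thresholds, in percent
THRESHOLDS = {"vm_cpu_usage_percent": 90.0, "vm_ram_usage_percent": 85.0}


def parse_metrics(text):
    """Return a list of (metric name, labels dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE.match(line)
        if match is None:
            continue  # ignore lines this simplified parser cannot read
        labels = dict(LABEL.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def firing_alerts(samples, thresholds):
    """Keep only the samples whose value exceeds the configured threshold."""
    return [
        (name, labels, value)
        for name, labels, value in samples
        if name in thresholds and value > thresholds[name]
    ]


for name, labels, value in firing_alerts(parse_metrics(SAMPLE), THRESHOLDS):
    print(f"ALERT {name} {labels} = {value}")
```

In a real deployment the comparison would run as a query against the time-series database rather than against raw text, usually with a "for at least N minutes" condition to avoid alerting on short spikes.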
⚙️ Building platform
To build XCP-ng, we use Koji. You can read more on how we use it in our dedicated documentation: https://docs.xcp-ng.org/project/development-process/build-system/
🪞 Distributed content delivery
To ensure our global users get efficient access, we use Mirrorbits. Depending on a user's location, they're directed to the nearest server mirror. You can see the live list of our mirrors here: https://mirrors.xcp-ng.org/?mirrorstats
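As an illustration of what "nearest mirror" can mean, here is a minimal sketch that picks the mirror with the smallest great-circle distance to the user. It is not the redirector's actual selection logic, which may also weigh other factors; the mirror names and coordinates are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


# Hypothetical mirrors: name -> (latitude, longitude)
MIRRORS = {
    "mirror-eu.example.org": (48.86, 2.35),     # Paris
    "mirror-us.example.org": (40.71, -74.01),   # New York
    "mirror-ap.example.org": (35.68, 139.69),   # Tokyo
}


def nearest_mirror(user_lat, user_lon, mirrors=MIRRORS):
    """Return the name of the mirror closest to the user's coordinates."""
    return min(
        mirrors,
        key=lambda name: haversine_km(user_lat, user_lon, *mirrors[name]),
    )


# A user located near Grenoble, France
print(nearest_mirror(45.19, 5.72))
```

In practice the user's coordinates would come from an IP geolocation lookup, and unhealthy or outdated mirrors would be filtered out before the distance comparison.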
📈 Data Analytics and Reporting
For deep insights and reporting, we lean on ElasticSearch and Kibana. They aid us in visualizing key performance indicators that we derive.
Mautic has revolutionized our email campaigns and visitor tracking. In tandem with this, Matomo offers basic visitor statistics. Without going into too much detail, Mautic is a fantastic tool for driving your marketing through automation, segments, and many other features.
💼 Sales & CRM
EspoCRM is our pick for customer relationship management, owing to its simplicity and flexibility. Together with Mautic, our sales process becomes streamlined. Whenever leads land in our email/contact system, they're automatically integrated via Zammad and Mautic. They then get sent to EspoCRM, empowering our sales team with comprehensive insights to facilitate conversions.
🖵 Slides & presentations
Remark.js is a lightweight, web-based slide deck tool that turns your markdown into beautiful slides. Given its simple HTML/JS/CSS nature, hosting these slides in-house ensures quick load times, offline access, and full customization control.
We also self-host various other things that matter in the end:
- Fonts and Images: why rely on external services like Google Fonts when hosting your own ensures no downtime, no tracking, and consistent load times? This is an excellent move, especially for email campaigns where recipients might be wary of external assets.
- PrivateBin for Secure Data Sharing: PrivateBin offers end-to-end encryption for your pastes, ensuring that only those with the link (and password, if you've set one) can access the data. It's a privacy-centric alternative to other pastebin services, ideal for sharing logs, debug information, and other sensitive data.
- Custom E-commerce and Partner Portal: By developing and hosting your own shop system and partner portal, you're in full control of both the customer experience and the data. This allows for better integration with your existing systems, full customization to your needs, and a clear chain of data custody.
The road ahead
How much further can we expand within our current setup? Technically speaking, there's ample room for growth. We're only tapping into 40% of our RAM capacity and a mere 1/8 of our total CPU capabilities.
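As a rough back-of-the-envelope check of that headroom, the sketch below converts the utilization figures above into growth factors. It assumes load scales linearly and keeps no safety margin for failover, so treat the results as upper bounds; the function name is illustrative.

```python
from fractions import Fraction


def growth_factor(used_fraction):
    """How many times the current load fits in the total capacity.

    Assumption: linear scaling, with no capacity reserved for failover.
    """
    return 1 / used_fraction


ram = growth_factor(Fraction(40, 100))  # 40% of RAM in use
cpu = growth_factor(Fraction(1, 8))     # 1/8 of CPU in use
print(f"RAM allows ~{float(ram):.1f}x the current load, CPU ~{float(cpu):.0f}x")
```

RAM is therefore the first limit we would hit, well before CPU.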
If the need arises, upgrading our hosts or adding more compute nodes is always an option. As for storage, the flexibility of live-migrating VM disks means we could easily add more storage space or even transition to a specialized storage cluster, exploring options like Ceph.
Our journey of self-hosting remains robust and promising, even if Vates were to double (or more) in scale!
Conclusion: our self-hosted journey
Over the years, we've grown organically, and our IT infrastructure stands as a testament to the power of determination, reliance on Open Source, and the pursuit of data sovereignty.
Here's what we've learned and achieved:
- Organic Growth Works: We didn't aim for the biggest infrastructure immediately. Instead, we added services as needed. This gradual approach allowed us to stabilize and adapt our infrastructure without hasty decisions.
- Ownership is Rewarding: By shifting to our own machines in a datacenter, we converted an ongoing cost into a valuable investment. Within just four years, this investment paid for itself. More than just a financial benefit, this gave us unprecedented control and stability.
- Our Uptime Speaks Volumes: Our record of approximately 99.994% uptime over the past four years is something we're incredibly proud of. This goes to show that self-hosting doesn't mean compromising on reliability.
- It's Not Just About Money: While our Open Source approach might be seen as cost-effective, our primary motivation wasn't just the financial aspect. Our goal was, and remains, to have absolute control over our data, to build our own IT capabilities, and to continuously learn and adapt.
- Commitment to Open Source: Our journey with Open Source isn't about using 'free' tools but cherishing the 'freedom' they provide. That's why we also invest in supporting tools and platforms that align with our mission.
- Data is Paramount: In an era where data breaches are common, having control over our own data has been invaluable. Our self-hosting strategy ensures the highest standards of data protection and privacy.
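To put the uptime figure above into perspective, this short sketch converts a percentage into cumulative downtime. It assumes the percentage covers the full four-year period and uses an average year of 365.25 days.

```python
HOURS_PER_YEAR = 365.25 * 24  # average year, leap days included


def downtime_minutes(uptime_percent, years):
    """Total downtime, in minutes, implied by an uptime percentage."""
    unavailable_fraction = 1 - uptime_percent / 100
    return unavailable_fraction * years * HOURS_PER_YEAR * 60


total = downtime_minutes(99.994, 4)
print(f"{total:.0f} minutes over four years, about {total / 4:.0f} minutes per year")
```

In other words, 99.994% over four years corresponds to roughly two hours of cumulative downtime.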
In essence, our journey embodies the philosophy of "building for ourselves". While challenges are part and parcel of such endeavors, the rewards, both tangible and intangible, make every effort worthwhile. We're proud of the path we've chosen, and we believe it offers unparalleled autonomy, control, and self-sufficiency.