This month we covered Nadia Eghbal’s instant classic about open-source software, Working in Public: The Making and Maintenance of Open Source Software. Open-source software has been around since the late seventies, but only recently has it gained significant public and business attention.
Tech Themes
Misunderstood Communities. Open source is frequently viewed as an overwhelmingly positive force for good - taking software and making it free for everyone to use. Many think of open source as community-driven, where everyone participates and contributes to making the software better. The theory is that having so many eyeballs on, and contributors to, the software improves security, improves reliability, and increases distribution. In reality, open-source communities follow the “90-9-1” rule and act more like social media than you might think. According to Wikipedia, the “90–9–1” rule states that for websites where users can both create and edit content, 1% of people create content, 9% edit or modify that content, and 90% view the content without contributing. To show how this applies to open source communities, Eghbal cites a study by North Carolina State researchers: “One study found that in more than 85% of open source projects the research examined on Github, less than 5% of developers were responsible for 95% of code and social interactions.” These creators, contributors, and maintainers are developer influencers: “Each of these developers commands a large audience of people who follow them personally; they have the attention of thousands of developers.” Unlike Instagram and Twitch influencers, who often actively try to build their audiences, open-source developer influencers sometimes find the attention off-putting - they simply published something to help others and suddenly found themselves with real influence. The challenging truth of open source is that core contributors and maintainers give significant amounts of their time and attention to their communities - often spending hours at a time responding to issues and reviewing pull requests (proposed code changes and new features) on GitHub. Evan Czaplicki’s insightful talk, “The Hard Parts of Open Source,” speaks to this challenging dynamic.
Evan created the open-source project Elm, a functional programming language that compiles to JavaScript, because he wanted to make functional programming more accessible to developers. As one of its core maintainers, he has repeatedly been hit with “Why don’t you just…” requests from non-contributing developers angrily asking why a feature wasn’t included in the latest release. As fastlane creator Felix Krause put it, “The bigger your project becomes, the harder it is to keep the innovation you had in the beginning of your project. Suddenly you have to consider hundreds of different use cases…Once you pass a few thousand active users, you’ll notice that helping your users takes more time than actually working on your project. People submit all kinds of issues, most of them aren’t actually issues, but feature requests or questions.” When you use open-source software, remember who is contributing to and maintaining it - and the days and years poured into the project with the sole goal of increasing its utility for the masses.
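The concentration statistic Eghbal cites (fewer than 5% of developers producing 95% of the activity) is easy to compute for any project once you have per-developer contribution counts. The sketch below is a minimal illustration, assuming a plain list of commit counts per developer; the function name and the sample numbers are hypothetical, not taken from the study.

```python
def share_of_developers_for(commit_counts, target=0.95):
    """Return the smallest fraction of developers whose combined commits
    reach `target` (e.g. 95%) of all commits in the project."""
    if not commit_counts or sum(commit_counts) == 0:
        raise ValueError("need at least one commit")
    # Count the most active developers first.
    counts = sorted(commit_counts, reverse=True)
    total = sum(counts)
    running = 0
    for n_devs, commits in enumerate(counts, start=1):
        running += commits
        if running >= target * total:
            return n_devs / len(counts)
    return 1.0  # not reached for target <= 1, kept as a safe fallback


# Hypothetical project: two heavy contributors and 48 one-off contributors.
project = [600, 350] + [1] * 48
print(share_of_developers_for(project))  # 0.04, i.e. 4% of developers
```

In this made-up project, two of fifty developers account for at least 95% of the commits, which is the kind of skew the study describes.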
Git it? Git was created by Linus Torvalds in 2005. We talked about Torvalds last month; he also created the most famous open-source operating system, Linux. Git was born out of a skirmish with Larry McVoy, head of BitMover, the company behind the proprietary version control tool BitKeeper, over the potential misuse of his product. Torvalds went on vacation for a week and hammered out what is today the most dominant version control system - git. Version control systems allow developers to work simultaneously on projects, recording every change as a commit and merging those commits into a shared main branch of code. They also allow any change to be rolled back to an earlier version, which can be enormously helpful if a bug is found in the main branch. Git ushered in a new wave of version control, but git on its own was difficult to use for the untrained developer. Enter GitHub and GitLab - two companies built around the idea of making git easier for developers to use. GitHub came first, in 2007, offering a platform to host and share projects. The GitHub platform was free, but not open source - developers couldn’t build onto the hosting platform itself, only use it. GitLab started in 2014 to offer an open-source alternative that allowed individuals to self-host a GitHub-like tracking program, providing improved security and control. Because of GitHub’s first-mover advantage, however, it has become the dominant platform upon which developers build: “Github is still by far the dominant market player: while it’s hard to find public numbers on GitLab’s adoption, its website claims more than 100,000 organizations use its product, whereas GitHub claims more than 2.9 million organizations.” Developers find GitHub incredibly easy to use, which has fueled an enormous wave of open source projects and code-sharing. Microsoft bought GitHub in 2018 for $7.5B, and growth has continued since: the platform added 10 million new users in 2019 alone, bringing the total to over 40 million worldwide.
We are in the early stages of this development explosion, and it will be interesting to see how increased code accessibility changes the world over the next ten years.
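To make the commit-and-rollback idea concrete, here is a toy, in-memory version history in Python. It is loosely inspired by git's practice of identifying commits by a hash that covers their content and parent, but it is a simplified sketch, not git's actual object model: there is a single branch, a single file, and no merging. The class and method names are hypothetical.

```python
import hashlib


class ToyRepo:
    """Single-branch, single-file version history kept in memory."""

    def __init__(self):
        self.commits = {}  # commit id -> (parent id, file content, message)
        self.head = None   # id of the most recent commit

    def commit(self, content, message):
        # Identify the commit by hashing its parent, message, and content,
        # so the same content committed at different points gets a new id.
        data = f"{self.head}\n{message}\n{content}".encode()
        cid = hashlib.sha1(data).hexdigest()
        self.commits[cid] = (self.head, content, message)
        self.head = cid
        return cid

    def content(self, cid=None):
        """Return the file content at `cid` (defaults to the latest commit)."""
        return self.commits[cid or self.head][1]

    def revert_to(self, cid):
        """Roll back by recording a new commit that restores an earlier
        snapshot; the buggy commit stays in history rather than being erased."""
        return self.commit(self.content(cid), f"Revert to {cid[:7]}")

    def log(self):
        """Commit messages from newest to oldest, following parent links."""
        messages, cid = [], self.head
        while cid is not None:
            parent, _, message = self.commits[cid]
            messages.append(message)
            cid = parent
        return messages


repo = ToyRepo()
good = repo.commit("print('hello')", "Initial commit")
repo.commit("print('hello'", "Introduce a bug")
repo.revert_to(good)
print(repo.content())  # prints: print('hello')
print(repo.log())      # newest first: the revert, the bug, the initial commit
```

Recording the rollback as a new commit, instead of deleting the bad one, keeps a full audit trail of what changed and when.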
Developing and Maintaining an Ecosystem Forever. Open source communities are unique and complex, each with its own user and contributor dynamics. Eghbal segments open source communities into four buckets - federations, clubs, stadiums, and toys - based on contributor growth and user growth, characterized below in the two by two matrix. Federations are the pinnacle of open source software development - many contributors and many users, creating a vibrant ecosystem of innovative development. Clubs are more niche and focused communities, with high contributor growth but lower user growth, including vertical-specific tools like the astronomy package Astropy. Stadiums are large but highly centralized communities - typically only a few contributors but a significant user base. It is up to these core contributors to lead the ecosystem, unlike decentralized federations, which have so many contributors that they can go in all directions. Lastly, there are toys, which have low user growth and low contributor growth but may still be very useful projects. Projects can also shift in and out of these community types as they become more or less relevant. For example, Hadoop, an open-source project developed largely at Yahoo and based on Google’s File System and MapReduce papers, slowly became huge, moving from a stadium to a federation, and an ecosystem of related projects, like Apache Spark, grew up around it. Projects mature and change, and code can remain in production for many years after the project’s day in the spotlight is gone. According to Eghbal, “Some of the oldest code ever written is still running in production today. Fortran, which was first developed in 1957 at IBM, is still widely used in aerospace, weather forecasting, and other computational industries.” These ecosystems can exist forever, but their costs (creation, distribution, and maintenance) are often hidden, especially the cost of maintenance.
The cost of creation and distribution has dropped significantly in the past ten years - with many of the world’s developers all working in the same ecosystem on GitHub - but the resulting flood of projects has also increased the total cost of maintenance, which can be significant. Bootstrap co-creator Jacob Thornton likens maintenance costs to caring for an old dog: “I’ve created endlessly more and more projects that have now turned [from puppies] into dogs. Almost every project I release will get 2,000, 3,000 watchers, which is enough to have this guilt, which is essentially like ‘I need to maintain this, I need to take care of this dog.’” Communities change from toys to clubs to stadiums to federations, but they may also change back as new tools are developed. Old projects still need to be maintained, and that maintenance comes down to committed developers.
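Eghbal's two-by-two can be read as a simple rule over two measurements. The sketch below assumes each project is summarized by a contributor growth rate and a user growth rate, and uses a single hypothetical cutoff (10% growth) to call a rate high or low; the cutoff, function name, and sample figures are illustrative, not Eghbal's.

```python
from enum import Enum


class Community(Enum):
    FEDERATION = "federation"  # high contributor growth, high user growth
    CLUB = "club"              # high contributor growth, low user growth
    STADIUM = "stadium"        # low contributor growth, high user growth
    TOY = "toy"                # low contributor growth, low user growth


def classify(contributor_growth, user_growth, cutoff=0.10):
    """Place a project in one quadrant; growth rates are fractions (0.25 = 25%).
    The 10% cutoff is a hypothetical threshold for 'high' growth."""
    many_contributors = contributor_growth >= cutoff
    many_users = user_growth >= cutoff
    if many_contributors and many_users:
        return Community.FEDERATION
    if many_contributors:
        return Community.CLUB
    if many_users:
        return Community.STADIUM
    return Community.TOY


# Hypothetical growth rates for one project at two points in time:
print(classify(0.02, 0.50))  # Community.STADIUM
print(classify(0.30, 0.50))  # Community.FEDERATION
```

Because the quadrant is recomputed from current growth rates, the same project can land in a different bucket from one year to the next, which is the kind of stadium-to-federation shift described for Hadoop.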
Business Themes
Revenue Model Matching. One of the earliest code-hosting platforms was SourceForge, founded in 1999. The company pioneered the idea of code-hosting - letting developers publish their code for easy download - and became famous for letting open-source developers use the platform free of charge. SourceForge was created by VA Software, an internet bubble darling whose stock price was decimated when the bubble finally burst. The challenge with scaling SourceForge was a revenue model mismatch - VA Software made money from paid advertising, which allowed it to offer its tools to developers for free but made its revenue highly variable. When the company went public, it was still a small and unproven business, posting $17M in revenue and $31M in costs. The revenue model mismatch is starting to rear its head again, with traditional software as a service (SaaS) recurring subscription models catching some heat. Many cloud service and API companies price by usage rather than charging a fixed, high-margin subscription fee. This is the classic electric utility model - you only pay for what you use. Snowflake CEO Frank Slootman (who formerly ran SaaS pioneer ServiceNow) commented: “I also did not like SaaS that much as a business model, felt it not equitable for customers.” Snowflake instead sells credits that customers spend as they use the product. The traditional issue with usage-based billing has been price transparency: costs can be obscured by credit systems and by pricing that is hard to calculate in advance, as with Amazon Web Services. The revenue model mismatch was just one problem for SourceForge. As git became the dominant version control system, SourceForge was reluctant to support it, sticking with its traditional tools instead. Pricing norms change and new technology comes out every day, so it’s imperative that businesses have a strong grasp of the value they provide to their customers and align their revenue model with that value, so that the trade-off is fair.
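The difference between a flat subscription and usage-based (credit) pricing comes down to simple arithmetic. The sketch below compares the two under hypothetical prices: a $1,000 monthly subscription versus $2 per credit consumed. These figures and function names are illustrative assumptions, not Snowflake's or anyone else's actual pricing.

```python
from typing import Sequence


def subscription_cost(monthly_fee: float, months: int) -> float:
    """Flat fee: the customer pays the same amount regardless of usage."""
    return monthly_fee * months


def usage_cost(credits_per_month: Sequence[float], price_per_credit: float) -> float:
    """Utility-style billing: the customer pays only for what was consumed."""
    return sum(credits_per_month) * price_per_credit


def breakeven_credits(monthly_fee: float, price_per_credit: float) -> float:
    """Monthly usage at which both models cost the same."""
    return monthly_fee / price_per_credit


usage = [100, 800, 300]  # hypothetical credits consumed over three months
print(subscription_cost(1000.0, len(usage)))  # 3000.0
print(usage_cost(usage, 2.0))                 # 2400.0
print(breakeven_credits(1000.0, 2.0))         # 500.0
```

Below an average of 500 credits a month, usage pricing is cheaper for this hypothetical customer; above it, the subscription wins. The swing between 100 and 800 credits in a single quarter also shows why usage-based revenue is harder to predict, for the vendor and the customer alike.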
Open Core Model. There has been enormous growth in open source businesses in the past few years, which typically operate on an open core model. Under the open core model, a company offers a free, usually feature-limited version of its software alongside a proprietary, enterprise version with additional features. Developers might adopt the free version, then hit usage limits or feature constraints that push them to buy the paid version. The open-source “core” is often just that - freely available for anyone to download and modify; its source code is normally published on GitHub, and developers can fork the project or do whatever they wish with it. The commercial product is normally closed source and not available for modification, which gives the business something to sell. Joseph Jacks, who runs Open Source Software (OSS) Capital, an investment firm focused on open source, lays out four types of open core business model (pictured above), which differ in how much of the software is open source. GitHub, interestingly, employs the “thick” model of being mostly proprietary, with only 10% of its software truly open-sourced. It’s funny that the site that hosts and facilitates the most open source development is itself proprietary. Jacks nails the most important question in the open core model: “How much stays open vs. How much stays closed?” The consequences can be dire for a business - open source too much, and other companies can quickly recreate your tool. Many DevOps companies have experienced the perils of open source, with some losing control of the very projects they were built around. On the flip side, keeping more of the software closed source goes against the open-source ethos and can be viewed as selling out.
The continuous integration and delivery project Jenkins has struggled to satisfy its growing user base, leading Jenkins creator Kohsuke Kawaguchi, then CTO of CloudBees, the company behind Jenkins, to publish a blog post entitled “Shifting Gears”: “But at the same time, the incremental, autonomous nature of our community made us demonstrably unable to solve certain kinds of problems. And after 10+ years, these unsolved problems are getting more pronounced, and they are taking a toll — segments of users correctly feel that the community doesn’t get them, because we have shown an inability to address some of their greatest difficulties in using Jenkins. And I know some of those problems, such as service instability, matter to all of us.” Striking this balance is incredibly tough, especially in a world of competing projects and finite development time and money in a commercial setting. Furthermore, large companies like AWS are taking open core tools like Elasticsearch and MongoDB and offering their own proprietary services (Amazon Elasticsearch Service and Amazon DocumentDB), prompting the CEOs of Elastic and MongoDB to lash out, with good reason. Commercializing open source software is a never-ending battle against proprietary players and yourself.
Compensation for Open Source. Eghbal identifies two types of open-source funders - institutions (companies, governments, universities) and individuals (usually developers who are direct users). Companies like to fund improved code quality, influence, and access to core projects. The largest contributors to open source projects are mainly corporations like Microsoft, Google, Red Hat, IBM, and Intel. These corporations are big and profitable enough to hire developers and let them strike a comfortable balance between time spent on commercial software and time spent on open source. This also functions as a marketing expense; big companies like having influencer developers on payroll to get the company’s name out into the ecosystem. Evan You, who created Vue.js, a JavaScript framework, described company-backed open-source projects: “The thing about company-backed open-source projects is that in a lot of cases… they want to make it sort of an open standard for a certain industry, or sometimes they simply open-source it to serve as some sort of publicity improvement to help with recruiting… If this project no longer serves that purpose, then most companies will probably just cut it, or (in other terms) just give it to the community and let the community drive it.” In contrast to company-funded projects, developer-funded projects are often donation-based. With the rise of online payment tools like Stripe and Patreon, more and more funding is being directed to individual open source developers. Unfortunately, it is still hard for many open source developers to make a living from individual contributions, especially if they work on multiple projects at the same time.
Open source developer Sindre Sorhus explains: “It’s a lot harder to attract company sponsors when you maintain a lot of projects of varying sizes instead of just one large popular project like Babel, even if many of those projects are the backbone of the Node.js ecosystem.” Whether working in a company or as an individual developer, building and maintaining open source software takes significant time and effort and rarely leads to significant monetary compensation.