By: Kostadis Roussos, Principal Engineer, VMware
When I studied computer science at Brown University in 1992, there was no course on software architecture. The focus of the education was on the fundamentals of our field: algorithms, data structures, computer architecture, system architecture, and software engineering. That was, and is, a curious omission. Later, when I did a master’s degree at Stanford, software architecture was once again missing from the curriculum. Around that time, a cousin of mine started a degree in architecture, and I noticed that architecture was not engineering. It made me wonder whether software architecture is not computer science but something else.
Conway’s law says that a software system reflects the organization of the people building it. At the start of my career, I always saw that as a critique. I thought software architecture was about presenting applications with abstractions of the hardware that operating systems managed. But in retrospect, that was not software architecture; that was an extension of software engineering. As time progressed, I thought that software engineering became architecture when the lines of code crossed some critical threshold of size. But as the projects grew, I realized that the people who built the software, and how effectively they could work together, were the most crucial problem I could solve.
And so, Conway’s law is a good thing because it answers what software architecture is. It is how we organize software to enable teams to deliver it. The system’s structure defines what kinds of software we deliver, how we deliver it, and what we deliver. The interfaces we choose, the layers we define, and the APIs we expose set the limits of the possible because they constrain how teams communicate. In a genuine sense, software architecture is a multidisciplinary field that tries to answer one question: given the business objectives, the people, the technology at our disposal, and the cost, how do we organize the software so that teams of people can collaborate efficiently and effectively?
Very early in my software career, the idea of API-first infrastructure management software was not mainstream. When I proposed building a new feature with public, versioned APIs, several engineers objected because it would severely constrain their efficiency. They could build an end-to-end feature in half the time if the APIs stayed internal and they did full-stack development. They eventually came around when I explained that the probability of us adding public APIs to the system later was practically zero, and that having them would make it easier for us to partner with third parties.
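To make that concrete, here is a minimal sketch, in Go, of what a public, versioned API surface might look like. The Snapshot resource, route, and port are hypothetical stand-ins rather than the actual product’s API; the point is that the version is part of the contract from day one.

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// Snapshot is a hypothetical resource exposed through the public API.
type Snapshot struct {
	ID   string `json:"id"`
	Name string `json:"name"`
}

// listSnapshots serves the same data the internal UI would use, but through
// a contract that outside scripts and partners can depend on.
func listSnapshots(w http.ResponseWriter, r *http.Request) {
	snapshots := []Snapshot{{ID: "snap-001", Name: "nightly"}}
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(snapshots)
}

func main() {
	mux := http.NewServeMux()
	// Versioning the path lets the contract evolve (/api/v2/...) later
	// without breaking the scripts and partners already built against v1.
	mux.HandleFunc("/api/v1/snapshots", listSnapshots)
	log.Fatal(http.ListenAndServe(":8080", mux))
}
```

Retrofitting a public API onto an internal-only system later means adding versioning, authentication, and compatibility guarantees all at once, which is exactly why it so rarely happens.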
Another realization in my career was about layering. A lot of software attempts to encapsulate other software, and the problem is that this assumes the lower layer is static. For example, there was an effort to build heterogeneous storage management in the mid-2000s. My reaction was that it was doomed to fail because the underlying systems were evolving too rapidly and the upper layer would be too expensive to maintain. Instead, I argued that any software that builds on another piece of software has to assume the lower layer is evolving and changing, and has to allow other entities to interact with that lower layer directly.
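One way to express that principle is a management layer that wraps only the operations it can keep stable while deliberately exposing the underlying client. The following is a hedged sketch in Go; ArrayClient, Manager, and their methods are hypothetical names, not any real vendor SDK.

```go
package storagemgmt

// ArrayClient stands in for a vendor's own management client. It is a
// hypothetical type; real vendor SDKs evolve far faster than any wrapper.
type ArrayClient struct{ /* vendor-specific fields */ }

func (c *ArrayClient) CreateVolume(name string, sizeGB int) error { return nil }

// Manager is the "upper layer". It covers the common operations it can keep
// stable, but it deliberately does not try to hide the vendor client.
type Manager struct {
	client *ArrayClient
}

func NewManager(client *ArrayClient) *Manager {
	return &Manager{client: client}
}

// Provision is the lowest common denominator the layer commits to maintaining.
func (m *Manager) Provision(name string, sizeGB int) error {
	return m.client.CreateVolume(name, sizeGB)
}

// Native exposes the underlying client directly. Callers who need a feature
// the layer has not wrapped yet can go straight to the source instead of
// waiting for the wrapper to catch up with the evolving lower layer.
func (m *Manager) Native() *ArrayClient {
	return m.client
}
```

The escape hatch is the design choice: the layer adds value where it can, and gets out of the way where it cannot.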
Recently, my team and I disagreed on whether to pursue a highly differentiated system with narrow use cases first, or a very general system that was not very differentiated. We agreed that we wanted both, but time and resources were constrained. It soon became apparent that because our sales team could only sell a differentiated product, and because the team wanted to build the differentiated product, we shipped the differentiated one first. Later on, folks wondered why we made the choices we made, and as I explained, what you build is a function of what you should build and what your team wants to build. Looking back, we made the right decision: if we had not built the differentiated system first, we would never have built it at all, and me-too systems are rarely successful.
My favorite example comes from my time in the storage space, where we were tasked with building a product that integrated storage features into the virtualization management consoles. The sales leadership and product teams wanted us to go deep with the storage features we exposed. But having talked to our sales team, I realized they wanted a product with a single feature: snapshot restore of a virtual machine on NFS, iSCSI, and FC. Why? Because they didn’t want to reach the end of a sale and then lose it because the feature wouldn’t work in the environment the customer had chosen, and they didn’t want to argue with the customer over which storage protocol to use. What I learned along the way was that the hypervisors could consume storage in various ways, and the sales team didn’t want to argue with the customer about that either. So we invested a lot of time to ensure that our feature set worked regardless of the storage protocol and of how the hypervisor used the storage. But because we knew that the extended feature set was critical, we also ensured that the system had APIs allowing our more sophisticated customers to script solutions.
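The shape of that design might look something like the Go sketch below. The Datastore, Restorer, and backend types are hypothetical, and the real product’s interfaces were certainly richer; what the sketch shows is the one promise the product made: a single restore entry point that works no matter which protocol sits underneath, which is also the natural surface for a public scripting API.

```go
package restore

import "fmt"

// Datastore describes where a VM's files live; the protocol is a property of
// the datastore, not something the user should have to reason about.
type Datastore struct {
	Name     string
	Protocol string // "nfs", "iscsi", or "fc" in this sketch
}

// Restorer is the one operation the product promised to get right everywhere.
type Restorer interface {
	RestoreVM(vmName string, ds Datastore) error
}

// nfsRestorer and blockRestorer are hypothetical backends; the real work was
// making each path behave identically from the customer's point of view.
type nfsRestorer struct{}

func (nfsRestorer) RestoreVM(vmName string, ds Datastore) error {
	fmt.Printf("restoring %s by copying files on NFS datastore %s\n", vmName, ds.Name)
	return nil
}

type blockRestorer struct{}

func (blockRestorer) RestoreVM(vmName string, ds Datastore) error {
	fmt.Printf("restoring %s from a LUN snapshot on %s datastore %s\n", vmName, ds.Protocol, ds.Name)
	return nil
}

// ForDatastore hides the protocol choice behind a single entry point, which
// is the same entry point a public scripting API would call.
func ForDatastore(ds Datastore) (Restorer, error) {
	switch ds.Protocol {
	case "nfs":
		return nfsRestorer{}, nil
	case "iscsi", "fc":
		return blockRestorer{}, nil
	default:
		return nil, fmt.Errorf("unsupported protocol %q", ds.Protocol)
	}
}
```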
The key lesson in all these cases is that we did things because they were the right people, business, and engineering decisions. We made them because we considered how software and non-software teams work together. And over time, I realized that software architecture is the process of creating software that enables more people to do more work together, and that question of how we enable people to work together is relevant to every line of code.