Sonatype

Out of the Wild: A Beginner’s Guide to Package and Dependency Management

Key Takeaways

In our package and dependency management guide you’ll learn:

  • The role of package managers, their corresponding language-specific registries, and universal package managers in modern software development.
  • The key components of an application-level package (or dependency) manager and how they work, including examples for 3 different programming languages.
  • How a universal package manager (i.e., binary repository manager) differs from language-specific registries/repositories and source control management repositories, and why it’s a critical piece in your DevOps toolchain.

What do we mean when we say Package and Dependency Management?

Terms like package manager, dependency management, repository, and repository manager are thrown around a lot in software development. Most people have at least a vague understanding of their meanings, but sometimes it’s hard to know if we’re all speaking a common language, with a common understanding, when these discussions arise.

Let’s get to the heart of what we mean when we talk about these terms in the context of DevOps.

Keeping in mind the definition of DevOps we arrived at in our own What is DevOps? article, as a “discipline rooted in collaboration and communication,” with “a common goal of shortening software delivery cycles and improving the stability of deployments,” there are many different concepts, practices, and toolsets that organizations can leverage to help enable those goals.

Some of the most common DevOps concepts and related tooling include Source Control Management (SCM) solutions like GitHub, CI/CD servers like Jenkins or Bamboo for automating stages of your software development lifecycle (SDLC), automated infrastructure configuration management tooling like Ansible, Terraform, Chef, or Puppet, and containerization and orchestration tools like Docker and Kubernetes.

But there is another equally important DevOps concept, practice, and related toolset that is talked about less often than those mentioned above. Olivia Glenn-Han talks about this lesser-discussed topic in her article, The Universal Package Manager – The Most Critical Link in Your DevOps Toolchain. The Universal Package Manager can be a key component in helping “further the technical and cultural goals of DevOps” in your organization.

So let’s dive deeper into the concept and practice of package and dependency management—and the toolset that helps enable them.

Not That Type of Package Manager

First, let’s nail down what we mean by package manager for the purposes of this guide. When people say “package manager,” it’s not always clear which type they mean until you have some additional context. In this guide, we’re not talking about OS or system-level package managers/installers, like Homebrew for macOS or RPM for Linux, important as they are.

What we’re talking about is package managers that operate as application-level dependency managers, their corresponding language-specific package registries, and universal package managers, and how they all work together.

It’s important to note here that many of the concepts we’ll discuss can be applied to system-level package managers as well, but the examples in the rest of this guide will focus on application-level package managers.

Application-level Package (or Dependency) Managers

So, what are application-level package/dependency managers?

In his Medium article So You Want to Write a Package Manager, Sam Boyer distinguishes an application-level package manager as “an interactive system for managing the source code dependencies of a single project in a particular language.”

Examples of application-level package managers include Maven (Java), npm (JavaScript), and NuGet (.NET), each of which is covered later in this guide.

Boyer goes on to say that application-level package managers provide “collective coherency” and an output that is “precisely reproducible.”

So let’s talk about why the phrases and descriptions above about the “managing,” “coherency,” and “reproducibility” of dependencies are so important. In other words, why do we need these application-level dependency managers?

As we’ve discussed in other articles, the role of software developer has changed significantly in recent years, and the reliance on open source software to build modern applications only continues to increase. This means that the applications we develop often depend on other people’s code.

In fact, according to Sonatype’s State of the Software Supply Chain Report, a modern application is made up of more than 90% OSS components. With this reliance on third-party dependencies to build software, things can get messy quickly, especially when a direct dependency pulls in another component, resulting in nested transitive dependencies.

Managing this intricate web of dependencies out in the wild, unassisted, is no small task. Here is where application-level package managers can help:

“Thus, to build our software we need to bring in all parts on which it depends, including language libraries and remote third-party modules. But it’s not trivial to ensure that we have all necessary dependencies, particularly when dependencies themselves depend on others. This is why we need a Dependency Manager, often invoked during the software build process.” (Devopedia)

Boyer goes on to explain more of the complexity that comes into play:

“There is a natural tension between the need for absolute algorithmic certainty of outputs, and the fluidity inherent in development done by humans. That tension, being intrinsic and unavoidable, demands resolution. Providing that resolution is fundamentally what [application-level package managers] do.”

Application-level package managers are often closely linked to an online repository (also called a registry) that stores packages (also referred to as libraries) for that particular programming language. For example, Maven by default sources components from the Central Repository for Java (and other languages) that Sonatype manages, and npm pulls packages from the JavaScript registry at npmjs.org.

How do Application-level Package Managers Work?

We’ve established that managing dependencies is a complex task. But as Boyer explains in his Medium article, “It’s not the algorithmic side that makes [application-level package managers] hard.”

“Their final outputs are phase zero of a compiler or interpreter, and while the specifics of that vary from language to language, each still presents a well-defined target. As with many problems in computing, the challenge is figuring out how to present those machine requirements in a way that fits well with human mental models.”

Take the Apache Maven application-level package manager as an example. Its “primary goal is to allow a developer to comprehend the complete state of a development effort in the shortest period of time” by focusing on “making the build process easy” and “providing quality project information.” In fact, the name Maven comes from a Yiddish word meaning “accumulator of knowledge,” and the tool itself is built around a Project Object Model, or POM file. (More on this later.)

That’s just one application-level package manager’s take on their high-level role in modern software development, but let’s continue down this path for a minute and talk about what application-level package managers (in general) have in common.

We’re simplifying a bit, but below are the key components used by most application dependency managers. It’s the interplay—and forward “movement”—of the elements listed below that makes for an effective dependency management system.

Project code

This one’s easy. First, you have your source code that’s being actively developed; that is, the project you want the application-level package manager to manage dependencies for. This is usually stored in a Source Control Management (SCM) system, such as GitHub.

Manifest file

A manifest is a file, specific to your particular application-level package manager, that you create to list the dependencies necessary for your project. It nails down your intent, such as using version 1.4 and above for Package X.

Lock file

A lock file, then, is machine-generated from the manifest, and it contains the actual dependencies and versions that the application-level package manager resolved from the manifest file as the project was being built. It basically contains all of the information necessary to reproduce the project’s dependencies.
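
To make the relationship between manifest and lock file concrete, here is a minimal Python sketch. It is not how any particular package manager is implemented: the registry contents, the package name `package-x`, and the `parse` and `resolve` helpers are all hypothetical, and only a simple `>=` constraint is supported.

```python
# Hypothetical registry: package name -> versions published so far.
REGISTRY = {"package-x": ["1.3.0", "1.4.0", "1.4.2", "1.5.1"]}

# Manifest: human-written intent ("version 1.4 and above of Package X").
MANIFEST = {"package-x": ">=1.4"}


def parse(version):
    """Turn '1.4.2' into (1, 4, 2) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def resolve(manifest, registry):
    """Pick the highest published version that satisfies each constraint."""
    lock = {}
    for name, constraint in manifest.items():
        if not constraint.startswith(">="):
            raise ValueError(f"unsupported constraint: {constraint}")
        minimum = parse(constraint[2:])
        candidates = [v for v in registry[name] if parse(v) >= minimum]
        if not candidates:
            raise LookupError(f"no version of {name} satisfies {constraint}")
        lock[name] = max(candidates, key=parse)
    return lock


# Lock file: the machine-generated record of what was actually resolved.
LOCK = resolve(MANIFEST, REGISTRY)
print(LOCK)  # {'package-x': '1.5.1'}

# A newer release changes what the manifest would resolve to today,
# but a build that installs from LOCK still gets 1.5.1.
REGISTRY["package-x"].append("1.6.0")
print(resolve(MANIFEST, REGISTRY))  # {'package-x': '1.6.0'}
print(LOCK)                         # {'package-x': '1.5.1'}
```

The manifest stays stable while the registry changes underneath it; the lock file is what pins a build to the versions that were resolved at a particular moment.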

Dependency code

Finally, the dependency code is fetched and laid out. It contains all of the source code and/or binary files listed in the lock file, “arranged on disk such that the compiler/interpreter will find and use it as intended, but isolated so that nothing else would have a reason to mutate it.” – Boyer

Image Credit: Boyer, 2016: So You Want to Write a Package Manager.

Devopedia provides a good explanation of how the process works using these application-level package manager components:

“Dependency managers start by reading the manifest file, in which direct dependencies are noted. They then read the metadata of these dependencies from their repositories to figure out the next level of dependencies. In other cases, they may download the dependencies right away and then process their dependencies. Either way, all dependencies must be downloaded and installed.”

A key concept in both that Devopedia blurb and the topic as a whole is the notion of direct and transitive dependencies. A direct dependency is a component that your own code requires in order to function as intended. Developers list their direct dependencies in the project’s manifest file. But those dependencies may have dependencies of their own. These transitive dependencies are listed by each component’s developers in that component’s own, separate manifest file.
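
The walk from direct to transitive dependencies can be sketched in a few lines of Python. This is an illustration only, under the assumption that each package's metadata simply lists the names of its own direct dependencies; the package names, the `METADATA` table, and the `resolve_all` helper are hypothetical, and version conflicts are ignored.

```python
from collections import deque

# Hypothetical registry metadata: each package lists its own direct dependencies.
METADATA = {
    "web-framework": ["http-client", "template-engine"],
    "http-client": ["tls-lib"],
    "template-engine": [],
    "tls-lib": [],
    "test-runner": ["http-client"],
}


def resolve_all(direct):
    """Breadth-first walk: start at the manifest, then follow each level of metadata."""
    resolved = []
    seen = set(direct)
    queue = deque(direct)
    while queue:
        name = queue.popleft()
        resolved.append(name)
        for dep in METADATA[name]:
            if dep not in seen:  # a shared dependency is only processed once
                seen.add(dep)
                queue.append(dep)
    return resolved


direct = ["web-framework", "test-runner"]  # what the project manifest declares
everything = resolve_all(direct)
transitive = [name for name in everything if name not in direct]

print(everything)  # ['web-framework', 'test-runner', 'http-client', 'template-engine', 'tls-lib']
print(transitive)  # ['http-client', 'template-engine', 'tls-lib']
```

Two direct dependencies turn into five packages to download, and `http-client` is pulled in by both of them. Real dependency managers also have to reconcile the version constraints that each path imposes, which this sketch leaves out.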

Application-level Package Manager Examples

Maven Example

As we briefly mentioned earlier, the Maven application-level package manager is based on a Project Object Model (POM). The pom.xml is Maven’s take on a manifest file, including all of the necessary information to build a Java application.

According to Maven’s docs, “the cornerstone of the POM is its dependency list.” When your project is compiled, Maven downloads and links your OSS dependencies, including “the dependencies of those dependencies (transitive dependencies), allowing your list to focus solely on the dependencies your project requires.”

Here is an example snippet from the Dependencies subsection of the docs:

 <dependencies>
   <dependency>
     <groupId>junit</groupId>
     <artifactId>junit</artifactId>
     <version>4.12</version>
     <type>jar</type>
     <scope>test</scope>
     <optional>true</optional>
   </dependency>
   ...
 </dependencies>

For more information on the groupId, artifactId, version, type, and other parameters that make up the Dependencies section of the pom.xml, see Maven’s docs.

You may have noticed that we haven’t mentioned a lock file for Maven yet. And that’s for a reason. In the Maven package manager, according to this StackOverflow thread, “There is no need to have a feature such as ‘lock file’, or anything like this if your pom.xml strictly defines the versions of your dependencies.”

So, if you declare a specific version in your pom.xml, Maven will only resolve that version; therefore, your pom.xml becomes both a manifest and a lock file.

There are differing schools of thought around how to specify your dependency versions in any application-level package manager’s manifest, but if you lean toward the school of thought that favors version specificity (i.e., avoiding version ranges) as part of enabling reproducible builds, the lock file becomes unnecessary. Reproducible, or deterministic, builds are increasingly seen as a best practice within software development.

npm Example

In the case of the Node package manager (npm), the package.json file serves as the project manifest. JavaScript developers using npm commonly specify their dependencies with version ranges, likely because ranges are mentioned explicitly in the docs, but also because they mean less maintenance and faster updating of dependencies. Lock files, in the form of a package-lock.json or npm-shrinkwrap.json, are therefore used to document the exact versions that were ultimately used in the build process. (Note that even with this “manifestation of the manifest” documenting the package versions in the lock file, there are certain risks with specifying “latest” or version ranges for your dependencies, and we’ll discuss that a bit more later on.)

Here is a package.json snippet from the npm docs showing an example:

{
  "name": "my_package",
  "version": "1.0.0",
  "dependencies": {
    "my_dep": "^1.0.0",
    "another_dep": "~2.2.0"
  }
}
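
As a rough guide to what the `^` and `~` prefixes in that snippet allow, here is a simplified Python sketch of the two range rules. It is an approximation for illustration, not npm's own implementation: the `satisfies` helper is hypothetical, it only handles plain `major.minor.patch` versions, and it ignores pre-release tags and the other range syntaxes npm accepts.

```python
def parse(version):
    """Split 'major.minor.patch' into a tuple of integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def satisfies(version, spec):
    """Check a concrete version against an exact, caret (^), or tilde (~) spec."""
    candidate = parse(version)
    if spec.startswith("^"):
        base = parse(spec[1:])
        # Caret: allow changes that do not modify the left-most non-zero part.
        if base[0] > 0:
            upper = (base[0] + 1, 0, 0)
        elif base[1] > 0:
            upper = (0, base[1] + 1, 0)
        else:
            upper = (0, 0, base[2] + 1)
    elif spec.startswith("~"):
        base = parse(spec[1:])
        # Tilde: allow patch-level changes within the same minor version.
        upper = (base[0], base[1] + 1, 0)
    else:
        return candidate == parse(spec)
    return base <= candidate < upper


print(satisfies("1.9.3", "^1.0.0"))  # True:  any 1.x.y at or above 1.0.0
print(satisfies("2.0.0", "^1.0.0"))  # False: a new major version is excluded
print(satisfies("2.2.7", "~2.2.0"))  # True:  patch updates within 2.2
print(satisfies("2.3.0", "~2.2.0"))  # False: a new minor version is excluded
```

Because many published versions can satisfy the same range, two installs from the same package.json at different times may pick different versions, which is the gap the lock file closes.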

.NET Example

In the case of the NuGet application-level package manager used by .NET developers, a .nuspec file is used as the manifest.

Here is an example .nuspec snippet with dependencies specified:

  <dependencies>
      <dependency id="another-package" version="3.0.0" />
      <dependency id="yet-another-package" version="1.0.0" />
  </dependencies>

In addition, lock file functionality was added to NuGet in a later release and is available in NuGet.exe versions 4.9 and above.

And then there’s this other type of package manager…

So far we’ve learned what application-level package managers are, as well as a very simplified, high-level view of how they work to manage the OSS dependencies we use in our software. We’ve also noted that they work closely with their programming language-specific repositories/registries such as npmjs.org or pypi.org, downloading the applicable OSS libraries as needed and resolving dependency conflicts. And we’ve looked at examples of the manifests and lock files used for three different application-level package managers.

So…now what? Once the components that make up your application are downloaded, and your machine understands how to arrange them using your application-level package manager, where do the components and “built” artifacts go?

Olivia Glenn-Han discusses this missing link in her article, The Universal Package Manager – The Most Critical Link in Your DevOps Toolchain:

“This shift from a monolithic application code base, to applications built on 100s of smaller parts, has directly led to a dramatic decrease in release times, as well as the advent of philosophies like Continuous Delivery, and DevOps. One of the biggest things that is still neglected, is how to properly store, and access these pieces.”

Enter the universal package manager (a.k.a. binary repository manager):

“Also known as binary repository manager, it is a software tool designed to optimize the download and storage of binary files, artifacts and packages used and produced in the software development process. These package managers aim to standardize the way enterprises treat all package types. They give users the ability to apply security and compliance metrics across all artifact types. Universal package managers have been referred to as being at the center of a DevOps toolchain.” (Wikipedia)

So, why do I need a Binary Repository Manager?

Binary repository managers serve a couple of important functions as part of a modern software development lifecycle.

First, they can serve as a local copy, or “proxy,” repository for the language-specific package repositories/registries we discussed earlier. Creating these proxy repositories in a repository manager to store and cache your OSS components locally—rather than downloading them directly from an online repository every time you kick off a build—can provide some of the following benefits, as stated in our own Repository Management Basics course:

  • Increasing build performance due to a wider distribution of software and locally available parts.
  • Reducing network bandwidth and dependency on remote repositories.
  • Insulating your company from outages on the internet, outages of public repositories (Maven Central, npm, etc.), or even removal of an open-source component.
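
The caching behavior behind those benefits can be sketched in Python as follows. This is a toy model, not the way Nexus Repository or any other product is implemented: the `ProxyRepository` class, the `fake_upstream` and `offline_upstream` functions, and the in-memory cache are all hypothetical stand-ins.

```python
class ProxyRepository:
    """Toy caching proxy: ask the upstream registry only on a cache miss."""

    def __init__(self, fetch_upstream):
        self.fetch_upstream = fetch_upstream  # callable standing in for a remote registry
        self.cache = {}
        self.upstream_requests = 0

    def get(self, coordinate):
        if coordinate not in self.cache:
            self.upstream_requests += 1
            self.cache[coordinate] = self.fetch_upstream(coordinate)
        return self.cache[coordinate]


def fake_upstream(coordinate):
    """Pretend to download a component from a public registry."""
    return f"bytes-of-{coordinate}".encode()


def offline_upstream(coordinate):
    """Simulate an outage or a removed component."""
    raise ConnectionError("upstream registry unavailable")


proxy = ProxyRepository(fake_upstream)
first = proxy.get("junit:junit:4.12")   # cache miss: fetched from upstream
second = proxy.get("junit:junit:4.12")  # cache hit: served locally
print(proxy.upstream_requests)  # 1

# Even if the public registry goes away, cached components stay available.
proxy.fetch_upstream = offline_upstream
third = proxy.get("junit:junit:4.12")
print(third == first)  # True
```

Only the first build pays the cost of the remote download, and builds that rely on already-cached components keep working during an upstream outage. A component that was never cached would still fail in that situation.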

In addition, repository managers serve as a “single source of truth” for the binaries used in your build processes.

At this stage, you may be asking yourself, but why can’t I just store my binaries where I store my source code? And the short answer is that you can. But you probably won’t want to after you understand more about how version or source control tools like GitHub differ from binary repository managers…

I use a Version/Source Control Management repository to store my source code. Why do I need a Repository Manager for my binaries?

As DZone’s Refcard on Using Repository Managers concisely states, “Repository Managers are to binaries what source repositories or VCS (Version Control Systems) are to sources.”

Authors Brian Fox and Carlos Sanchez go on to explain that binary files are much larger in size, and need a lot of metadata stored with them, such as package name, version, license, etc. They also don’t need to be diffed or cloned in the way that source code does.

Because of these differences, an artifact repository makes a lot more sense for storing binaries, whether they’re the outputs of your build (.zip, .jar, .war, etc.), packages downloaded from an online registry, Docker images, etc.

This thread on StackOverflow also provides some clarification on how the two tools differ:

“In everyday use, you’d store your source code and its history in a git repository, and store your build artifacts (e.g. the compiled software you want to deliver) in Nexus.”

Put more succinctly: “You manage what you code in Git, and what you build in Nexus.”

So while proxy repositories are the best method to store open-source packages downloaded from online registries as we mentioned earlier, hosted repositories can serve as a means to store your internal build artifacts, including snapshots and releases.

Lastly, another advantage that repository managers provide is risk reduction in your build process. We alluded earlier to the risks you open yourself up to when specifying the “latest” version of a particular dependency, or even a version range, in your application-level package manager’s manifest. Downloading unvetted versions directly from online registries presents more risk because bad actors are increasingly poisoning the well, injecting malicious code into libraries or removing them altogether.

As Mykel Alvis explained in his Nexus User Conference presentation, the ability to insulate yourself from outages or vulnerabilities that may occur in such cases is made possible by the use of a caching repository manager.

Putting it all together

Looking at the diagram below, you can see how the application-level package managers (invoked at the developer and CI circles) and their corresponding registries/repositories (top left), source control management systems (bottom), and a binary repository manager (top/right) all work together as part of a modern software development process. Continuous integration can also easily be added to the mix to further your organization’s DevOps goals.

Diagram: package managers and repository managers in a DevOps pipeline.

Further Learning

Repository Management Basics (Course)
This course is designed to provide new customers with the first steps toward optimizing their Nexus Repository Manager configuration. Specifically, it provides critical, high-level theory, best practice, and practical application related to understanding specific concepts and terminology related to Nexus Repository Manager.

Nexus Repository Manager – Proxying Maven and npm Quick Start (Guide)
If you’re new to repository management with Nexus Repository Manager 3, use this guide to get familiar with configuring the application as a dedicated proxy server for Maven and npm builds. To reach that goal, follow each section to:

  • Install Nexus Repository Manager 3
  • Run the repository manager locally
  • Proxy a basic Maven and npm build

Go Dependencies in Nexus Repository (Guide)
This guide will give you fundamentals on dependency management with Go modules. Modules were added to the Go ecosystem to give you built-in versioning and dependency management. Now you and your fellow developers can adapt Go software development to Nexus Repository. Use this guide to get an understanding of the Go toolset, version control, and environment configuration.

Sources

Dependency Manager entry (Devopedia)

Difference between Git and Nexus? (StackOverflow thread)

Open Source Developers and Infrastructure Are The New Frontline of Security by Brian Fox

Package Manager entry (Wikipedia)

Repository Management Basics (Sonatype Learn Course)

Repository Management: An Easy Way to Reduce Risk by Katie McCaskey and Mykel Alvis

So You Want to Write a Package Manager by Sam Boyer

The Universal Package Manager – The Most Critical Link in Your DevOps Toolchain by Olivia Glenn-Haan

Using Repository Managers (DZone Refcard) by Brian Fox and Carlos Sanchez

Talk to Us

Have more questions or comments? Learn more at help.sonatype.com, join us in the Sonatype Community, and view our course catalog at learn.sonatype.com.

And visit my.sonatype.com for all things Sonatype.

Written By: Ember DeBoer

Ember is the Director of Customer Experience at Sonatype.