DevOps Essentials Every Software Engineer Should Know

A practical guide to the DevOps concepts every software engineer—yes, even frontend engineers—needs to understand.

DevOps often feels like a mysterious realm reserved for infrastructure engineers and system administrators.

But the reality is simple:

If your code runs on the internet, you are already part of DevOps.

Every software engineer — backend, frontend, or mobile — should understand how their code goes from a Git commit to running in production.

You don’t need to be a Kubernetes expert. But understanding the fundamentals of networking, containers, deployments, and monitoring will make you a significantly better engineer.

In this post, I’ll break down the DevOps concepts every engineer should know — with practical examples.


1. The Real Basics: How Networking Works

Before we talk about deploying apps, we need to understand how they talk to each other. At the foundation is the TCP/IP (Transmission Control Protocol / Internet Protocol) model.

  • IP Addresses: Think of this as the street address of a server.
  • Ports: If the IP is the street address, the port is the specific apartment number (e.g., Port 80 for HTTP, 443 for HTTPS, 5432 for PostgreSQL).
  • TCP: A reliable protocol that ensures data packets arrive correctly and in order. It uses a “three-way handshake” (SYN, SYN-ACK, ACK) to establish a connection before sending data.
  • DNS: The phonebook of the internet. It maps human-readable domains (like google.com) to IP addresses.

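Understanding these basics will save you hours of debugging. For example, “Connection Refused” means your request reached the machine but nothing was listening on that port, while a “CORS” error comes from the browser's security policy rather than from the network itself.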
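To see DNS and TCP working together, here is a minimal Python sketch that uses only the standard socket module. resolve performs the DNS lookup, and check_port attempts the TCP three-way handshake and classifies the result. Both function names are illustrative, not part of any tool.

```python
import socket

def resolve(hostname: str) -> list[str]:
    """DNS step: return the IP addresses a hostname maps to."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); the IP is sockaddr[0]
    return sorted({info[4][0] for info in infos})

def check_port(host: str, port: int, timeout: float = 2.0) -> str:
    """TCP step: attempt the SYN / SYN-ACK / ACK handshake and classify the outcome."""
    try:
        # create_connection returns only after the handshake has completed
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The machine answered with a reset: nothing is listening on this port
        return "refused"
    except socket.timeout:
        # No answer at all: often a firewall silently dropping the packets
        return "timeout"
```

For example, on a laptop with no local PostgreSQL running, check_port("127.0.0.1", 5432) should return "refused": the machine answered, but nothing is listening on that port.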


2. Docker Basics

“It works on my machine!” are famous last words in software engineering. Enter Docker, which solves this problem by packaging an application together with all of its dependencies.

Key Concepts:

  • Image: A read-only template containing your code, runtime, libraries, and environment variables. Like a blueprint for a house.
  • Container: A running instance of an image. Like the actual built house you can live in.

Sample Dockerfile

Here’s a simple example of containerizing a Node.js application:

# 1. Use the official Node.js image as a base
FROM node:18-alpine

# 2. Set the working directory inside the container
WORKDIR /app

# 3. Copy package.json (and package-lock.json) first and install dependencies,
#    so this layer stays cached until the dependency files change
COPY package*.json ./
RUN npm install

# 4. Copy the rest of the application code
COPY . .

# 5. Document the port the app listens on (publish it with docker run -p)
EXPOSE 3000

# 6. Command to start the app
CMD ["npm", "start"]

💡 What docker run does, at a very high level

First, build the image from the Dockerfile with docker build -t my-app-image . Then, when we run docker run my-app-image, Docker performs the following steps:

  1. Pull the image (if it isn't available locally)
  2. Create a container from the image
  3. Start the container: this is when the image's CMD (here, npm start) actually runs.

So effectively:

Docker Image → becomes → Running Container
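To make the image/container split concrete, here is a toy Python model of those three steps. ToyDocker, Image, and Container are hypothetical names for illustration only; this is not Docker's API. Note how two runs share one read-only image but each get a separate container with its own writable state.

```python
import itertools
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Image:
    """Read-only template (the 'blueprint') built from a Dockerfile."""
    name: str
    cmd: tuple[str, ...]

@dataclass
class Container:
    """A running instance of an image, with its own writable state."""
    id: int
    image: Image
    status: str = "created"
    writable_layer: dict = field(default_factory=dict)

class ToyDocker:
    """Hypothetical model of `docker run`; not Docker's real API."""

    def __init__(self, registry: dict[str, Image]):
        self.registry = registry                  # stands in for Docker Hub or a private registry
        self.local_images: dict[str, Image] = {}
        self.pulls = 0
        self._ids = itertools.count(1)

    def run(self, image_name: str) -> Container:
        # 1. Pull the image only if it isn't already available locally
        if image_name not in self.local_images:
            self.local_images[image_name] = self.registry[image_name]
            self.pulls += 1
        image = self.local_images[image_name]
        # 2. Create a new container from the (shared, read-only) image
        container = Container(id=next(self._ids), image=image)
        # 3. Start it: this is where the image's CMD (e.g. npm start) would execute
        container.status = f"running: {' '.join(image.cmd)}"
        return container
```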


3. Deployments Before Kubernetes

Kubernetes is great, but it’s overkill for many projects. Before jumping to K8s, it’s crucial to understand simpler deployment methods: the Single Shared Machine model.

Platforms like EC2 (AWS), Heroku, or Railway let you run your app on a single virtual machine or a managed equivalent.

  • Heroku / Railway: Abstract away the underlying servers (PaaS). You push code (or a Dockerfile), and they build and run your app for you.
  • EC2: You get a raw virtual server (IaaS). You SSH into it, install Docker, pull your image, and run it.

This model is perfect for MVPs and small-to-medium side projects before scaling horizontally.

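Example of deployment on EC2

Steps:

  1. Create an EC2 instance and allow inbound traffic on port 80 in its security group.
  2. SSH into the machine:
     ssh -i my-key.pem ubuntu@server-ip
  3. Install Docker:
     sudo apt update
     sudo apt install -y docker.io
  4. Run the container, mapping port 80 on the host to port 3000 inside the container (Docker pulls myapp from a registry if it isn't on the machine yet):
     sudo docker run -d -p 80:3000 myapp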

Traffic Flow

User
  ↓
Internet
  ↓
EC2 Instance
  ↓
Docker Container
  ↓
Application
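Conceptually, -p 80:3000 tells Docker to forward connections that arrive on port 80 of the EC2 host to port 3000 inside the container. Docker actually implements this with NAT rules and a helper proxy process, not with code like this. The sketch below is a toy user-space TCP forwarder in Python that shows the same idea on one machine; start_forwarder and its parameters are hypothetical.

```python
import socket
import threading

def _pipe(src: socket.socket, dst: socket.socket) -> None:
    """Copy bytes in one direction until src signals end-of-stream."""
    try:
        while data := src.recv(4096):
            dst.sendall(data)
    except OSError:
        pass  # connection reset or closed; stop copying
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)  # pass the end-of-stream along
        except OSError:
            pass

def start_forwarder(host_port: int, container_port: int, bind_ip: str = "127.0.0.1") -> socket.socket:
    """Toy equivalent of `-p host_port:container_port`: relay every TCP connection."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((bind_ip, host_port))
    listener.listen()

    def accept_loop() -> None:
        while True:
            try:
                client, _ = listener.accept()          # the user's side
            except OSError:
                return                                 # listener closed: stop forwarding
            try:
                upstream = socket.create_connection((bind_ip, container_port))  # the app's side
            except OSError:
                client.close()                         # nothing listening behind the mapping
                continue
            threading.Thread(target=_pipe, args=(client, upstream), daemon=True).start()
            threading.Thread(target=_pipe, args=(upstream, client), daemon=True).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    return listener
```

For example, start_forwarder(8080, 3000) would relay connections on localhost:8080 to an app listening on localhost:3000.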

4. Enter Kubernetes (K8s)

When your application scales past a few VMs, managing them manually becomes a nightmare. Kubernetes is a container orchestration tool that automates deploying, scaling, and managing containerized apps.

Kubernetes Terminology

  • Cluster: A cluster is simply a group of machines (i.e., nodes) working together.
  • Node: A physical or virtual server (e.g., an EC2 instance or a bare-metal machine) that runs your containers.
    
    Cluster
     ├── Node 1 (EC2)
     ├── Node 2 (EC2)
     └── Node 3 (EC2)
    
  • Pod: The smallest deployable unit in K8s. A Pod usually contains one container (sometimes a few tightly coupled ones).

    In most cases: 1 Pod = 1 Container

    Think of a Pod as:

    A wrapper around your Docker container.

  • Deployment: A declarative way to manage Pods. A Deployment tells Kubernetes, for example: “I want 3 copies of this application running at all times.” If one Pod crashes, Kubernetes automatically creates a new one.
  • ReplicaSet: A ReplicaSet is the Kubernetes resource responsible for maintaining a stable number of identical Pods. It ensures that the number of running Pods always matches the desired replica count. ReplicaSets are usually not created directly by developers — they are automatically created and managed by Deployments.

    For example:

    
    Deployment
      └── ReplicaSet (replicas = 3)
            ├── Pod
            ├── Pod
            └── Pod
    
  • DaemonSet: Ensures that every Node runs a copy of a specific Pod (often used for logging or monitoring agents). I'll cover DaemonSets in detail in a future post.
  • Workloads: A general term for applications running on K8s (Deployments, StatefulSets, DaemonSets).

    This can be confusing initially because Workload is not an actual Kubernetes resource. It is simply a category for objects that run applications, so when someone says “workloads”, they mean Deployments, StatefulSets, DaemonSets, etc.

    
     Workloads
     ├── Deployment
     ├── StatefulSet
     └── DaemonSet
    
  • ConfigMap / Secret: Ways to pass environment variables and sensitive data to your Pods without hardcoding them in the image.
  • Service: A stable network endpoint that exposes Pods. Pods are ephemeral: they can be recreated at any time, and each new Pod gets a new IP address. A Service gives them a stable name and virtual IP, finds them by label, and load-balances traffic across them.
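A Service finds its Pods by label rather than by IP, which is why it keeps working when Pods are replaced. Here is a toy Python model of that idea. ToyService and its round-robin routing are illustrative assumptions; in a real cluster, kube-proxy (or a similar component) does the routing and may balance traffic differently.

```python
import itertools
from dataclasses import dataclass

@dataclass
class Pod:
    name: str
    ip: str
    labels: dict[str, str]

def matches(selector: dict[str, str], labels: dict[str, str]) -> bool:
    """A Pod matches when it carries every key/value pair in the selector."""
    return all(labels.get(key) == value for key, value in selector.items())

class ToyService:
    """Hypothetical model of a Service: a stable name in front of changing Pod IPs."""

    def __init__(self, name: str, selector: dict[str, str]):
        self.name = name
        self.selector = selector
        self._counter = itertools.count()

    def endpoints(self, pods: list[Pod]) -> list[str]:
        # Re-evaluated on every call, so recreated Pods with new IPs are picked up
        return [pod.ip for pod in pods if matches(self.selector, pod.labels)]

    def route(self, pods: list[Pod]) -> str:
        ips = self.endpoints(pods)
        if not ips:
            raise RuntimeError(f"service {self.name} has no matching Pods")
        # Simple round-robin over the current endpoints
        return ips[next(self._counter) % len(ips)]
```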

[Figure: Official Kubernetes architecture diagram]

A Sample K8s Deployment Manifest

K8s uses YAML manifests to declare the desired state of the system:

apiVersion: apps/v1
# Kind in k8s YAML is the type of resource
kind: Deployment  # This is a workload resource
metadata:
  name: frontend-deployment
spec:
  replicas: 3  # Desired Pod count; the Deployment's ReplicaSet keeps 3 Pods running
  selector:
    matchLabels:
      app: frontend  # Must match the Pod template labels below
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
      - name: frontend-container
        image: my-repo/frontend:v1.0.0
        ports:
        - containerPort: 80

And a Service that exposes those Pods inside the cluster:

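apiVersion: v1
kind: Service
metadata:
  name: frontend-service
spec:
  selector:
    app: frontend  # Routes traffic to Pods carrying this label
  ports:
    - protocol: TCP
      port: 80        # Port the Service listens on
      targetPort: 80  # containerPort on the selected Pods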

Often Confusing Differences

Node vs Cluster
A Node is simply a single machine (physical or virtual) that runs containers.
A Cluster is a group of nodes working together and managed by Kubernetes.

Think of it like this:
  • Node → one server
  • Cluster → a group of servers working together
Example:
    Cluster  
    ├── Node 1 (EC2 instance)  
    ├── Node 2 (EC2 instance)  
    └── Node 3 (EC2 instance)
If one node fails, Kubernetes reschedules the Pods managed by controllers such as Deployments onto the remaining nodes.

Pod vs Deployment
A Pod is the smallest runnable unit in Kubernetes.
It contains the actual container running your application.
A Deployment is a higher-level controller that manages pods.

For example, if you declare replicas: 3, the Deployment ensures that 3 Pods are always running. If one Pod crashes, Kubernetes automatically creates a new one.

So the relationship looks like this:
Deployment → manages → ReplicaSet → manages → Pods
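The “if one Pod crashes, Kubernetes creates a new one” behavior comes from a control loop that repeatedly compares desired state with observed state and acts on the difference. Here is a toy Python model of one pass of that loop; ToyReplicaSet is a hypothetical name, and the real controller is far more involved.

```python
import itertools
from dataclasses import dataclass, field

@dataclass
class ToyReplicaSet:
    """Hypothetical model of a ReplicaSet's control loop (not real Kubernetes code)."""
    replicas: int                                   # desired state, e.g. `replicas: 3`
    pods: list[str] = field(default_factory=list)   # observed state: running Pods
    _ids: itertools.count = field(default_factory=lambda: itertools.count(1), init=False, repr=False)

    def reconcile(self) -> list[str]:
        """One pass: compare desired vs observed Pods and act on the difference."""
        actions: list[str] = []
        while len(self.pods) < self.replicas:       # too few: create replacements
            name = f"pod-{next(self._ids)}"
            self.pods.append(name)
            actions.append(f"create {name}")
        while len(self.pods) > self.replicas:       # too many: remove the newest
            name = self.pods.pop()
            actions.append(f"delete {name}")
        return actions
```

Kubernetes runs this comparison continuously, so a crashed Pod simply shows up as “observed < desired” on the next pass.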

Containerization vs Orchestration
These two terms are often used together but solve different problems.
Containerization (Docker)
Packages your application and dependencies into a portable container.

Example:
Docker container running a Node.js app.

Orchestration (Kubernetes)
Manages many containers across many machines.

It handles things like:
  • scaling
  • restarting failed containers
  • service discovery
  • rolling deployments

Think of it like this:
Docker --> packages the app
Kubernetes --> runs and manages the app at scale

5. Which Tools to Use for K8s / EKS?

Managing K8s purely through the CLI (kubectl) can make it hard to see the big picture. There are some incredible tools to help engineers visualize and manage clusters effectively:

  • Lens: Often called the “Kubernetes IDE”. It provides a beautiful, native desktop UI to view Pods, logs, configurations, and cluster metrics instantly. (Tip: If you’re using Lens, check out its intuitive features for port-forwarding and inspecting secrets.)

  • k9s: A terminal-based UI to interact with your clusters. If you prefer to never leave your terminal but want a visual dashboard, k9s is unmatched for speed.

Some k9s shortcuts 😉

Key            Action
:pods          View pods
:services      View services
:deployments   View deployments
l              View logs
s              Shell into pod
/              Search

6. Infrastructure as Code: Terraform

Terraform allows you to write code to provision infrastructure (like AWS EKS clusters, databases, and network firewalls) instead of clicking through web consoles. It uses HCL (HashiCorp Configuration Language) to declare what resources you want, and Terraform figures out how to create, update, or delete them to match your code.

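provider "aws" {
  region = "ap-south-1"
}

# S3 bucket names are globally unique, so choose your own name
resource "aws_s3_bucket" "app_bucket" {
  bucket = "my-app-storage"
}

# Keep the password out of the code: supply it via TF_VAR_db_password or a secrets manager
variable "db_password" {
  type      = string
  sensitive = true
}

resource "aws_db_instance" "app_db" {
  identifier = "app-db"

  engine         = "postgres"
  instance_class = "db.t3.micro"

  allocated_storage = 20
  username          = "app_admin" # RDS for PostgreSQL rejects reserved names such as "admin"
  password          = var.db_password

  # Skips the final backup on destroy: fine for demos, not for production
  skip_final_snapshot = true
}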

Run:

terraform init
terraform plan # shows what will be created
terraform apply # creates the resources
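Conceptually, terraform plan compares what your code declares with what Terraform's state says already exists, then proposes creates, updates, and deletes. Here is a simplified Python sketch of that diff. The plan function and the dictionary shapes are assumptions for illustration; real Terraform also refreshes state from the cloud provider and may decide that some changes require destroying and recreating a resource.

```python
def plan(desired: dict[str, dict], current: dict[str, dict]) -> dict[str, list[str]]:
    """Toy `terraform plan`: diff declared resources against recorded state.

    Keys are resource addresses (e.g. "aws_s3_bucket.app_bucket"); values are attributes.
    """
    to_create = sorted(desired.keys() - current.keys())   # in code, not yet in state
    to_delete = sorted(current.keys() - desired.keys())   # in state, removed from code
    to_update = sorted(
        address for address in desired.keys() & current.keys()
        if desired[address] != current[address]           # attributes drifted from the code
    )
    return {"create": to_create, "update": to_update, "delete": to_delete}
```

terraform apply would then carry out exactly that list of changes, which is why reviewing the plan output before applying matters.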

Advanced: Reusable Terraform Modules

In larger teams and companies, engineers don't rewrite the same Terraform resources over and over. Instead, they create reusable modules.
Example structure:
terraform-modules/
   ├── s3-module
   ├── rds-module
   └── eks-module

Example usage:
module "app_s3" {
  source      = "../terraform-modules/s3-module"  # path to the module directory
  bucket_name = "team-storage"                     # an input variable the module defines
}
This allows teams to reuse infrastructure safely.

7. Continuous Integration (CI)

Continuous Integration (CI) is the practice of automatically building and testing your code whenever changes are pushed to a repository.

Instead of manually running builds and tests, CI systems automate the process and provide quick feedback to developers.

Typical CI flow:

Developer pushes code → GitHub
↓
CI pipeline triggered
↓
Install dependencies
↓
Run tests
↓
Build Docker image
↓
Push image to container registry
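At its core, the flow above means “run each stage in order and stop at the first failure.” Here is a minimal Python sketch of that fail-fast behavior; the stage commands are placeholders, and run_pipeline is a hypothetical helper. Real CI systems add triggers, isolated build agents, caching, and secrets handling on top of this.

```python
import subprocess
import sys

# (stage name, command) pairs; these commands are placeholders for your project
STAGES = [
    ("Install dependencies", ["npm", "install"]),
    ("Run tests", ["npm", "test"]),
    ("Build Docker image", ["docker", "build", "-t", "myrepo/myapp:dev", "."]),
]

def run_pipeline(stages: list[tuple[str, list[str]]]) -> list[tuple[str, int]]:
    """Run stages in order and stop at the first failure (fail fast).

    Returns (stage name, exit code) for every stage that actually ran.
    """
    results = []
    for name, cmd in stages:
        print(f"==> {name}: {' '.join(cmd)}")
        try:
            code = subprocess.run(cmd).returncode
        except FileNotFoundError:
            code = 127  # tool not installed; shells use 127 for "command not found"
        results.append((name, code))
        if code != 0:
            print(f"Stage '{name}' failed (exit code {code}); skipping remaining stages")
            break
    return results

def main() -> None:
    """Entry point a CI job could call; a non-zero exit marks the build as failed."""
    results = run_pipeline(STAGES)
    sys.exit(0 if all(code == 0 for _, code in results) else 1)
```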

One of the most widely used CI tools is Jenkins.

Jenkins runs pipelines defined using a Groovy-based DSL called a Jenkinsfile.

This allows you to version your build pipeline alongside your application code.

Example Jenkins Pipeline (Groovy)

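pipeline {
  agent any
  stages {
    stage('Checkout Code') {
      steps {
        git url: 'https://github.com/org/project.git', branch: 'main'
      }
    }

    stage('Install Dependencies') {
      steps {
        sh 'npm install'
      }
    }

    stage('Run Tests') {
      steps {
        sh 'npm test'
      }
    }

    stage('Build Docker Image') {
      steps {
        // Tag with the full repository name so the push below finds this image
        sh 'docker build -t myrepo/myapp:${BUILD_NUMBER} .'
      }
    }

    stage('Push Image') {
      steps {
        // 'registry-creds' is a placeholder Jenkins credentials ID for your registry
        withCredentials([usernamePassword(credentialsId: 'registry-creds',
                                          usernameVariable: 'REGISTRY_USER',
                                          passwordVariable: 'REGISTRY_PASS')]) {
          sh 'echo "$REGISTRY_PASS" | docker login -u "$REGISTRY_USER" --password-stdin'
          sh 'docker push myrepo/myapp:${BUILD_NUMBER}'
        }
      }
    }
  }
}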

Once Jenkins is configured to trigger on pushes (for example, via a webhook), this pipeline tests, builds, and pushes a Docker image for every commit.

Modern alternatives to Jenkins include:

  • GitHub Actions
  • GitLab CI
  • CircleCI
  • Buildkite

8. Helm, FluxCD, and GitOps Automation

Once you have K8s and Terraform, how do you manage updates easily for the whole team?

Helm

Helm is the package manager for Kubernetes. Instead of writing dozens of YAML files (like the deployment manifest above), Helm uses Chart templates. You can install an entire database with a single command: helm install my-db bitnami/postgresql (after adding the Bitnami chart repository with helm repo add bitnami https://charts.bitnami.com/bitnami).

Why Helm is Needed

When working with Kubernetes, teams often maintain multiple environments:

  • development
  • staging
  • production

Without Helm, engineers often end up duplicating manifests:

deployment-dev.yaml
deployment-staging.yaml
deployment-prod.yaml

This becomes difficult to maintain. Helm solves this problem using templates and values.

values.yaml:

replicaCount: 3
image: my-app:v1

Deployment template (excerpt):

replicas: {{ .Values.replicaCount }}
image: {{ .Values.image }}

Different environments simply override values.

values-prod.yaml:

replicaCount: 10

Deploy using:

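helm install my-app ./chart

For production, layer the overrides on top of the defaults:

helm install my-app ./chart -f values-prod.yaml
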
Helm allows teams to maintain reusable, versioned infrastructure templates.
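Under the hood, Helm merges the chart's values.yaml with any files passed via -f (later files win) and renders the templates with the result. Helm actually uses Go templates; the Python sketch below imitates only the simplest {{ .Values.x }} substitution to show the idea, and merge_values and render are hypothetical helpers.

```python
import re
from copy import deepcopy

def merge_values(base: dict, override: dict) -> dict:
    """Deep-merge override into base; override wins, like `-f values-prod.yaml`."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)   # merge nested sections
        else:
            merged[key] = value
    return merged

def render(template: str, values: dict) -> str:
    """Replace {{ .Values.a.b }} placeholders (a tiny subset of Helm's templating)."""
    def lookup(match: re.Match) -> str:
        node = values
        for part in match.group(1).split("."):
            node = node[part]
        return str(node)
    return re.sub(r"\{\{\s*\.Values\.([\w.]+)\s*\}\}", lookup, template)
```

Passing several -f files to Helm works the same way: each file is merged on top of the previous ones, so the rightmost file takes precedence.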

FluxCD & GitOps

In the past, engineers manually ran deployment scripts. Today, many teams use GitOps. Tools like FluxCD (or ArgoCD) run inside your K8s cluster and continuously watch your Git repository. When you merge a PR on GitHub, FluxCD detects the change and updates the K8s cluster to match Git, typically within its configured sync interval.

Self-Serve Architecture for Teams

In larger teams, infrastructure should not become a bottleneck where developers constantly depend on DevOps engineers for deployments.

A common approach is to adopt a GitOps-driven self-serve architecture.

Typical repository structure:

repo/
├── terraform-resources/
│   ├── s3
│   ├── networking
│   └── rds
└── k8s-resources/
    └── helm-charts

Infrastructure resources such as clusters, networks, and databases are managed using Terraform.

Application-level deployments are managed using Kubernetes manifests or Helm charts.

Developers typically work with long-lived branches such as:

  • staging
  • regression
  • production

When changes are merged into these branches, FluxCD continuously monitors the repository.

If Flux detects changes in Kubernetes manifests or Helm charts, it automatically synchronizes those changes to the Kubernetes cluster.

This ensures:

  • Git becomes the single source of truth
  • Deployments are automated
  • Infrastructure changes remain auditable
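A GitOps agent essentially repeats one step: check the latest commit on the branch it watches and, if it hasn't applied that commit yet, apply the manifests from it. Here is a minimal Python sketch of that loop, assuming hypothetical get_head and apply_manifests callbacks. Flux itself is written in Go and also re-applies on a timer to undo manual changes made directly in the cluster.

```python
import time
from typing import Callable, Optional

def sync_once(
    get_head: Callable[[], str],
    apply_manifests: Callable[[str], None],
    last_applied: Optional[str],
) -> Optional[str]:
    """One reconciliation pass: apply the branch's latest commit if it is new."""
    head = get_head()                 # e.g. the latest commit SHA on the production branch
    if head != last_applied:
        apply_manifests(head)         # make the cluster match Git at this commit
        return head
    return last_applied               # Git unchanged: nothing to do

def run_agent(
    get_head: Callable[[], str],
    apply_manifests: Callable[[str], None],
    interval_seconds: float = 60.0,
    iterations: Optional[int] = None,
) -> None:
    """Poll forever (or `iterations` times), sleeping between passes."""
    last_applied: Optional[str] = None
    done = 0
    while iterations is None or done < iterations:
        last_applied = sync_once(get_head, apply_manifests, last_applied)
        done += 1
        time.sleep(interval_seconds)
```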

[Figure: How GitOps works]

9. Observability: How the ELK Stack Works

Deploying code is only half the battle. Knowing what it’s doing in production is the other half. The ELK Stack is one of the most widely used setups for centralized logging:

  1. E - Elasticsearch: A powerful search engine that stores your logs.
  2. L - Logstash (or Fluentd/Fluent Bit): The data pipeline that collects logs from your K8s Pods, parses them, and sends them to Elasticsearch.
  3. K - Kibana: The UI dashboard where you can filter, search, and visualize your application logs.

When your app starts returning HTTP 500 errors, you go to Kibana to search for the exact stack trace.
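Log pipelines work best when each log line is structured. Here is a minimal Python sketch using the standard logging module that writes one JSON object per line to stdout, where a collector such as Fluent Bit or Logstash could pick it up. The field names are illustrative choices, not a schema that ELK requires.

```python
import json
import logging
import sys

class JsonFormatter(logging.Formatter):
    """Format each log record as a single line of JSON."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            # Keep the whole stack trace in one field so it stays one searchable event
            entry["stack_trace"] = self.formatException(record.exc_info)
        return json.dumps(entry)

def make_logger(name: str) -> logging.Logger:
    """Logger that writes JSON lines to stdout, where container log collectors look."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines via the root logger
    return logger
```

Inside an except block, calling make_logger("checkout").exception("payment failed") would emit one JSON line whose stack_trace field you can then search for in Kibana.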


10. Metrics: Prometheus, Grafana & Time Series Databases

While ELK handles logs (text), Prometheus and Grafana handle metrics (numbers over time).

  • Time Series Database (TSDB): A database optimized for storing and querying values indexed by timestamp. Perfect for “CPU usage per second”.
  • Prometheus: It scrapes metrics from your applications and stores them in its TSDB.
  • Grafana: Plugs into Prometheus and visualizes the data via beautiful charts and gauges.

How to create dashboards: In Grafana, you write PromQL (Prometheus Query Language) queries (e.g., rate(http_requests_total[5m])) to pull data and put it on a graph. You can then set alerts to ping your team’s Slack if CPU usage exceeds 90%.

How Metrics Collection Works

The monitoring flow typically looks like this:

Application 
   ↓
/metrics --> Prometheus scrapes metrics 
   ↓
Metrics stored in TSDB 
   ↓
Grafana queries Prometheus 
   ↓
Dashboards visualize data

Step 1 - Expose Metrics

Applications expose a /metrics endpoint.

Example output (simplified; real output also includes # HELP and # TYPE lines and often labels):

http_requests_total 10234
http_request_duration_seconds 0.34
cpu_usage_percent 45
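Here is a minimal Python sketch that renders counters and gauges in Prometheus's text format, including the # TYPE lines the simplified output above leaves out. render_metrics is a hypothetical helper; in a real service you would typically use a client library such as prometheus_client (Python) or prom-client (Node.js), which handles this formatting for you.

```python
def render_metrics(counters: dict[str, float], gauges: dict[str, float]) -> str:
    """Render unlabeled metrics in Prometheus's text exposition format."""
    lines: list[str] = []
    for kind, metrics in (("counter", counters), ("gauge", gauges)):
        for name, value in metrics.items():
            lines.append(f"# TYPE {name} {kind}")   # tells Prometheus how to treat the series
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"                 # the format expects a trailing newline
```

A web framework route mounted at /metrics would simply return this string as plain text.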

Step 2 - Prometheus Scrapes Metrics

Prometheus periodically scrapes (pulls) metrics from the /metrics endpoint, once every scrape_interval (1 minute by default).

Prometheus configuration:

scrape_configs:
  - job_name: 'myapp'
    static_configs:
      - targets: ['myapp:8080']
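Each scrape yields one sample per series, stamped with the scrape time. Here is a toy Python sketch of that ingest step. parse_exposition and ToyTSDB are hypothetical names, and the parser is deliberately simplified: it assumes no optional timestamp column at the end of each line.

```python
from collections import defaultdict

def parse_exposition(text: str) -> dict[str, float]:
    """Parse 'series value' lines, skipping blank lines and '#' comments."""
    samples: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        series, value = line.rsplit(maxsplit=1)   # series may include {labels}
        samples[series] = float(value)
    return samples

class ToyTSDB:
    """Append-only store: one list of (timestamp, value) samples per series."""

    def __init__(self) -> None:
        self.series: dict[str, list[tuple[float, float]]] = defaultdict(list)

    def ingest(self, scrape_time: float, text: str) -> None:
        """Record one scrape: every series gets a sample stamped with the scrape time."""
        for series, value in parse_exposition(text).items():
            self.series[series].append((scrape_time, value))
```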

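Prometheus stores the scraped samples in its built-in TSDB:

  • recent data (roughly the last two hours) in memory, backed by a write-ahead log on disk
  • older data in compressed blocks on disk, kept for 15 days by default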

Example TSDB systems

Common time series storage systems include:

  • Prometheus TSDB
  • VictoriaMetrics
  • Thanos
  • Cortex

VictoriaMetrics, Thanos, and Cortex add long-term storage and horizontal scaling beyond what a single Prometheus server's local TSDB provides.

Step 3 - Grafana Visualizes Data

Grafana queries Prometheus using PromQL:

rate(http_requests_total[5m])

From these queries, you can build:

- Line charts
- Gauges
- Heatmaps
- Alerting rules
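rate(http_requests_total[5m]) turns an ever-growing counter into a per-second rate. Here is a simplified Python version of that calculation; prom_rate is a hypothetical name. Like PromQL it treats a drop in value as a counter reset, but it skips Prometheus's extrapolation to the window boundaries, so real results can differ slightly.

```python
from typing import Optional

def prom_rate(samples: list[tuple[float, float]]) -> Optional[float]:
    """Per-second increase of a counter across the samples in a window.

    `samples` are (timestamp_seconds, value) pairs, oldest first.
    """
    if len(samples) < 2:
        return None  # not enough data points to compute a rate
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        # A drop means the app restarted and the counter began again from 0,
        # so the whole new value counts as increase
        increase += curr - prev if curr >= prev else curr
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed
```

With a sample every 15 seconds, a counter going 100, 130, 160, 190, 220 over one minute gives 120 requests / 60 s = 2 requests per second.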

11. Advanced Deployments: Canary vs. Blue/Green

“Push and pray” is dangerous. Modern deployment strategies minimize downtime and risk:

Blue/Green Deployment

  • You have two identical environments: Blue (currently live) and Green (idle).
  • You deploy version 2.0 to the Green environment.
  • Once Green is fully tested and ready, you flip the load balancer switch so all user traffic instantly goes to Green. Blue becomes idle.
  • Benefit: Immediate rollback capability by flipping the switch back.

Canary Deployment

  • You route a small percentage of traffic (e.g., 5%) to the new version (the “Canary”).
  • The remaining 95% stay on the stable version.
  • You monitor the error rates and metrics on the Canary. If stable, you gradually increase traffic (10%, 25%, 50%, 100%).
  • Benefit: Lowest risk. Bugs only impact a small subset of users before being caught.
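The percentage split is usually done by a load balancer, ingress controller, or service mesh. One common approach, sketched below in Python as an assumption rather than any specific product's behavior, hashes a stable user ID into buckets so each user consistently sees the same version while the percentage grows. routes_to_canary is a hypothetical helper.

```python
import hashlib

def routes_to_canary(user_id: str, canary_percent: float) -> bool:
    """Send a stable `canary_percent` share of users to the new version.

    Hashing the user ID (instead of choosing randomly per request) keeps each
    user on the same version across requests.
    """
    digest = hashlib.sha256(user_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000   # bucket in 0..9999
    return bucket < canary_percent * 100                   # e.g. 5% -> buckets 0..499
```

Because a user's bucket never changes, raising the percentage from 5% to 25% only adds users to the canary; nobody already on it is moved back.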

12. CDN and Edge Infrastructure

When users access your application, not every request needs to hit your origin server.

This is where a CDN (Content Delivery Network) comes into play.

A CDN is a globally distributed network of servers that caches and delivers content closer to users.

Instead of:

User → Origin Server

It becomes:

User → CDN Edge Server → Origin Server

If the content is cached at the edge, the request never reaches your origin server.
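Here is a toy Python model of that behavior. EdgeCache is a hypothetical class with a fixed TTL; real CDNs usually derive freshness from Cache-Control headers and add many more rules. While an entry is fresh, requests are answered at the edge and the origin never sees them.

```python
import time
from typing import Callable

class EdgeCache:
    """Toy CDN edge node: serve from cache while fresh, otherwise go to the origin."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.fetch_from_origin = fetch_from_origin
        self.ttl = ttl_seconds            # fixed here; real CDNs read Cache-Control
        self.clock = clock
        self._store: dict[str, tuple[float, bytes]] = {}

    def get(self, path: str) -> bytes:
        now = self.clock()
        cached = self._store.get(path)
        if cached and now - cached[0] < self.ttl:
            return cached[1]              # cache hit: the origin never sees this request
        body = self.fetch_from_origin(path)   # miss or expired: fetch from the origin
        self._store[path] = (now, body)
        return body
```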

Benefits of using a CDN:

  • Faster content delivery
  • Reduced load on your servers
  • Built-in DDoS protection
  • Edge caching of static assets
  • TLS termination
  • Web Application Firewall (WAF)

One of the most popular CDN providers is Cloudflare.

Conclusion

DevOps isn’t a single person’s job—it’s a culture and a set of practices. By understanding these concepts—networking, containerization, orchestration, and observability—you bridge the gap between “code completing” and “code delivering value to users reliably.”

Next time you see a Dockerfile or a .yaml manifest in your repository, you’ll know exactly what’s going on!

This post is licensed under CC BY 4.0 by the author.