Blog Posts

1) Buy a VPS Server

To set up a Django web app on a CentOS/RHEL VPS, first buy a VPS from a provider. The provider gives you the server's IP address and root password, plus a client web GUI for managing the server. From that GUI you can reset the server, reinstall it, boot it into safe mode, and even change the root password.

2) SSH into the server

Now SSH into the server from any SSH client, such as PuTTY, cmd, or MobaXterm.

ssh root@192.168.47.101  --> enter the server's root password

After that you will have full access to the server.

3) Change the root password

Once you are logged in over SSH, change the root password with the passwd command.

passwd root --> enter the new password twice

4) Add a new user

Add a new user so you do not need to use root all the time; this makes your server more secure.

useradd test

passwd test --> enter the new password twice

Now add the test user to the wheel group so it can use sudo; with sudo the new user can run administrative commands.

usermod -aG wheel test

(The -a flag appends the group instead of replacing the user's existing supplementary groups.)
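The wheel group only grants sudo because of a rule in /etc/sudoers. On CentOS/RHEL this line is usually already present (check with visudo; shown here only as a reference):

```text
## Allows people in group wheel to run all commands
%wheel  ALL=(ALL)       ALL
```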

5) Configure SSH

Configure SSH so that no one can log in as root: edit the SSH configuration file and set PermitRootLogin to no.

vim /etc/ssh/sshd_config

       PermitRootLogin no

Now restart the SSH service (systemctl restart sshd) and log in with ssh user@server.

 

6) Update the system

Now update the system, then reboot it.

yum update && reboot

7) Install Python 3

After the update completes successfully, install Python 3.

yum install python3

8) Install virtualenv

After installing Python 3 we can use pip, the Python package installer. With pip we can install virtualenv and create virtual environments.

A virtual environment isolates Python packages: they are only available while the environment is activated.

pip3 install virtualenv

9) Activate the virtual environment

Create a new environment directory, then use the source command to activate it.

mkdir /djangoenv

virtualenv /djangoenv

source /djangoenv/bin/activate

10) Install Python packages

First create requirements.txt and list the names of the required Python packages in it, one per line.

Now install them with:

pip install -r requirements.txt
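A minimal requirements.txt sketch (these two names are just examples based on what this guide mentions; list whatever your app actually needs):

```text
Django
numpy
```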

 

11) Install SQLite 3.8

For Django to work with Python 3 on CentOS, SQLite version 3.8 or newer is required, so install it from the Fedora package:

wget https://kojipkgs.fedoraproject.org//packages/sqlite/3.8.11/1.fc21/x86_64/sqlite-3.8.11-1.fc21.x86_64.rpm

sudo yum install sqlite-3.8.11-1.fc21.x86_64.rpm

 

12) Copy the Django application to the server

Copy the Django application to the server using scp or sftp.

scp djangoapplication.zip user@server:~/

 

13) Test the Django application

Unzip the application, then run the development server inside the activated virtual environment:

python3 manage.py runserver

 

14) Install the Apache web server

Install the Apache web server (and make sure a Python 3 build of mod_wsgi is installed as well, since Apache needs it to run the Django app):

yum install httpd*

 

15) Configure the Apache server

Go to /etc/httpd/conf.d and create a web.conf file with the following configuration (replace the placeholder paths with your own):

<VirtualHost *:80>
    ServerName yoursitename

    Alias /static /your-static-directory
    Alias /media  /your-media-directory

    <Directory /your-static-directory>
        Require all granted
    </Directory>

    <Directory /your-media-directory>
        Require all granted
    </Directory>

    <Directory /your-project-directory-containing-wsgi.py>
        <Files wsgi.py>
            Require all granted
        </Files>
    </Directory>

    WSGIDaemonProcess name_of_your_project python-path=/your-project-directory python-home=/djangoenv
    WSGIProcessGroup name_of_your_project
    WSGIScriptAlias / /your-project-directory/name_of_your_project/wsgi.py
</VirtualHost>

Save and exit.

If your Django app uses the numpy package, requests can time out. In that case add the following line to /etc/httpd/conf/httpd.conf:

WSGIApplicationGroup %{GLOBAL}

Save and exit.

 

16) Restart the Apache server

Test the Apache configuration:

apachectl configtest

Restart the Apache server:

systemctl restart httpd

 

17) Open your site in a web browser and check that it works.

Done.

Create MySQL Replication

by abhilash - Oct. 7, 2025 coding

mysqld --defaults-file=/etc/my.cnf --initialize --user=mysql

In MySQL 8.4 Community the host cache is managed internally and FLUSH HOSTS is deprecated; inspect the cache via the Performance Schema instead:

SELECT * FROM performance_schema.host_cache;


CREATE USER 'repl'@'%' IDENTIFIED BY 'password';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';
FLUSH PRIVILEGES;

CHANGE REPLICATION SOURCE TO
  SOURCE_HOST='192.168.241.101',
  SOURCE_USER='repl',
  SOURCE_PASSWORD='password',
  SOURCE_PORT=3360,
  SOURCE_AUTO_POSITION=1,
  SOURCE_SSL=1,
  SOURCE_SSL_CA='/data/ca.pem';
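After CHANGE REPLICATION SOURCE TO, start replication on the replica and confirm both threads are running (this uses the current replica syntax; on MySQL versions before 8.0.22 the equivalent commands were START SLAVE / SHOW SLAVE STATUS):

```sql
START REPLICA;
SHOW REPLICA STATUS\G
-- Replica_IO_Running: Yes and Replica_SQL_Running: Yes means replication is healthy
```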

 

 

slave my.cnf

 


[mysqld]
datadir=/data/mysql_server
socket=/var/lib/mysql/mysql.sock
log-error=/data/mysql_server/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

server-id=2
log_bin=mysql-bin
binlog_format=ROW
gtid_mode=ON
enforce_gtid_consistency=ON
get_source_public_key=1

port=3360
user=mysql
symbolic-links=0

# Connection limits (safe for low-memory VM)
max_connections=50
max_user_connections=50

# Packet and temporary table sizes
max_allowed_packet=16M
tmp_table_size=32M
max_heap_table_size=32M

# Sorting and read buffers (per connection, smaller for low RAM)
sort_buffer_size=2M
read_buffer_size=2M
read_rnd_buffer_size=4M
join_buffer_size=2M

# Storage engine
default-storage-engine=InnoDB
key_buffer_size=8M
bulk_insert_buffer_size=8M

# InnoDB settings for small memory
innodb_log_file_size=32M
innodb_print_all_deadlocks=1
innodb_buffer_pool_instances=1
innodb_buffer_pool_size=512M
innodb_read_io_threads=4
innodb_write_io_threads=4
innodb_thread_concurrency=0
innodb_io_capacity=100
innodb_log_buffer_size=8M
innodb_flush_log_at_trx_commit=2
innodb_lock_wait_timeout=50

# Transaction isolation
transaction-isolation=READ-COMMITTED


============================

 

master my.cnf

 


[mysqld]
datadir=/data/mysql_server
socket=/var/lib/mysql/mysql.sock
log-error=/data/mysql_server/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

server-id=1
log_bin=mysql-bin
binlog_format=ROW
gtid_mode=ON
enforce_gtid_consistency=ON

plugin-load-add=mysql_native_password.so

port=3360
user=mysql
symbolic-links=0

# Connection limits (safe for low-memory VM)
max_connections=50
max_user_connections=50

# Packet and temporary table sizes
max_allowed_packet=16M
tmp_table_size=32M
max_heap_table_size=32M

# Sorting and read buffers (per connection, smaller for low RAM)
sort_buffer_size=2M
read_buffer_size=2M
read_rnd_buffer_size=4M
join_buffer_size=2M

# Storage engine
default-storage-engine=InnoDB
key_buffer_size=8M
bulk_insert_buffer_size=8M

# InnoDB settings for small memory
innodb_log_file_size=32M
innodb_print_all_deadlocks=1
innodb_buffer_pool_instances=1
innodb_buffer_pool_size=512M
innodb_read_io_threads=4
innodb_write_io_threads=4
innodb_thread_concurrency=0
innodb_io_capacity=100
innodb_log_buffer_size=8M
innodb_flush_log_at_trx_commit=2
innodb_lock_wait_timeout=50

# Transaction isolation
transaction-isolation=READ-COMMITTED
 

Secure Monitoring Setup – Prometheus + Node Exporter + Grafana (HTTPS + TLS + Django Embed)

OS: Rocky Linux 8.10 / RHEL compatible
Container runtime: Podman
TLS: Let’s Encrypt or self-signed
Grafana HTTPS port: 42923

Goal:
– node exporter & prometheus not public
– only localhost access
– grafana SSL enabled on random port
– disable anonymous login
– embed inside Django site
– force login authentication

=======================================

1) Create directories

mkdir -p /opt/node
mkdir -p /opt/prometheus
mkdir -p /opt/grafana/certs

=======================================
2) Copy SSL certificates for Grafana

Use Let’s Encrypt certs (recommended)

cp -L /etc/letsencrypt/live/www.yourdomain.com/fullchain.pem /opt/grafana/certs/fullchain.pem
cp -L /etc/letsencrypt/live/www.yourdomain.com/privkey.pem /opt/grafana/certs/privkey.pem

Fix permissions

chmod 640 /opt/grafana/certs/privkey.pem
chmod 644 /opt/grafana/certs/fullchain.pem

SELinux label

chcon -Rt container_file_t /opt/grafana/certs

=======================================
3) Create Node Exporter HTTPS config file

cat > /opt/node/web.yml <<EOF
tls_server_config:
  cert_file: /certs/server.crt
  key_file: /certs/server.key
EOF

If using self-signed certificates, copy the cert files:

cp /opt/ssl/selfsigned/server.crt /opt/node/
cp /opt/ssl/selfsigned/server.key /opt/node/

=======================================
4) Run Node Exporter (localhost only + HTTPS)

podman run -d \
  --name node_exporter \
  -p 127.0.0.1:9100:9100 \
  -v /opt/node/web.yml:/etc/node/web.yml:Z \
  -v /opt/ssl/selfsigned/server.crt:/certs/server.crt:ro,Z \
  -v /opt/ssl/selfsigned/server.key:/certs/server.key:ro,Z \
  quay.io/prometheus/node-exporter \
  --web.config.file=/etc/node/web.yml \
  --web.listen-address=127.0.0.1:9100

Test

curl -k https://127.0.0.1:9100/metrics

=======================================
5) Create Prometheus config

cat > /opt/prometheus/prometheus.yml <<EOF
global:
  scrape_interval: 5s

scrape_configs:
  - job_name: 'node'
    scheme: https
    tls_config:
      insecure_skip_verify: true
    static_configs:
      - targets: ['127.0.0.1:9100']
EOF

=======================================
6) Run Prometheus container

podman run -d \
  --name prometheus \
  -p 127.0.0.1:9090:9090 \
  -v /opt/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:Z \
  prom/prometheus

Test local access

curl http://127.0.0.1:9090/api/v1/targets

(Prometheus itself is plain HTTP here; only Node Exporter and Grafana have TLS configured.)

You should see "health: up"

=======================================
7) Run Grafana HTTPS on port 42923

Remove any old container first

podman stop grafana || true
podman rm grafana || true

Run new

podman run -d \
  --name=grafana \
  --net=host \
  -v grafana:/var/lib/grafana \
  -v /opt/grafana/certs/fullchain.pem:/certs/fullchain.pem:ro,Z \
  -v /opt/grafana/certs/privkey.pem:/certs/privkey.pem:ro,Z \
  -e GF_SERVER_PROTOCOL=https \
  -e GF_SERVER_HTTP_PORT=42923 \
  -e GF_SERVER_CERT_FILE=/certs/fullchain.pem \
  -e GF_SERVER_CERT_KEY=/certs/privkey.pem \
  -e GF_AUTH_ANONYMOUS_ENABLED=false \
  -e GF_USERS_ALLOW_SIGN_UP=false \
  -e GF_SECURITY_ALLOW_EMBEDDING=true \
  grafana/grafana:latest

Test

curl -k https://127.0.0.1:42923

Expected:

<a href="/login">Found</a>

Browser URL

https://www.yourdomain.com:42923

=======================================
8) First Grafana login

Default user:

admin / admin

Grafana will force you to CHANGE PASSWORD

Create user “xxxx” later in settings.

=======================================
9) Import dashboard ID 1860

In Grafana:
Dashboards → Import
Dashboard ID: 1860
Select Prometheus datasource

=======================================
10) Django embed setup

Create app: serverstats

views.py

from django.contrib.auth.decorators import login_required
from django.shortcuts import render

@login_required
def serverstats_home(request):
    return render(request, "serverstats/home.html")

Template home.html

<h2>Server Monitoring Dashboard</h2> <iframe src="https://www.yourdomain.com:42923/d/xxxx?orgId=1&refresh=30s" width="100%" height="900" frameborder="0"> </iframe>

urls.py entry (import the view from the serverstats app):

from serverstats.views import serverstats_home

path("serverstats/", serverstats_home, name="serverstats"),

=======================================
11) Security notes

Node exporter is local only ✔
Prometheus is local only ✔
Grafana HTTPS enforced ✔
Random port 42923 ✔
Anonymous Grafana disabled ✔
Django auth required ✔
TLS everywhere ✔

update soon

Docker Basics

by abhilashthale - March 12, 2026 coding


Day 1: Install Docker & Practice Container Commands

Install Docker

sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
sudo yum install -y docker-ce docker-ce-cli containerd.io
sudo systemctl enable --now docker

Add user to docker group (log out and back in for the new group membership to take effect)

sudo usermod -aG docker $USER

Test docker

docker run hello-world

Run Docker with IPv4 only

Create file:

/etc/docker/daemon.json

Add configuration:

{
 "ipv6": false,
 "dns": ["8.8.8.8","1.1.1.1"]
}

Restart docker

systemctl restart docker

Verify

docker run hello-world

Docker Basics

Pull Images

docker pull nginx
docker pull ubuntu
docker pull mysql

Run Container

docker run -d --name mynginx -p 8080:80 nginx

Explanation

-d = run in background
--name mynginx = container name
-p 8080:80 = map container port 80 to host port 8080

Enter Container

docker exec -it mynginx bash

Stop / Start Containers

docker stop mynginx
docker start mynginx
docker restart mynginx

View Logs

docker logs mynginx
docker logs -f mynginx

Remove Containers

docker stop mynginx
docker rm mynginx

Remove all stopped containers

docker container prune

Port Mapping Example

docker run -d --name mynginx -p 8081:80 nginx

Container port 80 → Host port 8081


Volumes (Persistent Data)

Create configuration file

mkdir -p /data/docker
touch /data/docker/my.cnf

chown 999:999 /data/docker/my.cnf
chmod 644 /data/docker/my.cnf

Run MySQL container

docker run -d \
--name mydb \
-e MYSQL_ROOT_PASSWORD=pass123 \
-v /data/mysql:/var/lib/mysql \
-v /data/docker/my.cnf:/etc/mysql/conf.d/my.cnf \
-p 3306:3306 \
mysql:latest

Build Your Own Docker Image

Create Dockerfile

cat > Dockerfile <<'EOF'
# use the Debian variant; the default mysql:8.0 image is Oracle Linux-based and has no apt-get
FROM mysql:8.0-debian

LABEL maintainer="you@example.com"

COPY my.cnf /etc/mysql/conf.d/my.cnf

RUN apt-get update && apt-get install -y \
    vim \
    net-tools \
 && apt-get clean
EOF

Build the image (this tag is reused in the production example further below):

docker build -t my-mysql-image:8.0 .

Useful Docker Commands

docker images        # list images
docker ps            # list containers
docker stats         # container resource usage
docker logs -f name  # view logs
docker inspect name  # container config
docker exec -it name bash  # enter container
docker top name      # processes inside container
docker system df     # disk usage

Run Container With CPU & Memory Limits

docker run -d \
--name mydb5 \
-e MYSQL_ROOT_PASSWORD=pass123 \
-p 3308:3306 \
--memory="1g" \
--cpus="1.5" \
mysql:8.0

Verify limits

docker inspect mydb5 | grep -i -E "memory|cpus"

Live resource usage

docker stats mydb5

Check inside container (cgroup v1 paths; on cgroup v2 hosts read /sys/fs/cgroup/memory.max and /sys/fs/cgroup/cpu.max instead)

docker exec -it mydb5 cat /sys/fs/cgroup/memory/memory.limit_in_bytes
docker exec -it mydb5 cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us
docker exec -it mydb5 cat /sys/fs/cgroup/cpu/cpu.cfs_period_us
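The values read above follow directly from the run flags: --cpus is implemented as a CFS quota (quota = cpus × period, with a 100000 µs default period) and --memory="1g" becomes a byte limit. A small sketch of that arithmetic:

```python
# how --cpus="1.5" and --memory="1g" map to the cgroup values shown above
cpus = 1.5
period_us = 100000                  # default cpu.cfs_period_us
quota_us = int(cpus * period_us)    # cpu.cfs_quota_us
mem_bytes = 1 * 1024 ** 3           # memory.limit_in_bytes for "1g"

print(quota_us)   # 150000
print(mem_bytes)  # 1073741824
```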

Restart Policies

docker run -d \
--name mydb10 \
--restart=always \
-e MYSQL_ROOT_PASSWORD=pass123 \
-p 3310:3306 \
mysql:latest

Restart options

no
on-failure
always
unless-stopped

Docker Networking

Default network: bridge

Create network

docker network create mynet2
docker network inspect mynet2

Run container on network

docker run -d \
--name mydb12 \
--restart=always \
-e MYSQL_ROOT_PASSWORD=pass123 \
-p 3312:3306 \
--network mynet2 \
mysql:latest

Connect container to multiple networks

docker network connect mynet mydb9
docker network connect mynet1 mydb9
docker network connect mynet2 mydb9
docker network connect mynet3 mydb9

Network types

bridge   : containers on same host
host     : use host network
none     : isolated container
overlay  : multi-host cluster networking

Docker Health Checks

Example

docker run -d \
--name web2 \
--restart unless-stopped \
--health-cmd="curl -f http://localhost:80 || exit 1" \
--health-interval=30s \
--health-retries=3 \
--health-timeout=5s \
nginx

Exit codes

0 = success
1 = failure
command missing = failure

Copy Files From Container

docker cp temp-tomcat:/usr/local/tomcat/conf /data/mytomcat

Production MySQL Container Example

docker run -d \
--name mydb16v8 \
-e MYSQL_ROOT_PASSWORD=pass123 \
-e TZ="Asia/Kolkata" \
--restart unless-stopped \
--health-cmd="mysqladmin ping -h localhost -u root --password=pass123 || exit 1" \
--health-interval=5s \
--health-retries=5 \
-v /data/mysql16v8:/var/lib/mysql \
-v /data/docker/my.cnf:/etc/mysql/conf.d/my.cnf \
-p 3316:3306 \
--memory="1g" \
--cpus="1.5" \
--network mynet2 \
my-mysql-image:8.0

Deploy Tomcat With WAR

docker run -d \
--name tomcat3 \
--restart unless-stopped \
--health-cmd="curl -f http://localhost:8080 || exit 1" \
--health-interval=30s \
--health-retries=3 \
--health-timeout=5s \
-e TZ="Asia/Kolkata" \
-e JAVA_OPTS="-Xms128m -Xmx256m -Duser.timezone=Asia/Kolkata" \
-p 8083:8080 \
--network mynet2 \
-v /data/tomcat_docker/tomcat_common/warfile/log-api-1.0.war:/usr/local/tomcat/webapps/log-api-1.0.war \
-v /data/tomcat_docker/tomcat3/logs:/usr/local/tomcat/logs \
-v /data/tomcat_docker/tomcat_common/db_properties/db.properties:/usr/local/tomcat/conf/db.properties \
tomcat:9.0.111-jdk8-corretto-al2

Nginx Load Balancer Container

docker run -d \
--name nginxlb \
--restart unless-stopped \
--network mynet2 \
-p 80:80 \
-p 443:443 \
-v /data/nginx-lb/nginx.conf:/etc/nginx/nginx.conf:ro \
-v /data/nginx-lb/SSL:/etc/nginx/SSL:ro \
-v /data/nginx-lb/logs:/var/log/nginx \
--health-cmd="sh -c 'echo > /dev/tcp/127.0.0.1/80 || exit 1'" \
--health-interval=30s \
--health-retries=3 \
--health-timeout=5s \
-e TZ="Asia/Kolkata" \
nginx:stable

Test request

curl -k "https://localhost:443/log-api-1.0/log?msgtext=thisistest4011116&status=ok"

Docker Compose Basics

Start containers

docker compose up -d

Stop containers

docker compose down

Rebuild containers

docker compose up -d --build

Force recreate

docker compose up -d --force-recreate
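The compose commands above assume a docker-compose.yml in the current directory; a minimal sketch (service names, images, and the password are just examples) looks like:

```yaml
services:
  web:
    image: nginx
    ports:
      - "8080:80"
  db:
    image: mysql:8.0
    environment:
      MYSQL_ROOT_PASSWORD: pass123
    volumes:
      - dbdata:/var/lib/mysql
volumes:
  dbdata:
```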

Kubernetes Offline Install

by abhilashthale - March 15, 2026 coding

==================== OFFLINE KUBERNETES INSTALL (RHEL 8.10) ====================

CLUSTER DETAILS
---------------
MASTER  : 192.168.241.160
WORKERS : 192.168.241.161 , 192.168.241.162
K8S     : v1.30.14
RUNTIME : containerd
CNI     : flannel
ARTIFACTS PATH : /data
packages in /data: conntrack-tools-1.4.4-11.el8.x86_64, containerd.io-1.6.32-3.1.el8.x86_64, cri-tools-1.30.1-150500.1.1.x86_64, ethtool-5.13-2.el8.x86_64, iproute-6.2.0-6.el8_10.x86_64, iproute-tc-6.2.0-6.el8_10.x86_64, iptables-1.8.5-11.el8_9.x86_64, iptables-ebtables-1.8.5-11.el8_9.x86_64, kubeadm-1.30.14-150500.1.1.x86_64, kubectl-1.30.14-150500.1.1.x86_64, kubelet-1.30.14-150500.1.1.x86_64, kubernetes-cni-1.4.0-150500.1.1.x86_64, socat-1.7.4.1-2.el8_10.x86_64, createrepo, bash-auoconnect

================================================================================
STEP 0 : COMMON SETUP (RUN ON ALL NODES)
================================================================================

swapoff -a
sed -i '/swap/d' /etc/fstab

cat <<EOF >/etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF

modprobe overlay
modprobe br_netfilter

cat <<EOF >/etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables=1
net.bridge.bridge-nf-call-ip6tables=1
net.ipv4.ip_forward=1
EOF

sysctl --system

================================================================================
STEP 1 : CONFIGURE OFFLINE REPO (RUN ON ALL NODES)
================================================================================

cat <<EOF >/etc/yum.repos.d/k8s-offline.repo
[k8s-offline]
name=Kubernetes Offline Repo
baseurl=file:///data/k8s-rpms
enabled=1
gpgcheck=0
EOF

dnf clean all

================================================================================
STEP 2 : INSTALL PACKAGES (RUN ON ALL NODES)
================================================================================

dnf install -y \
containerd.io \
kubeadm kubelet kubectl cri-tools kubernetes-cni \
conntrack-tools iproute iproute-tc iptables iptables-ebtables ethtool socat

systemctl enable --now containerd
systemctl enable kubelet

================================================================================
STEP 3 : CONFIGURE CONTAINERD (RUN ON ALL NODES)
================================================================================

containerd config default > /etc/containerd/config.toml

sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' \
/etc/containerd/config.toml

systemctl restart containerd
systemctl status containerd

================================================================================
STEP 4 : IMPORT IMAGES (OFFLINE)
================================================================================

# MASTER ONLY
ctr -n k8s.io images import /data/offline/k8s-images.tar
ctr -n k8s.io images import /data/offline/flannel.tar

# WORKERS ONLY
ctr -n k8s.io images import /data/offline/k8s-images.tar

================================================================================
STEP 5 : INITIALIZE CLUSTER (MASTER ONLY)
================================================================================

kubeadm init \
--apiserver-advertise-address=192.168.241.160 \
--pod-network-cidr=10.244.0.0/16

================================================================================
STEP 6 : CONFIGURE kubectl (MASTER ONLY)
================================================================================

mkdir -p $HOME/.kube
cp /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config

================================================================================
STEP 7 : INSTALL FLANNEL (MASTER ONLY, OFFLINE)
================================================================================

export KUBECONFIG=/etc/kubernetes/admin.conf
kubectl apply -f /data/offline/kube-flannel.yml

kubectl get pods -n kube-system

================================================================================
STEP 8 : JOIN WORKER NODES
================================================================================

# ON MASTER
kubeadm token create --print-join-command


# RUN OUTPUT COMMAND ON EACH WORKER
kubeadm reset -f

rm -rf /etc/cni/net.d
rm -rf /var/lib/cni
rm -rf /var/lib/kubelet/*

systemctl restart containerd
systemctl restart kubelet

kubeadm join 192.168.241.160:6443 \
--token <TOKEN> \
--discovery-token-ca-cert-hash sha256:<HASH>

================================================================================
STEP 9 : VERIFY CLUSTER (MASTER ONLY)
================================================================================

kubectl get nodes -o wide

EXPECTED OUTPUT
---------------
control   Ready   control-plane
node1     Ready
node2     Ready

==================== OFFLINE KUBERNETES INSTALL COMPLETE =======================

####################### KUBERNETES ETCD FULL LAB (BACKUP + BREAK + RESTORE) #######################

################################ STEP 1: CHECK CLUSTER ############################################

kubectl get nodes
kubectl get pods -A

##############################################################################################

################################ STEP 2: TAKE BACKUP (NO etcdctl ON HOST) ####################

kubectl exec -n kube-system etcd-controlnode -- etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --endpoints=https://127.0.0.1:2379 snapshot save /var/lib/etcd-backup3.db

##############################################################################################

################################ STEP 3: BREAK CLUSTER ########################################

kubectl delete deployment webserver-deployment
kubectl delete pod samplepod

kubectl get all

##############################################################################################

################################ STEP 4: STOP ETCD ############################################

mv /etc/kubernetes/manifests/etcd.yaml /tmp/

# Verify

crictl ps | grep etcd

##############################################################################################

################################ STEP 5: RESTORE SNAPSHOT #####################################

ctr -n k8s.io run --rm -t \
  --mount type=bind,src=/etc/kubernetes/pki/etcd,dst=/etc/kubernetes/pki/etcd,options=rbind:rw \
  --mount type=bind,src=/var/lib,dst=/var/lib,options=rbind:rw \
  registry.k8s.io/etcd:3.5.16-0 etcd-restore sh

# Inside container:

ETCDCTL_API=3 etcdctl snapshot restore /var/lib/etcd/etcd-backup.db \
  --data-dir=/var/lib/etcd-restore

# VERIFY (VERY IMPORTANT): check that /var/lib/etcd-restore now exists
# (the etcd image has no ls, so use shell globbing)

echo /var/lib/*

exit

##############################################################################################

################################ STEP 6: UPDATE ETCD MANIFEST #################################

vi /etc/kubernetes/manifests/etcd.yaml

# CHANGE THESE THREE PLACES:

--data-dir=/var/lib/etcd-restore

# volumeMounts:

- mountPath: /var/lib/etcd-restore

# volumes:

path: /var/lib/etcd-restore

##############################################################################################

################################ STEP 7: START ETCD ###########################################

mv /tmp/etcd.yaml /etc/kubernetes/manifests/

# Wait 20–30 seconds

##############################################################################################

################################ STEP 8: FIX AUTH (VERY IMPORTANT) ############################

mv /etc/kubernetes/manifests/kube-apiserver.yaml /tmp/
sleep 5
mv /tmp/kube-apiserver.yaml /etc/kubernetes/manifests/

mv /etc/kubernetes/manifests/kube-controller-manager.yaml /tmp/
sleep 5
mv /tmp/kube-controller-manager.yaml /etc/kubernetes/manifests/

mv /etc/kubernetes/manifests/kube-scheduler.yaml /tmp/
sleep 5
mv /tmp/kube-scheduler.yaml /etc/kubernetes/manifests/

##############################################################################################

################################ STEP 9: (OPTIONAL) BYPASS AUTH ###############################

vi /etc/kubernetes/manifests/kube-apiserver.yaml

# Add (temporary; this disables RBAC, so revert it after recovery):

--authorization-mode=AlwaysAllow

##############################################################################################

################################ STEP 10: VERIFY ##############################################

crictl ps

ps -ef | grep etcd

kubectl get nodes
kubectl get pods -A

##############################################################################################

################################ FINAL MEMORY #################################################

Backup → Break → Stop → Restore → Fix Mount → Start → Restart Control Plane → Verify

##############################################################################################
    

Issue Summary

calico-kube-controllers pod was stuck in CrashLoopBackOff and pods could not communicate with the Kubernetes API server via ClusterIP 10.96.0.1:443.


Environment

Component Detail
OS RHEL 9.6
Kubernetes v1.29.15
CNI Calico v3.27.0
Container Runtime containerd 2.2.2
Node IPs 192.168.241.140/141/142
Pod CIDR (configured) 192.168.0.0/16
Service CIDR 10.96.0.0/12

Timeline of Symptoms

  1. calico-kube-controllers pod stuck in CrashLoopBackOff
  2. Error: dial tcp 10.96.0.1:443: i/o timeout
  3. Test busybox pod on workernode1 also could not reach 10.96.0.1
  4. However, all 3 nodes could reach 10.96.0.1:443 directly from the host (got 403 Forbidden — meaning host-level connectivity was fine)

Investigation Steps

Step 1 — Ruled out common suspects

  • Firewalld — disabled on all nodes ✅
  • SELinux — disabled on all nodes ✅
  • ip_forward — enabled (net.ipv4.ip_forward = 1) on all nodes ✅
  • kube-proxy — running on all nodes ✅
  • iptables KUBE-SERVICES chain — rules for 10.96.0.1 existed on all nodes ✅
  • Nodes reaching 10.96.0.1:443 directly — working on all nodes ✅

Step 2 — Identified pod-specific failure

Host-to-ClusterIP worked but pod-to-ClusterIP timed out. This pointed to a problem specifically with how pod traffic was being NAT'd through the ClusterIP rules.

Step 3 — Found the smoking gun

Running this command on workernode1:

iptables -t nat -L KUBE-SVC-NPX46M4PTMTKRN6Y -v -n

Revealed this rule:

KUBE-MARK-MASQ  tcp  --  *  *  !192.168.0.0/16  10.96.0.1  tcp dpt:443

The !192.168.0.0/16 means: only masquerade (SNAT) traffic coming from OUTSIDE 192.168.0.0/16. Traffic from inside that range is excluded from masquerading.

Root Cause

Pod CIDR 192.168.0.0/16 overlapped with the node IP range 192.168.241.x.

This caused a chain reaction:
Pod IP: 192.168.212.4
        ↓
Sends packet to 10.96.0.1:443
        ↓
kube-proxy KUBE-SERVICES chain matches → forwards to KUBE-SVC-NPX46M4PTMTKRN6Y
        ↓
KUBE-MARK-MASQ rule checks source IP:
192.168.212.4 is INSIDE 192.168.0.0/16
        ↓
MASQUERADE is SKIPPED ← problem here
        ↓
Packet reaches API server (192.168.241.140:6443)
with source IP 192.168.212.4 (pod IP)
        ↓
API server tries to reply to 192.168.212.4
but has no route back to that pod IP
        ↓
Connection times out

kube-proxy intentionally excludes pod CIDR from masquerading to avoid unnecessary NAT for pod-to-pod traffic. But when the pod CIDR overlaps with the node network, this optimization breaks pod-to-ClusterIP communication.


Why Nodes Could Reach 10.96.0.1 But Pods Could Not

Source       Source IP      In 192.168.0.0/16?  Masqueraded?  Works?
Node (host)  192.168.241.x  Yes                 No            ✅ Yes — node IP is routable
Pod          192.168.212.4  Yes                 No            ❌ No — pod IP not directly routable to API server

Nodes have real routable IPs so replies come back fine even without masquerading. Pods do not — they need SNAT so the reply goes back to the node, which then forwards it to the pod.
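This kind of overlap can be checked mechanically before installing a cluster. A stdlib-only Python sketch, using the CIDRs from this incident:

```python
import ipaddress

node_net = ipaddress.ip_network("192.168.241.0/24")    # node LAN
old_pod_cidr = ipaddress.ip_network("192.168.0.0/16")  # Calico default (broken here)
new_pod_cidr = ipaddress.ip_network("172.16.0.0/16")   # non-overlapping replacement

# overlaps() is True when the two networks share any addresses
print(old_pod_cidr.overlaps(node_net))  # True  -> pod CIDR swallows the node IPs
print(new_pod_cidr.overlaps(node_net))  # False -> safe choice
```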


Temporary Fix Applied

Set masqueradeAll: true in the kube-proxy configmap:

iptables:
  masqueradeAll: true

This forces SNAT on all pod-to-ClusterIP traffic regardless of source IP, bypassing the overlap problem. This worked but adds NAT overhead on every pod connection.


Permanent Fix — Reinstall with Non-Overlapping CIDRs

Network       Old (broken)    New (correct)
Pod CIDR      192.168.0.0/16  172.16.0.0/16
Service CIDR  10.96.0.0/12    10.96.0.0/12
Node IPs      192.168.241.x   192.168.241.x

Reinstall command:

kubeadm init \
  --pod-network-cidr=172.16.0.0/16 \
  --service-cidr=10.96.0.0/12 \
  --apiserver-advertise-address=192.168.241.140

With Calico configured to match:

- name: CALICO_IPV4POOL_CIDR
  value: "172.16.0.0/16"

Key Lessons Learned

  1. Always ensure pod CIDR, service CIDR, and node IP ranges are non-overlapping before installing Kubernetes. This is the most common mistake in home lab setups.
  2. RHEL 9 uses nf_tables backend for iptables — be aware that some manual iptables commands and older kube-proxy behaviors may not work as expected. Plan for this when setting up Kubernetes on RHEL 9.
  3. Host-to-ClusterIP working does not mean pod-to-ClusterIP works — always test connectivity from inside a pod, not just from the node.
  4. Calico's default 192.168.0.0/16 pod CIDR is just a default, not a requirement — it can and should be changed if your node network uses the same range.

===== SSL FULL SETUP (CA + SERVER CERT + VERIFY + TEST) =====

1. Generate CA key

openssl genrsa -out abhilash-ca.key 3072

2. Generate CA certificate

openssl req -x509 -new -nodes \
  -key abhilash-ca.key \
  -sha256 -days 3650 \
  -out abhilash-ca.crt \
  -subj "/C=IN/ST=Maharashtra/L=Mumbai/O=AbhilashOrg/CN=Abhilash-Root-CA"

3. Generate server key

openssl genrsa -out server.key 3072

4. Create SAN config

cat > san.cnf <<EOF
[req]
distinguished_name = dn
req_extensions = req_ext
prompt = no

[dn]
C = IN
ST = Maharashtra
L = Mumbai
O = AbhilashOrg
CN = nginx.local

[req_ext]
subjectAltName = @alt_names

[alt_names]
DNS.1 = nginx.local
DNS.2 = controlnode
IP.1 = 127.0.0.1
IP.2 = 192.168.240.140
EOF

5. Generate CSR

openssl req -new \
  -key server.key \
  -out server.csr \
  -config san.cnf

6. Sign certificate with CA

openssl x509 -req \
  -in server.csr \
  -CA abhilash-ca.crt \
  -CAkey abhilash-ca.key \
  -CAcreateserial \
  -out server.crt \
  -days 825 \
  -sha256 \
  -extensions req_ext \
  -extfile san.cnf

7. Verify certificate with CA

openssl verify -CAfile abhilash-ca.crt server.crt

8. Check SAN

openssl x509 -in server.crt -text -noout | grep -A1 "Subject Alternative Name"

9. Match key and cert (hash must match)

openssl x509 -noout -modulus -in server.crt | openssl md5
openssl rsa -noout -modulus -in server.key | openssl md5

10. Test SSL locally

openssl s_server -key server.key -cert server.crt -accept 8443

(Run below in another terminal)

openssl s_client -connect localhost:8443

11. Create Kubernetes TLS secret

kubectl create secret tls nginx-tls \
  --cert=server.crt \
  --key=server.key

===== DONE =====