1) Buy a VPS Server
To set up a Django web app on a CentOS/RHEL VPS, first buy a VPS from a provider. You will receive the server's IP address and root password, plus a web GUI for managing the server; from the GUI you can reset the server, reinstall it, boot it into safe mode, and even change the root password.
2) SSH into the server
Now SSH into the server from any SSH client such as PuTTY, cmd, or MobaXterm.
ssh root@192.168.47.101 --> enter the server's root password
After that you have full access to the server.
3) Change the root password
Once logged in over SSH, change the root password with the passwd command:
passwd root --> enter the new password twice
4) Add a new user
Add a new user so you don't need to use root all the time; this makes the server more secure.
useradd test
passwd test --> enter the new password twice
Now add the test user to the wheel group so it can use sudo; with sudo the new user can run administrative commands.
usermod -aG wheel test --> note -aG (append), so the user's existing supplementary groups are kept
5) Configure SSH
Configure SSH so nobody can log in directly as root: edit the SSH configuration file and set PermitRootLogin to no.
vim /etc/ssh/sshd_config
PermitRootLogin no
Now restart sshd (systemctl restart sshd) and log in with ssh user@server.
6) Update the system
Now update the system, then reboot it:
e.g. yum update && reboot
7) Install Python 3
After the update completes, install Python 3:
yum install python3
8) Install virtualenv
Python 3 comes with the pip package installer; with pip we can install virtualenv, which creates virtual environments.
A virtual environment installs Python packages in isolation: they are only available while that environment is activated.
pip3 install virtualenv
9) Create and activate the virtual environment
Create the environment in a new directory, then activate it with the source command:
mkdir /djangoenv
virtualenv /djangoenv
source /djangoenv/bin/activate
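As a self-contained illustration (assuming python3 is installed), the same isolation can be sketched with the stdlib venv module, which behaves like virtualenv:

```shell
# Create a throwaway environment in a temp dir and confirm it has its own
# activate script and pip, just like the /djangoenv created above.
tmp=$(mktemp -d)
python3 -m venv "$tmp/djangoenv"
ls "$tmp/djangoenv/bin" | grep -x -e activate -e pip
rm -rf "$tmp"
```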
10) Install Python packages
First create requirements.txt, listing the required package names one per line,
then install them with:
pip install -r requirements.txt
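An illustrative requirements.txt (package names here are examples, not from the original notes):

```
Django
requests
Pillow
```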
11) Install SQLite 3.8
For Django to work with Python 3 on CentOS, it needs SQLite version 3.8 or newer, which the stock CentOS repositories may not provide:
wget https://kojipkgs.fedoraproject.org//packages/sqlite/3.8.11/1.fc21/x86_64/sqlite-3.8.11-1.fc21.x86_64.rpm
sudo yum install sqlite-3.8.11-1.fc21.x86_64.rpm
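To confirm which SQLite version Django will actually see, check the library version Python links against (a quick sketch; this is distinct from the sqlite3 CLI version):

```shell
# Django checks sqlite3.sqlite_version at startup; it must be >= 3.8
python3 -c "import sqlite3; print(sqlite3.sqlite_version)"
```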
12) Copy the Django application to the server
Copy the Django application to the server using scp or SFTP:
scp djangoapplication.zip user@server:~/
13) Test the Django application
python3 manage.py runserver 0.0.0.0:8000 --> bind to 0.0.0.0 so you can open http://server-ip:8000 from your browser
14) Install the Apache web server
Install Apache, plus the mod_wsgi module for Python 3 that the next step relies on (the package name varies by release; on RHEL/CentOS 8 it is python3-mod_wsgi):
yum install httpd python3-mod_wsgi
15) Configure the Apache server
Go to /etc/httpd/conf.d and create a web.conf file with the following configuration (replace the placeholder paths with the real ones from your project):
<VirtualHost *:80>
    ServerName yoursitename
    Alias /static /path/to/your/static/directory
    Alias /media /path/to/your/media/directory
    <Directory /path/to/your/static/directory>
        Require all granted
    </Directory>
    <Directory /path/to/your/media/directory>
        Require all granted
    </Directory>
    <Directory /path/to/your/project/mainapp>
        <Files wsgi.py>
            Require all granted
        </Files>
    </Directory>
    WSGIDaemonProcess name_of_your_project python-path=/path/to/your/project python-home=/path/to/your/virtualenv
    WSGIProcessGroup name_of_your_project
    WSGIScriptAlias / /path/to/your/project/mainapp/wsgi.py
</VirtualHost>
Save and exit.
If your Django app uses numpy (or another C-extension package), requests may time out. To fix this, add the following line to
/etc/httpd/conf/httpd.conf
WSGIApplicationGroup %{GLOBAL}
Save and exit.
16) Restart the Apache server
test apache config by
apachectl configtest
restart apache server
systemctl restart httpd
17) Open your site in a web browser
Done.
Create MySQL Replication
mysqld --defaults-file=/etc/my.cnf --initialize --user=mysql
In MySQL 8.4 Community, the host cache is managed internally.
FLUSH HOSTS is deprecated --> inspect the cache instead with: SELECT * FROM performance_schema.host_cache;
CREATE USER 'repl'@'%' IDENTIFIED BY 'password';
GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%';
FLUSH PRIVILEGES;
CHANGE REPLICATION SOURCE TO
SOURCE_HOST='192.168.241.101',
SOURCE_USER='repl',
SOURCE_PASSWORD='password',
SOURCE_PORT=3360,
SOURCE_AUTO_POSITION=1,
SOURCE_SSL=1,
SOURCE_SSL_CA='/data/ca.pem';
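On the replica, after the CHANGE REPLICATION SOURCE statement above, a hedged sketch of starting and checking replication (MySQL 8.x syntax):

```sql
-- Start the replication threads, then confirm both are running;
-- expect Replica_IO_Running: Yes and Replica_SQL_Running: Yes.
START REPLICA;
SHOW REPLICA STATUS\G
```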
slave my.cnf
[mysqld]
datadir=/data/mysql_server
socket=/var/lib/mysql/mysql.sock
log-error=/data/mysql_server/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
server-id=2
log_bin=mysql-bin
binlog_format=ROW
gtid_mode=ON
enforce_gtid_consistency=ON
get_source_public_key=1
port=3360
user=mysql
symbolic-links=0
# Connection limits (safe for low-memory VM)
max_connections=50
max_user_connections=50
# Packet and temporary table sizes
max_allowed_packet=16M
tmp_table_size=32M
max_heap_table_size=32M
# Sorting and read buffers (per connection, smaller for low RAM)
sort_buffer_size=2M
read_buffer_size=2M
read_rnd_buffer_size=4M
join_buffer_size=2M
# Storage engine
default-storage-engine=InnoDB
key_buffer_size=8M
bulk_insert_buffer_size=8M
# InnoDB settings for small memory
innodb_log_file_size=32M
innodb_print_all_deadlocks=1
innodb_buffer_pool_instances=1
innodb_buffer_pool_size=512M
innodb_read_io_threads=4
innodb_write_io_threads=4
innodb_thread_concurrency=0
innodb_io_capacity=100
innodb_log_buffer_size=8M
innodb_flush_log_at_trx_commit=2
innodb_lock_wait_timeout=50
# Transaction isolation
transaction-isolation=READ-COMMITTED
============================
master my.cnf
[mysqld]
datadir=/data/mysql_server
socket=/var/lib/mysql/mysql.sock
log-error=/data/mysql_server/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
server-id=1
log_bin=mysql-bin
binlog_format=ROW
gtid_mode=ON
enforce_gtid_consistency=ON
plugin-load-add=mysql_native_password.so
port=3360
user=mysql
symbolic-links=0
# Connection limits (safe for low-memory VM)
max_connections=50
max_user_connections=50
# Packet and temporary table sizes
max_allowed_packet=16M
tmp_table_size=32M
max_heap_table_size=32M
# Sorting and read buffers (per connection, smaller for low RAM)
sort_buffer_size=2M
read_buffer_size=2M
read_rnd_buffer_size=4M
join_buffer_size=2M
# Storage engine
default-storage-engine=InnoDB
key_buffer_size=8M
bulk_insert_buffer_size=8M
# InnoDB settings for small memory
innodb_log_file_size=32M
innodb_print_all_deadlocks=1
innodb_buffer_pool_instances=1
innodb_buffer_pool_size=512M
innodb_read_io_threads=4
innodb_write_io_threads=4
innodb_thread_concurrency=0
innodb_io_capacity=100
innodb_log_buffer_size=8M
innodb_flush_log_at_trx_commit=2
innodb_lock_wait_timeout=50
# Transaction isolation
transaction-isolation=READ-COMMITTED
Secure Monitoring Setup – Prometheus + Node Exporter + Grafana (HTTPS + TLS + Django Embed)
OS: Rocky Linux 8.10 / RHEL compatible
Container runtime: Podman
TLS: Let’s Encrypt or self-signed
Grafana HTTPS port: 42923
Goal:
– node exporter & prometheus not public
– only localhost access
– grafana SSL enabled on random port
– disable anonymous login
– embed inside Django site
– force login authentication
=======================================
1) Create directories
mkdir -p /opt/node
mkdir -p /opt/prometheus
mkdir -p /opt/grafana/certs
=======================================
2) Copy SSL certificates for Grafana
Use Let’s Encrypt certs (recommended)
cp -L /etc/letsencrypt/live/www.yourdomain.com/fullchain.pem /opt/grafana/certs/fullchain.pem
cp -L /etc/letsencrypt/live/www.yourdomain.com/privkey.pem /opt/grafana/certs/privkey.pem
Fix permissions
chmod 640 /opt/grafana/certs/privkey.pem
chmod 644 /opt/grafana/certs/fullchain.pem
SELinux label
chcon -Rt container_file_t /opt/grafana/certs
=======================================
3) Create Node Exporter HTTPS config file
cat > /opt/node/web.yml <<EOF
tls_server_config:
  cert_file: /certs/server.crt
  key_file: /certs/server.key
EOF
If using self-signed:
copy cert files
cp /opt/ssl/selfsigned/server.crt /opt/node/
cp /opt/ssl/selfsigned/server.key /opt/node/
=======================================
4) Run Node Exporter (localhost only + HTTPS)
podman run -d \
  --name node_exporter \
  -p 127.0.0.1:9100:9100 \
  -v /opt/node/web.yml:/etc/node/web.yml:Z \
  -v /opt/ssl/selfsigned/server.crt:/certs/server.crt:ro,Z \
  -v /opt/ssl/selfsigned/server.key:/certs/server.key:ro,Z \
  quay.io/prometheus/node-exporter \
  --web.config.file=/etc/node/web.yml \
  --web.listen-address=127.0.0.1:9100
Test
curl -k https://127.0.0.1:9100/metrics
=======================================
5) Create Prometheus config
cat > /opt/prometheus/prometheus.yml <<EOF
global:
  scrape_interval: 5s
scrape_configs:
  - job_name: 'node'
    scheme: https
    tls_config:
      insecure_skip_verify: true
    static_configs:
      - targets: ['127.0.0.1:9100']
EOF
=======================================
6) Run Prometheus container
podman run -d \
  --name prometheus \
  -p 127.0.0.1:9090:9090 \
  -v /opt/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml:Z \
  prom/prometheus
Test local access (Prometheus itself serves plain HTTP here; the https/tls_config above applies only to its scrape of node_exporter)
curl http://127.0.0.1:9090/api/v1/targets
You should see "health":"up"
=======================================
7) Run Grafana HTTPS on port 42923
Remove any old container first
podman stop grafana || true
podman rm grafana || true
Run new
podman run -d \
  --name=grafana \
  --net=host \
  -v grafana:/var/lib/grafana \
  -v /opt/grafana/certs/fullchain.pem:/certs/fullchain.pem:ro,Z \
  -v /opt/grafana/certs/privkey.pem:/certs/privkey.pem:ro,Z \
  -e GF_SERVER_PROTOCOL=https \
  -e GF_SERVER_HTTP_PORT=42923 \
  -e GF_SERVER_CERT_FILE=/certs/fullchain.pem \
  -e GF_SERVER_CERT_KEY=/certs/privkey.pem \
  -e GF_AUTH_ANONYMOUS_ENABLED=false \
  -e GF_USERS_ALLOW_SIGN_UP=false \
  -e GF_SECURITY_ALLOW_EMBEDDING=true \
  grafana/grafana:latest
Test
curl -k https://127.0.0.1:42923
Expected:
<a href="/login">Found</a>
Browser URL
https://www.yourdomain.com:42923
=======================================
8) First Grafana login
Default user:
admin / admin
Grafana will force you to CHANGE PASSWORD
Create user “xxxx” later in settings.
=======================================
9) Import dashboard ID 1860
In Grafana:
Dashboards → Import
Dashboard ID: 1860
Select Prometheus datasource
=======================================
10) Django embed setup
Create app: serverstats
views.py

from django.contrib.auth.decorators import login_required
from django.shortcuts import render

@login_required
def serverstats_home(request):
    return render(request, "serverstats/home.html")
Template home.html
<h2>Server Monitoring Dashboard</h2> <iframe src="https://www.yourdomain.com:42923/d/xxxx?orgId=1&refresh=30s" width="100%" height="900" frameborder="0"> </iframe>
urls.py entry
from serverstats.views import serverstats_home
path("serverstats/", serverstats_home, name="serverstats"),
=======================================
11) Security notes
Node exporter is local only ✔
Prometheus is local only ✔
Grafana HTTPS enforced ✔
Random port 42923 ✔
Anonymous Grafana disabled ✔
Django auth required ✔
TLS everywhere ✔
Migrating Web Site From One Vps To Another
update soon
docker basics
sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
sudo yum install -y docker-ce docker-ce-cli containerd.io
sudo systemctl enable --now docker
Add user to docker group
sudo usermod -aG docker $USER --> log out and back in for the group change to take effect
Test docker
docker run hello-world
Create file:
/etc/docker/daemon.json
Add configuration:
{
"ipv6": false,
"dns": ["8.8.8.8","1.1.1.1"]
}
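Before restarting docker, the daemon.json can be sanity-checked with Python's stdlib JSON tool (a sketch; any JSON validator works):

```shell
# Exits non-zero and prints an error if the JSON is malformed;
# otherwise pretty-prints the document.
python3 -m json.tool <<'EOF'
{
  "ipv6": false,
  "dns": ["8.8.8.8","1.1.1.1"]
}
EOF
```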
Restart docker
systemctl restart docker
Verify
docker run hello-world
docker pull nginx
docker pull ubuntu
docker pull mysql
docker run -d --name mynginx -p 8080:80 nginx
Explanation
-d = run in background
--name mynginx = container name
-p 8080:80 = map container port 80 to host port 8080
docker exec -it mynginx bash
docker stop mynginx
docker start mynginx
docker restart mynginx
docker logs mynginx
docker logs -f mynginx
docker stop mynginx
docker rm mynginx
Remove all stopped containers
docker container prune
docker run -d --name mynginx -p 8081:80 nginx
Container port 80 → Host port 8081
Create configuration file
mkdir -p /data/docker
touch /data/docker/my.cnf
chown 999:999 /data/docker/my.cnf
chmod 644 /data/docker/my.cnf
Run MySQL container
docker run -d \
--name mydb \
-e MYSQL_ROOT_PASSWORD=pass123 \
-v /data/mysql:/var/lib/mysql \
-v /data/docker/my.cnf:/etc/mysql/conf.d/my.cnf \
-p 3306:3306 \
mysql:latest
Create Dockerfile
cat > Dockerfile <<EOF
FROM mysql:8.0
LABEL maintainer="you@example.com"
COPY my.cnf /etc/mysql/conf.d/my.cnf
RUN apt-get update && apt-get install -y \
vim \
net-tools \
&& apt-get clean
EOF
docker images # list images
docker ps # list containers
docker stats # container resource usage
docker logs -f name # view logs
docker inspect name # container config
docker exec -it name bash # enter container
docker top name # processes inside container
docker system df # disk usage
docker run -d \
--name mydb5 \
-e MYSQL_ROOT_PASSWORD=pass123 \
-p 3308:3306 \
--memory="1g" \
--cpus="1.5" \
mysql:8.0
Verify limits
docker inspect mydb5 | grep -i -E "memory|cpus"
Live resource usage
docker stats mydb5
Check inside container (cgroup v1 paths shown; on cgroup v2 hosts read /sys/fs/cgroup/memory.max and /sys/fs/cgroup/cpu.max instead)
docker exec -it mydb5 cat /sys/fs/cgroup/memory/memory.limit_in_bytes
docker exec -it mydb5 cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us
docker exec -it mydb5 cat /sys/fs/cgroup/cpu/cpu.cfs_period_us
docker run -d \
--name mydb10 \
--restart=always \
-e MYSQL_ROOT_PASSWORD=pass123 \
-p 3310:3306 \
mysql:latest
Restart options
no
on-failure
always
unless-stopped
Default network: bridge
Create network
docker network create mynet2
docker network inspect mynet2
Run container on network
docker run -d \
--name mydb12 \
--restart=always \
-e MYSQL_ROOT_PASSWORD=pass123 \
-p 3312:3306 \
--network mynet2 \
mysql:latest
Connect container to multiple networks
docker network connect mynet mydb9
docker network connect mynet1 mydb9
docker network connect mynet2 mydb9
docker network connect mynet3 mydb9
Network types
bridge : containers on same host
host : use host network
none : isolated container
overlay : multi-host cluster networking
Example
docker run -d \
--name web2 \
--restart unless-stopped \
--health-cmd="curl -f http://localhost:80 || exit 1" \
--health-interval=30s \
--health-retries=3 \
--health-timeout=5s \
nginx
Exit codes
0 = success
1 = failure
command missing = failure
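The `|| exit 1` pattern used in the health-cmd examples normalizes every failure mode, including a missing command, to exit code 1. The semantics can be seen with plain shell:

```shell
# 0 = healthy, non-zero = counts toward --health-retries
sh -c 'true  || exit 1'; echo "passing check exits $?"
sh -c 'false || exit 1'; echo "failing check exits $?"
# A missing command would exit 127 on its own; || exit 1 maps it to 1
sh -c 'no_such_cmd || exit 1' 2>/dev/null; echo "missing command exits $?"
```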
docker cp temp-tomcat:/usr/local/tomcat/conf /data/mytomcat
docker run -d \
--name mydb16v8 \
-e MYSQL_ROOT_PASSWORD=pass123 \
-e TZ="Asia/Kolkata" \
--restart unless-stopped \
--health-cmd="mysqladmin ping -h localhost -u root --password=pass123 || exit 1" \
--health-interval=5s \
--health-retries=5 \
-v /data/mysql16v8:/var/lib/mysql \
-v /data/docker/my.cnf:/etc/mysql/conf.d/my.cnf \
-p 3316:3306 \
--memory="1g" \
--cpus="1.5" \
--network mynet2 \
my-mysql-image:8.0
docker run -d \
--name tomcat3 \
--restart unless-stopped \
--health-cmd="curl -f http://localhost:8080 || exit 1" \
--health-interval=30s \
--health-retries=3 \
--health-timeout=5s \
-e TZ="Asia/Kolkata" \
-e JAVA_OPTS="-Xms128m -Xmx256m -Duser.timezone=Asia/Kolkata" \
-p 8083:8080 \
--network mynet2 \
-v /data/tomcat_docker/tomcat_common/warfile/log-api-1.0.war:/usr/local/tomcat/webapps/log-api-1.0.war \
-v /data/tomcat_docker/tomcat3/logs:/usr/local/tomcat/logs \
-v /data/tomcat_docker/tomcat_common/db_properties/db.properties:/usr/local/tomcat/conf/db.properties \
tomcat:9.0.111-jdk8-corretto-al2
docker run -d \
--name nginxlb \
--restart unless-stopped \
--network mynet2 \
-p 80:80 \
-p 443:443 \
-v /data/nginx-lb/nginx.conf:/etc/nginx/nginx.conf:ro \
-v /data/nginx-lb/SSL:/etc/nginx/SSL:ro \
-v /data/nginx-lb/logs:/var/log/nginx \
--health-cmd="sh -c 'echo > /dev/tcp/127.0.0.1/80 || exit 1'" \
--health-interval=30s \
--health-retries=3 \
--health-timeout=5s \
-e TZ="Asia/Kolkata" \
nginx:stable
Test request
curl -k "https://localhost:443/log-api-1.0/log?msgtext=thisistest4011116&status=ok"
Start containers
docker compose up -d
Stop containers
docker compose down
Rebuild containers
docker compose up -d --build
Force recreate
docker compose up -d --force-recreate
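For reference, a minimal docker-compose.yml matching the nginx example earlier in these notes (service name and ports are illustrative):

```yaml
services:
  mynginx:
    image: nginx
    ports:
      - "8080:80"      # host 8080 -> container 80
    restart: unless-stopped
```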
==================== OFFLINE KUBERNETES INSTALL (RHEL 8.10) ====================
CLUSTER DETAILS
---------------
MASTER : 192.168.241.160
WORKERS : 192.168.241.161 , 192.168.241.162
K8S : v1.30.14
RUNTIME : containerd
CNI : flannel
ARTIFACTS PATH : /data
packages in /data: conntrack-tools-1.4.4-11.el8.x86_64, containerd.io-1.6.32-3.1.el8.x86_64, cri-tools-1.30.1-150500.1.1.x86_64, ethtool-5.13-2.el8.x86_64, iproute-6.2.0-6.el8_10.x86_64, iproute-tc-6.2.0-6.el8_10.x86_64, iptables-1.8.5-11.el8_9.x86_64, iptables-ebtables-1.8.5-11.el8_9.x86_64, kubeadm-1.30.14-150500.1.1.x86_64, kubectl-1.30.14-150500.1.1.x86_64, kubelet-1.30.14-150500.1.1.x86_64, kubernetes-cni-1.4.0-150500.1.1.x86_64, socat-1.7.4.1-2.el8_10.x86_64, createrepo, bash-auoconnect
================================================================================
STEP 0 : COMMON SETUP (RUN ON ALL NODES)
================================================================================
swapoff -a
sed -i '/swap/d' /etc/fstab
cat <<EOF >/etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
modprobe overlay
modprobe br_netfilter
cat <<EOF >/etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables=1
net.bridge.bridge-nf-call-ip6tables=1
net.ipv4.ip_forward=1
EOF
sysctl --system
================================================================================
STEP 1 : CONFIGURE OFFLINE REPO (RUN ON ALL NODES)
================================================================================
cat <<EOF >/etc/yum.repos.d/k8s-offline.repo
[k8s-offline]
name=Kubernetes Offline Repo
baseurl=file:///data/k8s-rpms
enabled=1
gpgcheck=0
EOF
dnf clean all
================================================================================
STEP 2 : INSTALL PACKAGES (RUN ON ALL NODES)
================================================================================
dnf install -y \
containerd.io \
kubeadm kubelet kubectl cri-tools kubernetes-cni \
conntrack-tools iproute iproute-tc iptables iptables-ebtables ethtool socat
systemctl enable --now containerd
systemctl enable kubelet
================================================================================
STEP 3 : CONFIGURE CONTAINERD (RUN ON ALL NODES)
================================================================================
containerd config default > /etc/containerd/config.toml
sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' \
/etc/containerd/config.toml
systemctl restart containerd
systemctl status containerd
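The SystemdCgroup substitution can be dry-run on a scratch file first, to confirm the sed expression matches before editing the real config:

```shell
# Same substitution as above, applied to a scratch copy instead of
# the real /etc/containerd/config.toml
f=$(mktemp)
echo '            SystemdCgroup = false' > "$f"
sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' "$f"
grep 'SystemdCgroup' "$f"
rm -f "$f"
```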
================================================================================
STEP 4 : IMPORT IMAGES (OFFLINE)
================================================================================
# MASTER ONLY
ctr -n k8s.io images import /data/offline/k8s-images.tar
ctr -n k8s.io images import /data/offline/flannel.tar
# WORKERS ONLY
ctr -n k8s.io images import /data/offline/k8s-images.tar
================================================================================
STEP 5 : INITIALIZE CLUSTER (MASTER ONLY)
================================================================================
kubeadm init \
--apiserver-advertise-address=192.168.241.160 \
--pod-network-cidr=10.244.0.0/16
================================================================================
STEP 6 : CONFIGURE kubectl (MASTER ONLY)
================================================================================
mkdir -p $HOME/.kube
cp /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config
================================================================================
STEP 7 : INSTALL FLANNEL (MASTER ONLY, OFFLINE)
================================================================================
export KUBECONFIG=/etc/kubernetes/admin.conf
kubectl apply -f /data/offline/kube-flannel.yml
kubectl get pods -n kube-system
================================================================================
STEP 8 : JOIN WORKER NODES
================================================================================
# ON MASTER
kubeadm token create --print-join-command
# RUN OUTPUT COMMAND ON EACH WORKER
kubeadm reset -f
rm -rf /etc/cni/net.d
rm -rf /var/lib/cni
rm -rf /var/lib/kubelet/*
systemctl restart containerd
systemctl restart kubelet
kubeadm join 192.168.241.160:6443 \
--token <TOKEN> \
--discovery-token-ca-cert-hash sha256:<HASH>
================================================================================
STEP 9 : VERIFY CLUSTER (MASTER ONLY)
================================================================================
kubectl get nodes -o wide
EXPECTED OUTPUT
---------------
control Ready control-plane
node1 Ready
node2 Ready
==================== OFFLINE KUBERNETES INSTALL COMPLETE =======================
####################### KUBERNETES ETCD FULL LAB (BACKUP + BREAK + RESTORE) #######################
################################ STEP 1: CHECK CLUSTER ############################################
kubectl get nodes
kubectl get pods -A
##############################################################################################
################################ STEP 2: TAKE BACKUP (NO etcdctl ON HOST) ####################
kubectl exec -n kube-system etcd-controlnode -- etcdctl --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key --endpoints=https://127.0.0.1:2379 snapshot save /var/lib/etcd-backup3.db
##############################################################################################
################################ STEP 3: BREAK CLUSTER ########################################
kubectl delete deployment webserver-deployment
kubectl delete pod samplepod
kubectl get all
##############################################################################################
################################ STEP 4: STOP ETCD ############################################
mv /etc/kubernetes/manifests/etcd.yaml /tmp/
# Verify
crictl ps | grep etcd
##############################################################################################
################################ STEP 5: RESTORE SNAPSHOT #####################################
ctr -n k8s.io run --rm -t \
  --mount type=bind,src=/etc/kubernetes/pki/etcd,dst=/etc/kubernetes/pki/etcd,options=rbind:rw \
  --mount type=bind,src=/var/lib,dst=/var/lib,options=rbind:rw \
  registry.k8s.io/etcd:3.5.16-0 etcd-restore sh
# Inside container:
ETCDCTL_API=3 etcdctl snapshot restore /var/lib/etcd/etcd-backup.db \
  --data-dir=/var/lib/etcd-restore
# VERIFY (VERY IMPORTANT)
echo /var/lib/*
exit
##############################################################################################
################################ STEP 6: UPDATE ETCD MANIFEST #################################
vi /etc/kubernetes/manifests/etcd.yaml
# CHANGE THESE THREE PLACES:
--data-dir=/var/lib/etcd-restore
# volumeMounts:
- mountPath: /var/lib/etcd-restore
# volumes:
path: /var/lib/etcd-restore
##############################################################################################
################################ STEP 7: START ETCD ###########################################
mv /tmp/etcd.yaml /etc/kubernetes/manifests/
# Wait 20–30 seconds
##############################################################################################
################################ STEP 8: FIX AUTH (VERY IMPORTANT) ############################
mv /etc/kubernetes/manifests/kube-apiserver.yaml /tmp/
sleep 5
mv /tmp/kube-apiserver.yaml /etc/kubernetes/manifests/
mv /etc/kubernetes/manifests/kube-controller-manager.yaml /tmp/
sleep 5
mv /tmp/kube-controller-manager.yaml /etc/kubernetes/manifests/
mv /etc/kubernetes/manifests/kube-scheduler.yaml /tmp/
sleep 5
mv /tmp/kube-scheduler.yaml /etc/kubernetes/manifests/
##############################################################################################
################################ STEP 9: (OPTIONAL) BYPASS AUTH ###############################
vi /etc/kubernetes/manifests/kube-apiserver.yaml
# Add:
--authorization-mode=AlwaysAllow
##############################################################################################
################################ STEP 10: VERIFY ##############################################
crictl ps
ps -ef | grep etcd
kubectl get nodes
kubectl get pods -A
##############################################################################################
################################ FINAL MEMORY #################################################
Backup → Break → Stop → Restore → Fix Mount → Start → Restart Control Plane → Verify
##############################################################################################
calico-kube-controllers pod was stuck in CrashLoopBackOff and pods could not communicate with the Kubernetes API server via ClusterIP 10.96.0.1:443.
| Component | Detail |
|---|---|
| OS | RHEL 9.6 |
| Kubernetes | v1.29.15 |
| CNI | Calico v3.27.0 |
| Container Runtime | containerd 2.2.2 |
| Node IPs | 192.168.241.140/141/142 |
| Pod CIDR (configured) | 192.168.0.0/16 |
| Service CIDR | 10.96.0.0/12 |
Symptoms and checks:
- calico-kube-controllers pod stuck in CrashLoopBackOff: dial tcp 10.96.0.1:443: i/o timeout
- curl to 10.96.0.1:443 directly from the host got 403 Forbidden — meaning host-level connectivity was fine ✅
- net.ipv4.ip_forward = 1 on all nodes ✅
- Route to 10.96.0.1 existed on all nodes ✅
- Connecting to 10.96.0.1:443 directly — working on all nodes ✅
Host-to-ClusterIP worked but pod-to-ClusterIP timed out. This pointed to a problem specifically with how pod traffic was being NAT'd through the ClusterIP rules.
Running this command on workernode1:
```bash
iptables -t nat -L KUBE-SVC-NPX46M4PTMTKRN6Y -v -n
```
Revealed this rule:
```
KUBE-MARK-MASQ tcp -- * * !192.168.0.0/16 10.96.0.1 tcp dpt:443
```
The `!192.168.0.0/16` means — **only masquerade (SNAT) traffic coming from OUTSIDE 192.168.0.0/16**. Traffic from inside that range is excluded from masquerading.
---
## Root Cause
**Pod CIDR `192.168.0.0/16` overlapped with Node IP range `192.168.241.x`.**
This caused a chain reaction:
```
Pod IP: 192.168.212.4
↓
Sends packet to 10.96.0.1:443
↓
kube-proxy KUBE-SERVICES chain matches → forwards to KUBE-SVC-NPX46M4PTMTKRN6Y
↓
KUBE-MARK-MASQ rule checks source IP:
192.168.212.4 is INSIDE 192.168.0.0/16
↓
MASQUERADE is SKIPPED ← problem here
↓
Packet reaches API server (192.168.241.140:6443)
with source IP 192.168.212.4 (pod IP)
↓
API server tries to reply to 192.168.212.4
but has no route back to that pod IP
↓
Connection times out
```
kube-proxy intentionally excludes pod CIDR from masquerading to avoid unnecessary NAT for pod-to-pod traffic. But when the pod CIDR overlaps with the node network, this optimization breaks pod-to-ClusterIP communication.
| Source | Source IP | In 192.168.0.0/16? | Masqueraded? | Works? |
|---|---|---|---|---|
| Node (host) | 192.168.241.x | Yes | No | ✅ Yes — node IP is routable |
| Pod | 192.168.212.4 | Yes | No | ❌ No — pod IP not directly routable to API server |
Nodes have real routable IPs so replies come back fine even without masquerading. Pods do not — they need SNAT so the reply goes back to the node, which then forwards it to the pod.
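The overlap itself can be checked mechanically before running kubeadm init; a sketch using python3's stdlib ipaddress module (the addresses are the ones from this incident):

```shell
pod_cidr="192.168.0.0/16"
node_ip="192.168.241.140"
# Prints OVERLAP when the node IP falls inside the pod CIDR, which is
# exactly the broken configuration described above.
if python3 -c "import ipaddress,sys; sys.exit(0 if ipaddress.ip_address('$node_ip') in ipaddress.ip_network('$pod_cidr') else 1)"; then
  echo "OVERLAP: choose a different pod CIDR"
fi
```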
Set masqueradeAll: true in the kube-proxy configmap:

```yaml
iptables:
  masqueradeAll: true
```

This forces SNAT on all pod-to-ClusterIP traffic regardless of source IP, bypassing the overlap problem. This worked but adds NAT overhead on every pod connection.
| Network | Old (broken) | New (correct) |
|---|---|---|
| Pod CIDR | 192.168.0.0/16 | 172.16.0.0/16 |
| Service CIDR | 10.96.0.0/12 | 10.96.0.0/12 |
| Node IPs | 192.168.241.x | 192.168.241.x |
Reinstall command:

```bash
kubeadm init \
  --pod-network-cidr=172.16.0.0/16 \
  --service-cidr=10.96.0.0/12 \
  --apiserver-advertise-address=192.168.241.140
```
With Calico configured to match:
```yaml
- name: CALICO_IPV4POOL_CIDR
  value: "172.16.0.0/16"
```
The 192.168.0.0/16 pod CIDR is just a default, not a requirement — it can and should be changed if your node network uses the same range.

SSL Full Setup (CA + Server Cert + Verify + Test)
openssl genrsa -out abhilash-ca.key 3072
openssl req -x509 -new -nodes \
  -key abhilash-ca.key \
  -sha256 -days 3650 \
  -out abhilash-ca.crt \
  -subj "/C=IN/ST=Maharashtra/L=Mumbai/O=AbhilashOrg/CN=Abhilash-Root-CA"
openssl genrsa -out server.key 3072
cat > san.cnf <<EOF
[req]
distinguished_name = dn
req_extensions = req_ext
prompt = no
[dn]
C = IN
ST = Maharashtra
L = Mumbai
O = AbhilashOrg
CN = nginx.local
[req_ext]
subjectAltName = @alt_names
[alt_names]
DNS.1 = nginx.local
DNS.2 = controlnode
IP.1 = 127.0.0.1
IP.2 = 192.168.240.140
EOF
openssl req -new \
  -key server.key \
  -out server.csr \
  -config san.cnf
openssl x509 -req \
  -in server.csr \
  -CA abhilash-ca.crt \
  -CAkey abhilash-ca.key \
  -CAcreateserial \
  -out server.crt \
  -days 825 \
  -sha256 \
  -extensions req_ext \
  -extfile san.cnf
openssl verify -CAfile abhilash-ca.crt server.crt
openssl x509 -in server.crt -text -noout | grep -A1 "Subject Alternative Name"
openssl x509 -noout -modulus -in server.crt | openssl md5
openssl rsa -noout -modulus -in server.key | openssl md5
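The modulus comparison above is the standard key-matches-cert check; here is a self-contained sketch of the same check with a throwaway key and certificate (assumes openssl on PATH; names are illustrative):

```shell
tmp=$(mktemp -d)
openssl genrsa -out "$tmp/demo.key" 2048 2>/dev/null
openssl req -x509 -new -key "$tmp/demo.key" -days 1 -subj "/CN=demo" -out "$tmp/demo.crt"
# The two MD5 digests are identical exactly when the cert was issued for this key
crt=$(openssl x509 -noout -modulus -in "$tmp/demo.crt" | openssl md5)
key=$(openssl rsa  -noout -modulus -in "$tmp/demo.key" | openssl md5)
[ "$crt" = "$key" ] && echo "cert and key match"
rm -rf "$tmp"
```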
openssl s_server -key server.key -cert server.crt -accept 8443
kubectl create secret tls nginx-tls \
  --cert=server.crt \
  --key=server.key