Troubleshooting Guide
This guide helps you diagnose and resolve common issues with KECS.
Diagnostic Tools
Health Checks
Check KECS component health:
# API server health
curl http://localhost:8081/health
# Detailed health status
curl http://localhost:8081/health/detailed
# Kubernetes connectivity
kubectl cluster-info
Logs
View KECS logs:
# If running directly
kecs server 2>&1 | tee kecs.log
# If running in Docker
docker logs kecs-container
# If running in Kubernetes
kubectl logs -n kecs-system deployment/kecs-control-plane
Debug Mode
Enable debug logging:
# Via command line
kecs server --log-level debug
# Via environment variable
export KECS_LOG_LEVEL=debug
kecs server
Common Issues
Installation Issues
Problem: Build Fails
Symptoms:
go: cannot find main module
Solution:
# Ensure you're in the correct directory
cd /path/to/kecs
# Clean and rebuild
make clean
make deps
make build
Problem: Missing Dependencies
Symptoms:
package github.com/... is not in GOROOT
Solution:
# Update dependencies
go mod download
go mod tidy
# Verify Go version
go version # Should be 1.21+
Startup Issues
Problem: Port Already in Use
Symptoms:
listen tcp :8080: bind: address already in use
Solution:
# Find process using port
lsof -i :8080
# Stop the process (add -9 only if it ignores the default SIGTERM)
kill <PID>
# Or use different port
kecs server --api-port 9080
Problem: Cannot Connect to Kubernetes
Symptoms:
failed to get kubernetes config: stat /home/user/.kube/config: no such file or directory
Solution:
# Check kubeconfig exists
ls ~/.kube/config
# Set kubeconfig explicitly
kecs server --kubeconfig /path/to/kubeconfig
# Or use in-cluster config
kubectl apply -f deploy/kubernetes/rbac.yaml
Cluster Operations
Problem: Cluster Creation Fails
Symptoms:
failed to create kind cluster: exit status 1
Solution:
# Check Docker is running
docker ps
# Check Kind is installed
kind version
# Create cluster manually
kind create cluster --name kecs-cluster
# Verify cluster
kubectl get nodes
Problem: Cluster Already Exists
Symptoms:
cluster already exists
Solution:
# List existing clusters
aws ecs list-clusters --endpoint-url http://localhost:8080
# Delete and recreate
aws ecs delete-cluster --cluster <name> --endpoint-url http://localhost:8080
Service Deployment Issues
Problem: Service Won't Start
Symptoms:
- Service stuck in PENDING
- No running tasks
Solution:
Check task definition:
aws ecs describe-task-definition \
  --task-definition <family:revision> \
  --endpoint-url http://localhost:8080
Check cluster resources:
kubectl top nodes
kubectl describe nodes
Review service events:
aws ecs describe-services \
  --cluster <cluster> \
  --services <service> \
  --endpoint-url http://localhost:8080
Problem: Tasks Keep Stopping
Symptoms:
- Tasks transition to STOPPED
- Service can't maintain desired count
Solution:
Check task stop reason:
aws ecs describe-tasks \
  --cluster <cluster> \
  --tasks <task-arn> \
  --endpoint-url http://localhost:8080 \
  | jq '.tasks[0].stoppedReason'
View container logs:
kubectl logs -n <cluster-name> <pod-name>
Common causes:
- Image pull failures
- Health check failures
- Resource constraints
- Application errors
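When many tasks are stopping, it can help to sort each stoppedReason into one of the causes above before digging into logs. The following is a minimal sketch, not part of KECS: classify_stop_reason is a hypothetical helper, and its patterns are assumptions based on the error strings shown in this guide rather than a complete list of messages KECS can emit.

```shell
# Hypothetical helper: map a stoppedReason string to one of the four
# common causes listed above. The patterns are illustrative assumptions.
classify_stop_reason() {
  case "$1" in
    *CannotPull*|*"pull access denied"*|*ImagePull*)
      echo "image-pull" ;;
    *[Hh]ealth*)
      echo "health-check" ;;
    *OutOfMemory*|*OOM*|*"memory limit"*|*[Ii]nsufficient*)
      echo "resources" ;;
    *)
      # Anything unrecognised is treated as an application-level error.
      echo "application" ;;
  esac
}

# Usage (jq -r prints the raw string without JSON quotes):
#   reason=$(aws ecs describe-tasks --cluster <cluster> --tasks <task-arn> \
#     --endpoint-url http://localhost:8080 | jq -r '.tasks[0].stoppedReason')
#   classify_stop_reason "$reason"
```

The fallback bucket is deliberately broad, so an "application" result means "check the container logs", not a confirmed application bug.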
Task Issues
Problem: Image Pull Error
Symptoms:
CannotPullContainerError: Error response from daemon: pull access denied
Solution:
Verify image exists:
docker pull <image-name>
Check image registry credentials:
# For private registries
kubectl create secret docker-registry regcred \
  --docker-server=<registry> \
  --docker-username=<username> \
  --docker-password=<password> \
  -n <cluster-name>
Update task definition with credentials:
{
  "containerDefinitions": [{
    "repositoryCredentials": {
      "credentialsParameter": "arn:aws:secretsmanager:region:account:secret:name"
    }
  }]
}
Problem: Out of Memory
Symptoms:
OutOfMemoryError: Container killed due to memory limit
Solution:
Increase the container memory limits (values are in MiB and are integers in the container definition):
{
  "memory": 1024,
  "memoryReservation": 512
}
Check memory usage:
kubectl top pod -n <cluster-name>
Optimize application memory usage
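To spot pods that are close to their limit, the output of kubectl top pod can be filtered by a threshold. This is a rough sketch: pods_over_memory is a hypothetical helper, and it assumes the usual three-column output (name, CPU, memory) with memory reported in Mi.

```shell
# Hypothetical helper: read `kubectl top pod` output on stdin and print the
# name and usage of every pod at or above the given threshold (in Mi).
# Rows whose third column does not end in "Mi" (such as the header) are skipped.
pods_over_memory() {
  awk -v limit="$1" '
    $3 ~ /Mi$/ {
      mem = $3
      sub(/Mi$/, "", mem)          # strip the unit, leaving the number
      if (mem + 0 >= limit + 0) print $1, $3
    }'
}

# Usage:
#   kubectl top pod -n <cluster-name> | pods_over_memory 800
```

A threshold a little below the container's memory value gives early warning before the container is killed.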
Networking Issues
Problem: Service Discovery Not Working
Symptoms:
- Services can't communicate
- DNS resolution fails
Solution:
Check service registration:
aws servicediscovery list-services \
  --endpoint-url http://localhost:4566
Test DNS resolution:
kubectl exec -n <namespace> <pod> -- nslookup <service-name>
Verify network policies:
kubectl get networkpolicies -n <namespace>
Problem: Load Balancer Not Working
Symptoms:
- Can't access service externally
- Health checks failing
Solution:
Check service type:
kubectl get svc -n <namespace>
Verify target health:
aws elbv2 describe-target-health \
  --target-group-arn <arn> \
  --endpoint-url http://localhost:4566
Check security groups and ports
LocalStack Integration Issues
Problem: LocalStack Connection Failed
Symptoms:
Could not connect to the endpoint URL: "http://localhost:4566/"
Solution:
Verify LocalStack is running:
docker ps | grep localstack
curl http://localhost:4566/_localstack/health
Check KECS configuration:
localstack:
  enabled: true
  endpoint: http://localhost:4566
Restart both services:
docker-compose restart
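After a restart, LocalStack may take a few seconds to accept connections, so a health check run immediately can fail even though nothing is wrong. One way to handle this is to poll the health endpoint shown above. The sketch below is an illustration only: wait_until is a hypothetical helper, and the attempt count and delay in the usage comment are arbitrary.

```shell
# wait_until ATTEMPTS DELAY COMMAND [ARGS...]
# Hypothetical helper: run COMMAND until it succeeds, sleeping DELAY seconds
# between tries. Returns 0 on the first success, 1 if every attempt fails.
wait_until() {
  local attempts=$1
  local delay=$2
  shift 2
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    # Sleep only when another attempt is still to come.
    if [ "$i" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# Usage:
#   docker-compose restart
#   wait_until 30 2 curl -fsS http://localhost:4566/_localstack/health \
#     || echo "LocalStack did not become healthy" >&2
```

The same helper can wrap the KECS health check at http://localhost:8081/health.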
Problem: AWS SDK Not Using LocalStack
Symptoms:
- Requests going to real AWS
- Authentication errors
Solution:
Check sidecar injection:
kubectl describe pod <pod> -n <namespace> | grep localstack-proxy
Set AWS endpoint explicitly:
boto3.client('s3', endpoint_url='http://localhost:4566')
Verify environment variables:
kubectl exec <pod> -n <namespace> -- env | grep AWS
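The environment check can be turned into a pass/fail test by looking for an endpoint override in the pod's environment. The sketch below assumes the workload's SDK reads AWS_ENDPOINT_URL, which recent AWS SDKs and the AWS CLI honour; whether KECS injects that exact variable is not confirmed by this guide, so treat the name as an assumption. aws_endpoint_from_env is a hypothetical helper.

```shell
# Hypothetical helper: read `env` output on stdin, print the value of
# AWS_ENDPOINT_URL, and return 0 if it is set to a non-empty value.
# Returns 1 when the variable is missing or empty.
# Assumption: AWS_ENDPOINT_URL is the variable the workload's SDK reads.
aws_endpoint_from_env() {
  sed -n 's/^AWS_ENDPOINT_URL=//p' | grep .
}

# Usage:
#   kubectl exec <pod> -n <namespace> -- env | aws_endpoint_from_env \
#     || echo "no endpoint override: requests may go to real AWS" >&2
```

If the variable is absent, setting the endpoint explicitly in the client, as in the boto3 example above, is the more direct fix.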
Performance Issues
Problem: Slow API Responses
Solution:
Check resource usage:
# KECS server (PIDs are comma-separated in case several processes match)
top -p "$(pgrep -d, kecs)"
# Database
ls -la ~/.kecs/data/kecs.db
Check performance metrics:
curl http://localhost:8081/metrics
Optimize database:
# Vacuum database
sqlite3 ~/.kecs/data/kecs.db "VACUUM;"
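VACUUM rewrites the whole database file, so taking a copy first is a cheap safeguard. The following is a minimal sketch: backup_file is a hypothetical helper, and a plain file copy of a SQLite database is only reliable while nothing is writing to it, so it assumes KECS is stopped.

```shell
# backup_file PATH
# Hypothetical helper: copy PATH to PATH.<UTC timestamp>.bak and print the
# path of the copy. Fails if PATH is not a regular file.
# Assumption: nothing is writing to the file while it is copied.
backup_file() {
  src=$1
  if [ ! -f "$src" ]; then
    echo "no such file: $src" >&2
    return 1
  fi
  dest="$src.$(date -u +%Y%m%dT%H%M%SZ).bak"
  cp -p "$src" "$dest" && echo "$dest"
}

# Usage:
#   backup_file ~/.kecs/data/kecs.db && sqlite3 ~/.kecs/data/kecs.db "VACUUM;"
```

The timestamped name keeps earlier backups instead of overwriting a single .backup file.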
Problem: High Memory Usage
Solution:
Check for memory leaks:
go tool pprof http://localhost:8081/debug/pprof/heap
Limit concurrent operations:
server:
  maxConcurrentRequests: 100
Adjust cache settings:
cache:
  maxSize: 1000
  ttl: 5m
Advanced Debugging
Enable Verbose Logging
# All components
export KECS_LOG_LEVEL=trace
# Specific components
export KECS_API_LOG_LEVEL=debug
export KECS_STORAGE_LOG_LEVEL=trace
export KECS_K8S_LOG_LEVEL=debug
Trace Requests
# Enable request tracing
curl -H "X-Debug-Trace: true" \
  -X POST http://localhost:8080/v1/ListClusters \
  -H "Content-Type: application/x-amz-json-1.1" \
  -H "X-Amz-Target: AmazonEC2ContainerServiceV20141113.ListClusters" \
  -d '{}'
Database Inspection
# Open database
sqlite3 ~/.kecs/data/kecs.db
# List tables
.tables
# Check clusters
SELECT * FROM clusters;
# Check services
SELECT * FROM services WHERE cluster_arn = 'arn:...';
Kubernetes Debugging
# Get all resources in namespace
kubectl get all -n <cluster-name>
# Describe problematic pod
kubectl describe pod <pod-name> -n <cluster-name>
# Get pod events
kubectl get events -n <cluster-name> --sort-by='.lastTimestamp'
# Debug container
kubectl debug -it <pod-name> -n <cluster-name> --image=busybox
Getting Help
Collect Diagnostic Information
Run the diagnostic script:
./scripts/collect-diagnostics.sh
This collects:
- KECS logs
- Configuration files
- Kubernetes cluster state
- System information
Report Issues
When reporting issues, include:
Environment Details
- KECS version: kecs version
- OS: uname -a
- Kubernetes version: kubectl version
- Docker version: docker version
Steps to Reproduce
- Exact commands run
- Configuration files used
- Expected vs actual behavior
Logs and Errors
- KECS server logs
- Relevant Kubernetes events
- Error messages
Diagnostic Bundle
- Output from diagnostic script
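If the diagnostic script is not available, the environment details listed above can be gathered by hand. The sketch below shows one way to do that without aborting when a tool is missing. report_tool is a hypothetical helper and is not part of KECS.

```shell
# report_tool LABEL COMMAND [ARGS...]
# Hypothetical helper: print a header for LABEL, then the output of COMMAND,
# or a "not found" note when COMMAND is not on PATH. Always returns 0 so one
# missing or failing tool does not stop the rest of the report.
report_tool() {
  label=$1
  shift
  printf '== %s ==\n' "$label"
  if command -v "$1" >/dev/null 2>&1; then
    "$@" 2>&1 || true
  else
    printf 'not found on PATH: %s\n' "$1"
  fi
}

# Usage, with the commands from the Environment Details list:
#   {
#     report_tool "KECS version" kecs version
#     report_tool "OS" uname -a
#     report_tool "Kubernetes version" kubectl version
#     report_tool "Docker version" docker version
#   } > environment.txt
```

Attaching the resulting file to an issue covers the Environment Details item in one step.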
Community Support
- GitHub Issues: github.com/nandemo-ya/kecs/issues
Prevention Tips
Regular Maintenance
Update Regularly
git pull origin main
make build
Monitor Resources
- Set up alerts for disk space
- Monitor memory usage
- Track API response times
Backup Data
# Backup database
cp ~/.kecs/data/kecs.db ~/.kecs/data/kecs.db.backup
Clean Up Resources
# Remove stopped tasks
kubectl delete pods -n <namespace> --field-selector=status.phase=Succeeded
# Prune unused images
docker image prune -a
Best Practices
Use Resource Limits
- Set appropriate CPU/memory limits
- Monitor actual usage
- Leave headroom for spikes
Enable Health Checks
- Configure liveness probes
- Set readiness probes
- Monitor health metrics
Plan for Failures
- Test failure scenarios
- Document recovery procedures
- Keep backups current
Stay Informed
- Read release notes
- Follow security advisories
- Join community discussions