guide · istio · 2026-06-28 · istio, envoy, graceful-termination, haproxy

Quickstart: rerun the experiment in 5 minutes

ABSTRACT

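This is a collection of command sequences and checks for quickly re-running the homelab Istio graceful-termination experiment. The core conclusion in one line: every reproduction is the same 3-step loop (toggle the mode, trigger a termination event, measure what happens to the requests in flight at that moment), and the difference between the current and improved expect lines is the evidence for the hypothesis. The mechanism is explained in big picture; the canonical scenario definitions (including S2) are in test scenarios.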

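Target environment: homelab k8s v1.30.6 (master1/worker1/worker2) + HAProxy on the .211 node + Istio IGW. Audience: someone who has already run this experiment once and wants to set it up again. Scope: reproduction commands, checks, and pitfalls only; the mechanism behind the results is left to the linked documents. Prerequisite concepts: Envoy drain, k8s preStop/grace-period, HAProxy backend health.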

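0. Why this experiment exists: the problem to solve

Pods disappearing is routine in Kubernetes (rollout, scale-down, eviction). The question is what happens to requests that were already being processed by a pod (in-flight requests) during the short window in which that pod dies. With a naive shutdown, in-flight requests are cut off by an RST (TCP reset) or a stream CANCEL, and the client sees a 5xx or a non-zero exit code. This is "ungraceful termination", and in high-traffic production it leaks out as a handful of requests silently failing on every deploy.

The prescription for graceful termination is simple: before dying, first announce "I'm no longer accepting requests" (drain), secure enough time to finish the requests already accepted (delayed shutdown), and only then actually exit. This experiment measures whether that prescription really works by contrasting two modes: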

Mode	Behavior	Fate of in-flight requests
current (broken)	abrupt shutdown	cut off by RST/CANCEL → 5xx or exit≠0
improved (graceful)	drain + delayed shutdown	run to completion → 200 and exit=0

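This document condenses that contrast into which commands to run, in what order, and what to expect. In other words, it is the procedure that produces the conclusion; why that conclusion holds is covered in big picture and test scenarios.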

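1. The one-page mental model: the 3-step loop behind every reproduction

ANCHOR: this experiment has three scenarios but a single skeleton: toggle → termination event → measure. The scenarios only change the loop's variables: (1) the kind of termination event (single pod kill vs rollout) and (2) the shape of the request (one-shot long request vs continuous traffic vs streaming). Once you understand one scenario, you only need to look at the deltas for the rest.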

flowchart LR
  toggle["1. TOGGLE<br/>current ↔ improved"] --> inflight["2a. start in-flight<br/>request(s)"]
  inflight --> kill["2b. kill / rollout<br/>(termination event)"]
  kill --> measure["3. MEASURE<br/>http_code · exit · chunks"]
  measure --> verdict{"current vs improved<br/>expect lines differ?"}
  verdict -->|differ| proven["hypothesis supported"]

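Why it has to be this structure: what we want to measure is the fate of a request that spans the moment of termination. So we start the request first to put it in flight (2a), then kill the pod (2b), and observe how the request ends while the pod is dying (3). If the order is reversed (for example, kill first and then send the request), there is nothing left to measure. And the current and improved expect lines must come out different for this to count as evidence that the prescription works; if they are identical, the experiment itself is broken.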
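
The ordering constraint can be sketched as a small shell helper. This is a minimal illustration under stated assumptions, not part of the experiment's scripts: `run_inflight_then_kill` and its three arguments are hypothetical names, and the two command strings stand in for the curl and kubectl calls shown in §3.

```shell
# Hypothetical helper (not in the repo): enforces the 2a -> 2b -> 3 ordering.
# Usage: run_inflight_then_kill REQUEST_CMD KILL_CMD DELAY_SECONDS
run_inflight_then_kill() {
  local request_cmd=$1 kill_cmd=$2 delay=$3

  # 2a. Start the request first, in the background, so it is in flight.
  sh -c "$request_cmd" &
  local req_pid=$!

  # Give the request time to reach the pod before the termination event.
  sleep "$delay"

  # 2b. Trigger the termination event while the request is still open.
  sh -c "$kill_cmd"

  # 3. Measure: wait for the request and report its exit status.
  local rc=0
  wait "$req_pid" || rc=$?
  echo "request_exit=$rc"
}
```

For S1, the request command would be the curl call and the kill command the `kubectl delete pod` call. Running the kill command before the request would leave no in-flight request to observe.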

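Variable isolation: replicas determine how trustworthy the measurement is

The loop has one hidden premise: traffic must actually reach the pod being killed for the measurement to hold. HAProxy uses balance roundrobin, so with replicas=2 curl can land on the other, still-alive worker pod and get a normal response (the variable we intervened on disappears, so the hypothesis cannot be tested; see §6 Q1). Therefore: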

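S1/S4 (replicas=1): traffic is forced onto the single pod that will be killed, so the impact of termination is measured head-on.
S3 (replicas=2): intentional. Measuring rollout disruption (connection stability while several pods are replaced one after another) requires multiple pods.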
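
A guard like the following could catch a wrong replica count before a run. It is a sketch only: `required_replicas` and `check_replicas` are hypothetical names, the replica values come from this document, and `check_replicas` needs access to the homelab cluster.

```shell
# Hypothetical lookup: replica count each scenario expects (values from this guide).
required_replicas() {
  case "$1" in
    S1|S2|S4) echo 1 ;;   # single pod, so traffic must hit the pod being killed
    S3)       echo 2 ;;   # rollout disruption needs more than one pod
    *)        echo "unknown scenario: $1" >&2; return 1 ;;
  esac
}

# Compare the Deployment's spec.replicas with the expected value before a run.
# Requires cluster access; nothing here runs until the function is called.
check_replicas() {
  local scenario=$1 want have
  want=$(required_replicas "$scenario") || return 1
  have=$(kubectl --context homelab -n service-a get deploy/service-a-igw \
           -o jsonpath='{.spec.replicas}')
  [ "$have" = "$want" ] || {
    echo "$scenario needs replicas=$want, found $have" >&2
    return 1
  }
}
```

Calling `check_replicas S4` right after an S3 run would fail until the Deployment is scaled back to 1.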

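2. Mechanism: why a mode switch moves "three things together"

The one-line model of a switch: a mode is the state in which three things {IGW manifest, HAProxy cfg, mode label} are consistent with each other, and a toggle swaps all three at once. If any one is missed, the state drifts out of sync and the results are contaminated.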

flowchart LR
  op[operator] --> k8s["K8s: apply 20-current<br/>or 21-improved IGW"]
  op --> ha["node .211: install + reload<br/>haproxy cfg (current/improved)"]
  op --> del["delete old-mode pods<br/>grace-period=5"]
  k8s --> done[rollout status OK]
  ha --> done
  del --> done

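Target	current	improved	Why it must change
IGW manifest	`manifests/20-igw-current.yaml`	`manifests/21-igw-improved.yaml`	Defines the pod's preStop/drain/grace behavior itself; this is the core of the prescription
HAProxy cfg (.211)	`haproxy/haproxy-current.cfg`	`haproxy/haproxy-improved.cfg`	Health checks and connection handling at the L7 front end; must be consistent with the pod behavior for a clean termination
Old pods	force-delete `mode!=current`	force-delete `mode!=improved`	Even after the new manifest is applied, if old-mode pods are stuck in the deadlock (§5) and do not die, the old behavior lingers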
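
One way to keep the manifest and the cfg from drifting apart is to derive both from a single mode name. The lookup below is a sketch; `mode_files` is a hypothetical function, and only the file paths are taken from the table.

```shell
# Hypothetical lookup: prints "manifest cfg" for a mode (paths from the table).
mode_files() {
  case "$1" in
    current)  echo "manifests/20-igw-current.yaml haproxy/haproxy-current.cfg" ;;
    improved) echo "manifests/21-igw-improved.yaml haproxy/haproxy-improved.cfg" ;;
    *)        echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

# Example: split the pair into two variables for the toggle commands.
read -r MANIFEST CFG <<EOF
$(mode_files improved)
EOF
echo "manifest=$MANIFEST cfg=$CFG"
```

Because both paths come from one argument, a typo in the mode name fails early instead of applying a mismatched pair.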

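Why the old pods must be killed by hand: this is the least obvious part. With required anti-affinity + maxUnavailable=0 + N(pods)=N(nodes), the new ReplicaSet's pod has no free node and stays Pending, while the old ReplicaSet's pod cannot be terminated because maxUnavailable=0, which is a deadlock (§6 Q3). Force-deleting with --grace-period=5 frees a seat and unblocks the rollout.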

# Switch to improved mode
kubectl --context homelab apply -f manifests/21-igw-improved.yaml
scp haproxy/haproxy-improved.cfg homelab:/tmp/haproxy.cfg
ssh homelab "scp /tmp/haproxy.cfg <user>@203.0.113.211:/tmp/ && \
  ssh <user>@203.0.113.211 'sudo install -m 0644 /tmp/haproxy.cfg /etc/haproxy/haproxy.cfg && sudo systemctl reload haproxy'"

# Switch to current mode (the reverse): same pattern as improved, only the cfg/manifest differ
kubectl --context homelab apply -f manifests/20-igw-current.yaml
scp haproxy/haproxy-current.cfg homelab:/tmp/haproxy.cfg
ssh homelab "scp /tmp/haproxy.cfg <user>@203.0.113.211:/tmp/ && \
  ssh <user>@203.0.113.211 'sudo install -m 0644 /tmp/haproxy.cfg /etc/haproxy/haproxy.cfg && sudo systemctl reload haproxy'"

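# Force-delete pods still in the old mode (avoids the anti-affinity deadlock).
# Replace <TARGET_MODE> with the mode just applied; the guard skips the delete when nothing matches.
OLD_PODS=$(kubectl --context homelab -n service-a get pod -l app=service-a-igw \
    -o jsonpath='{range .items[?(@.metadata.labels.mode!="<TARGET_MODE>")]}{.metadata.name} {end}')
[ -n "$OLD_PODS" ] && kubectl --context homelab -n service-a delete pod $OLD_PODS --grace-period=5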

# Wait for the rollout to complete
kubectl --context homelab -n service-a rollout status deploy/service-a-igw --timeout=180s

reload ≠ restart: HAProxy also serves the 6443 control-plane frontend, so a restart briefly drops those connections. Always apply cfg changes with reload (§5, last row).
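
Since a bad config on this node would also affect the 6443 frontend, it can help to validate the candidate file before reloading. The sketch below assumes it runs as root on the HAProxy node; `reload_if_valid` is a hypothetical helper, while `haproxy -c -f` is HAProxy's own configuration check mode.

```shell
# Hypothetical helper, meant to run as root on the HAProxy node.
# Validates a candidate config before installing it, then reloads (never restarts).
reload_if_valid() {
  local candidate=$1
  # `haproxy -c -f FILE` only parses the configuration and exits non-zero on errors.
  if ! haproxy -c -f "$candidate"; then
    echo "config check failed; keeping the running config" >&2
    return 1
  fi
  install -m 0644 "$candidate" /etc/haproxy/haproxy.cfg &&
    systemctl reload haproxy
}
```

With this in place, `reload_if_valid /tmp/haproxy.cfg` could replace the `install ... && systemctl reload haproxy` step in the toggle commands, at the cost of one extra parse of the file.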

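Pre-flight check (30 seconds): four premises before running the loop

This experiment assumes that four things are healthy: the cluster, istiod, the HAProxy backend, and the images. If any one is missing the measurement is contaminated, so check them before running the loop.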

# Is the cluster up?
kubectl --context homelab get nodes -o wide
# expect: master1/worker1/worker2 all Ready, v1.30.6

# Is istiod running?
kubectl --context homelab -n istio-system get pod
# expect: istiod-* 1/1 Running

# Is the HAProxy backend healthy?
ssh homelab "ssh <user>@203.0.113.211 'echo show stat | sudo socat /run/haproxy/admin.sock stdio'" \
  | awk -F, '/istio-http-backend/{print $2"="$18}'
# expect: master1=UP, worker1=UP, worker2=UP (or some DOWN, depending on pod placement)

# Is the image present on all 3 nodes?
for n in 212 213 214; do
  ssh homelab "ssh <user>@203.0.113.$n 'sudo crictl images 2>/dev/null | grep service-a'" | head -2
done
# expect: service-a-backend:dev and service-a-hc:dev both listed on every node

If any of these fails, see the recovery procedure in the runbook.
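
The HAProxy check can be made scriptable by parsing the `show stat` CSV instead of reading it by eye. The sketch below relies on HAProxy's documented CSV layout (field 1 pxname, field 2 svname, field 18 status); the function names are hypothetical.

```shell
# Reads `show stat` CSV on stdin and prints "server=STATUS" for one proxy.
# Field 1 = pxname, 2 = svname, 18 = status in HAProxy's CSV layout.
# The FRONTEND/BACKEND summary rows are skipped so only real servers remain.
backend_status() {
  awk -F, -v px="$1" '
    $1 == px && $2 != "FRONTEND" && $2 != "BACKEND" { print $2 "=" $18 }'
}

# Succeeds when at least one server of the proxy reports a status starting with UP.
any_server_up() {
  backend_status "$1" | grep -q '=UP'
}
```

Piping the output of the `socat` command above into `any_server_up istio-http-backend` gives a non-zero exit status when no server is UP, which is convenient as a gate at the top of a scenario script.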

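3. Reproducing the scenarios: same loop, different deltas

The three scenarios are instances of the §1 loop. Below, each is given with its delta (termination event × request shape × replicas) and complete commands that can be run as-is. S2 is the variant of S1 that verifies the timing of the improved drain-polling FSM, so it is absorbed into the S1 expect lines and has no separate block; see test scenarios.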

Scenario	Termination event	Request shape	replicas	Measured values
S1	single pod kill	one-shot long request	1	http_code · time_total
S3	rollout restart	continuous traffic (10 workers)	2	5xx count · errors (RST)
S4	single pod kill	HTTP/2 streaming	1	chunk count · exit

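S1: reproducing the current long-request RST (replicas=1)

Start a single request (/sleep?seconds=60) so that it stays in flight for 60 seconds, and kill its pod 5 seconds later. In current mode the in-flight request is cut off with a 502 (t=~8.25s); in improved mode it runs to completion (http=200, t=~60s).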

kubectl --context homelab -n service-a scale deploy/service-a-igw --replicas=1
# … wait for rollout status …

ART=tests/artifacts/$(date +%Y%m%d-%H%M%S)/S1-rerun && mkdir -p $ART
TARGET=$(kubectl --context homelab -n service-a get pod -l app=service-a-igw -o name | head -1 | cut -d/ -f2)
kubectl --context homelab -n service-a logs $TARGET -c hc --follow --tail=0 > $ART/hc.log 2>&1 &
HC_PID=$!
(curl -sS --max-time 90 --no-buffer --cacert haproxy/certs/ca.pem \
   --resolve example.local:443:203.0.113.211 \
   -w '\n---\nhttp=%{http_code} t=%{time_total}\n' \
   'https://example.local/sleep?seconds=60' > $ART/curl.out; echo "exit=$?" >> $ART/curl.out) &
CURL_PID=$!
sleep 5
kubectl --context homelab -n service-a delete pod $TARGET --grace-period=210 --wait=false
wait $CURL_PID
kill $HC_PID 2>/dev/null
cat $ART/curl.out
# expect: http=502 t=~8.25s exit=0  (current mode, measured in S1; curl exits 0 because it did receive an HTTP response)
# expect: http=200 t=~60s exit=0 (improved mode)
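
Because curl exits 0 in both modes here, the verdict has to come from the `http=` line rather than from the exit code alone. A small parser along these lines could classify a run; `s1_verdict` is a hypothetical name, and it assumes the `http=... t=...` and `exit=...` lines written by the block above.

```shell
# Hypothetical helper: classify an S1 run from curl.out.
# Expects the "http=<code> t=<seconds>" and "exit=<n>" lines written by the S1 block.
s1_verdict() {
  local file=$1 http rc
  http=$(sed -n 's/^http=\([0-9]*\) .*/\1/p' "$file" | tail -n 1)
  rc=$(sed -n 's/^exit=\([0-9]*\)$/\1/p' "$file" | tail -n 1)
  if [ "$http" = "200" ] && [ "$rc" = "0" ]; then
    echo "completed (http=$http exit=$rc)"
  else
    echo "interrupted (http=$http exit=$rc)"
  fi
}
```

Running `s1_verdict $ART/curl.out` once per mode gives two lines that can be compared directly against the expect comments.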

S3: continuous traffic + rollout restart (replicas=2)

Ten worker loops hit /fast for 90 seconds while a rollout restart rolls every pod. Instead of a single kill, this checks whether connections leak during a deployment. current shows ~9 errors from RSTs; improved shows 0.

kubectl --context homelab -n service-a scale deploy/service-a-igw --replicas=2
# … wait for rollout …

ART=tests/artifacts/$(date +%Y%m%d-%H%M%S)/S3-rerun && mkdir -p $ART
T_END=$(($(date +%s) + 90))
for i in $(seq 1 10); do
  (while [ $(date +%s) -lt $T_END ]; do
    curl -sS --cacert haproxy/certs/ca.pem --resolve example.local:443:203.0.113.211 \
      -o /dev/null -w "$(date +%s.%3N) %{http_code} %{time_total}\n" \
      https://example.local/fast 2>>$ART/curl-err.log
  done) >> $ART/curl.tsv &
done
sleep 10
kubectl --context homelab -n service-a rollout restart deploy/service-a-igw
wait

awk '$2~/^[0-9]+$/{c[$2]++} END{for(k in c) print k": "c[k]}' $ART/curl.tsv | sort
echo "errors: $(wc -l < $ART/curl-err.log)"
# current expect: 5xx=0, errors=~9 (connection RST)
# improved expect: 5xx=0, errors=0

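The key clue in S3 is that there are errors even though 5xx=0: it is not an HTTP response code but the connection itself that is cut by an RST, which makes curl emit an error. With graceful termination these RSTs disappear.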
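
The two signals can be counted side by side. The sketch below assumes the `curl.tsv` line format produced above (`<timestamp> <http_code> <time_total>`) and one stderr line per failed request in `curl-err.log`; `s3_summary` is a hypothetical name. curl writes `000` for `%{http_code}` when no HTTP response was received, so those rows are counted separately from 5xx.

```shell
# Hypothetical helper: summarize an S3 run.
# curl.tsv lines are "<timestamp> <http_code> <time_total>"; curl reports
# http_code 000 when no HTTP response was received (for example after an RST).
s3_summary() {
  local tsv=$1 errlog=$2
  awk '
    $2 ~ /^5[0-9][0-9]$/ { five++ }
    $2 == "000"           { noresp++ }
    $2 ~ /^[0-9]+$/       { total++ }
    END { printf "total=%d 5xx=%d no_response=%d\n", total, five, noresp }
  ' "$tsv"
  echo "errors=$(wc -l < "$errlog" | tr -d ' ')"
}
```

A run with `5xx=0` but non-zero `no_response` or `errors` is the signature described above: the failure happened below the HTTP layer.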

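S4: streaming (replicas=1)

While receiving an HTTP/2 stream (/stream?seconds=60&interval=1, one chunk per second), kill the pod after 8 seconds. In current mode the stream is CANCELed around chunk 11/12 and curl exits with 92; in improved mode nearly all 60 chunks arrive and curl exits with 0.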

# Precondition: replicas=1 (traffic must reach the pod being killed to reproduce chunks=~12/exit=92; for the reason see §6 Q1 /
#   [test scenarios](/public/istio/gt__src-w5-test-scenarios.html)). S3 ends with replicas=2, so always scale back to 1.
kubectl --context homelab -n service-a scale deploy/service-a-igw --replicas=1
kubectl --context homelab -n service-a rollout status deploy/service-a-igw --timeout=180s

ART=tests/artifacts/$(date +%Y%m%d-%H%M%S)/S4-rerun && mkdir -p $ART
TARGET=$(kubectl --context homelab -n service-a get pod -l app=service-a-igw -o name | head -1 | cut -d/ -f2)
(curl -sS --max-time 90 --no-buffer --cacert haproxy/certs/ca.pem \
   --resolve example.local:443:203.0.113.211 \
   'https://example.local/stream?seconds=60&interval=1' > $ART/curl.body 2>$ART/curl.err
 echo "exit=$?" > $ART/curl.exit) &
sleep 8
kubectl --context homelab -n service-a delete pod $TARGET --grace-period=210 --wait=false
wait

grep -c '^chunk' $ART/curl.body
cat $ART/curl.exit $ART/curl.err
# current expect: chunks=~12, exit=92 (HTTP/2 stream CANCEL @ chunk 11/12s)
# improved expect: chunks=59/60, exit=0
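
The exit code is the main signal in S4, so a short lookup of the curl exit codes likely to appear in these runs can make `curl.exit` easier to read. The descriptions paraphrase curl's documented exit codes; the function names are hypothetical.

```shell
# Short descriptions for curl exit codes relevant here (paraphrased from curl's
# documented exit codes; anything else falls through to the manual).
curl_exit_meaning() {
  case "$1" in
    0)  echo "ok" ;;
    7)  echo "failed to connect" ;;
    28) echo "operation timed out" ;;
    35) echo "TLS handshake failed" ;;
    52) echo "empty reply from server" ;;
    56) echo "receive failure (e.g. connection reset)" ;;
    92) echo "HTTP/2 stream error" ;;
    *)  echo "other (see the curl manual)" ;;
  esac
}

# Reads the "exit=<n>" line written by the S4 block and annotates it.
s4_exit_meaning() {
  local rc
  rc=$(sed -n 's/^exit=\([0-9]*\)$/\1/p' "$1")
  echo "exit=$rc: $(curl_exit_meaning "$rc")"
}
```

For a current-mode run, `s4_exit_meaning $ART/curl.exit` would be expected to report the HTTP/2 stream error, matching the CANCEL described above.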

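4. Where the results live and how to read them

Artifacts are collected in a per-scenario directory. The client side (curl.*) is the result; hc.log, envoy.log, and stat.csv are the timeline that explains why that result occurred.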

tests/artifacts/<YYYYMMDD-HHMMSS>/<scenario>/
  ├── curl.out / curl.body / curl.err      ← client-side results
  ├── hc.log                               ← FSM transitions (grep for event=transition lines)
  ├── envoy.log                            ← Envoy access log + drain lines
  ├── stat.csv / stat-timeline.csv         ← HAProxy show stat time series
  └── run.log                              ← scenario execution timeline

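The anchor for alignment is the transition timestamp in hc.log. The first column the 6-events consolidation script looks at is the timestamp of the event=transition from=... to=... reason=... lines in hc.log (§6 Q2), and that is the starting point of event 1 (health_fail). The rest, envoy.log and stat.csv, are aligned against this timestamp.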

# FSM transitions only
grep transition <ART>/hc.log

# HAProxy backend status changes
awk -F, '$2!=""{c[$2","$3]++} END{for(k in c) print k": "c[k]}' <ART>/stat.csv

# Time of the first 503
grep -m1 503 <ART>/hc.log

# Consolidate the 6 events (pass the whole artifacts dir)
bash tests/05-collect-timestamps.sh <ART>
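
For a quick look at the FSM without the full consolidation script, the transition lines can be reduced to their from/to/reason fields. This sketch assumes only that each transition line carries space-separated `key=value` fields as quoted above; it makes no assumption about where the timestamp sits, and `list_transitions` is a hypothetical name.

```shell
# Hypothetical helper: list FSM transitions as "from -> to (reason)".
# Assumes space-separated key=value fields (event=transition from=X to=Y reason=Z);
# the timestamp layout is deliberately not parsed.
list_transitions() {
  grep 'event=transition' "$1" | awk '
    {
      from = to = reason = "?"
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^from=/)   from   = substr($i, 6)
        if ($i ~ /^to=/)     to     = substr($i, 4)
        if ($i ~ /^reason=/) reason = substr($i, 8)
      }
      print from " -> " to " (" reason ")"
    }'
}
```

Comparing the output of `list_transitions <ART>/hc.log` between a current and an improved run shows at a glance whether the drain states were entered at all.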

5. Common problems

Symptom	Cause	Fix
`kubectl rollout status` timeout	deadlock from required anti-affinity + maxUnavailable=0 + N(pods)=N(nodes)	Force-delete the old-mode pods: `kubectl delete pod -l app=service-a-igw,mode=<old>` (see the runbook for the mechanism in depth)
Pod `0/2 Running` with Envoy SDS errors	`workload-spiffe-uds` emptyDir missing	Check volumes/volumeMounts in `manifests/2X-igw-*.yaml`; all three of workload-socket/credential-socket/workload-certs are required
`ErrImagePull` from `203.0.113.2:5000`	the node's containerd attempts HTTPS	Side-load with `ssh homelab 'docker save ... && scp + ctr -n k8s.io image import'`
HAProxy backend `master1=DOWN check=L4TOUT`	no IGW pod on master1, or the pod's hc container is not ready	Check the EndpointSlice ready state → then check the image pull or the readinessProbe
HAProxy 6443 briefly drops	`systemctl restart haproxy` restarts every frontend	Use `systemctl reload haproxy` (when a drop-in changes); reload is sufficient for this experiment
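
Before force-deleting anything for the first row, it is worth confirming that the rollout is actually stuck on a Pending pod. The filter below reads the default `kubectl get pod --no-headers` columns (NAME READY STATUS ...); `pending_pods` is a hypothetical name.

```shell
# Reads `kubectl get pod --no-headers` output on stdin and prints Pending pod names.
# Default columns: NAME READY STATUS RESTARTS AGE.
pending_pods() {
  awk '$3 == "Pending" { print $1 }'
}

# Example (requires cluster access):
#   kubectl --context homelab -n service-a get pod -l app=service-a-igw --no-headers | pending_pods
```

If this prints a pod name while the old-mode pods are still Running, the symptom matches the deadlock row; `kubectl describe pod` on that name should then show why it could not be scheduled.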

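6. Recall quiz

Q1. How does the result change if S1 is run with replicas=2?

HAProxy's balance roundrobin can send the curl traffic to the other worker pod, which is alive and responds normally. The variable we intervened on disappears, so the hypothesis cannot be tested. That is why S1, S2, and S4 use replicas=1.

Q2. Which column does `tests/05-collect-timestamps.sh` look at first?

The timestamp of the event=transition from=... to=... reason=... lines in hc.log. This is the starting point of the 6 events (event 1: health_fail). After that, entries from envoy.log, stat.csv, and so on are aligned against the same timestamp.

Q3. What pattern keeps old pods from dying during a mode switch?

Required anti-affinity + maxUnavailable=0 + N(pods)=N(nodes). The new ReplicaSet's pod stays Pending because there is no free seat, and the old ReplicaSet's pod cannot be terminated because maxUnavailable=0, so the rollout deadlocks. Ways out: untaint master1 (one more seat), force-delete the old pods, or change maxUnavailable to 1 in the manifest.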

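Key takeaways

Every reproduction is one loop: toggle → termination event → in-flight measurement. Scenarios differ only in the delta of (termination event × request shape × replicas).
A mode switch means three things in sync: IGW manifest (20/21) + HAProxy cfg swap + force-deleting old-mode pods (§5 table). Miss one and the old behavior lingers.
replicas is the variable-isolation device: S1/S4 use replicas=1 to force traffic onto the pod being killed; S3 uses replicas=2 to measure rollout disruption.
The current↔improved difference in expect lines is the evidence: S1: 502/8.25s ↔ 200/60s; S3: errors ~9 ↔ 0; S4: chunks ~12 · exit 92 ↔ chunks 59 · exit 0.
The anchor for aligning results: artifacts land in tests/artifacts/<ts>/<scenario>/, and the transition timestamp in hc.log is the reference point for aligning the 6 events.
Canonical scenario definitions (including S2) are in test scenarios; recovery and the deadlock mechanism in depth are in the runbook.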

What you might be missing

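S2 is absent from the body on purpose. S2 is the variant of S1 that verifies the timing of the improved drain-polling FSM, so it is absorbed into the S1 expect line (http=200/t=~60s). If you need to reproduce it separately, follow test scenarios.
Forgetting to reset replicas breaks the reproduction. If S4 is run as-is right after S3 (replicas=2), traffic can go to the other live pod and chunks=~12/exit=92 does not appear; that is why the scale --replicas=1 at the top of the S4 block is mandatory.
reload vs restart. HAProxy also serves the 6443 frontend, so a restart briefly drops control-plane connections. Always apply cfg changes with reload (§5, last row).
Looking only at 5xx misses graceful-termination failures. Even with 5xx=0 in S3, RSTs occur at the connection layer and show up only as curl errors; verifying graceful termination means checking exit≠0 and connection errors in addition to the HTTP code.
Force deletion is not the only way out of the deadlock. Untainting master1 to add a seat, or changing maxUnavailable to 1 in the manifest, also works (§6 Q3).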
