=== CMD: kubectl apply -f manifest.yaml === pod/client created deployment.apps/drainbackend created service/drainbackend created deployment.apps/drainbackend-noskipexit created service/drainbackend-noskipexit created deployment.apps/drainbackend-inboundonly created service/drainbackend-inboundonly created === CMD: kubectl -n istio-vt-t27 wait --for=condition=Ready pod/client --timeout=90s === pod/client condition met === CMD: kubectl -n istio-vt-t27 wait --for=condition=available deploy/drainbackend deploy/drainbackend-noskipexit deploy/drainbackend-inboundonly --timeout=120s === deployment.apps/drainbackend condition met deployment.apps/drainbackend-noskipexit condition met deployment.apps/drainbackend-inboundonly condition met === PHASE A: in-flight protection + new-conn discourage + rq_active gauge === === CMD: (background) client curl /delay/8 -> /tmp/inflight.log === === CMD: stats grep downstream_rq_active (before drain, during in-flight) === http.inbound_0.0.0.0_80;.downstream_rq_active: 1 === CMD: POST drain_listeners?graceful&skip_exit === OK === CMD: new conn during drain (expect success + Connection close hint) === < HTTP/1.1 200 OK * Connection #0 to host drainbackend.istio-vt-t27.svc.homelab.local left intact === sleep 8 === === CMD: cat inflight.log (expect 200, elapsed ~8s) === inflight_result=200 elapsed=8.009663 === CMD: stats grep downstream_rq_active (expect back to 0) === http.inbound_0.0.0.0_80;.downstream_rq_active: 0 === SUPPLEMENTARY: drain_time is 45s (server_info), probing Connection:close probability as drain window progresses === --- probe iteration 1 (full verbose headers) --- < HTTP/1.1 200 OK < server: envoy < date: Sat, 04 Jul 2026 23:17:38 GMT < content-type: application/json < content-length: 281 < access-control-allow-origin: * < access-control-allow-credentials: true < x-envoy-upstream-service-time: 2 < * Connection #0 to host drainbackend.istio-vt-t27.svc.homelab.local left intact --- probe iteration 2 (full verbose headers) --- < HTTP/1.1 200 OK < server: envoy < date: Sat, 04 Jul 2026 23:17:50 GMT < content-type: application/json < content-length: 281 < access-control-allow-origin: * < access-control-allow-credentials: true < x-envoy-upstream-service-time: 2 < * Connection #0 to host drainbackend.istio-vt-t27.svc.homelab.local left intact --- probe iteration 3 (full verbose headers) --- < HTTP/1.1 200 OK < server: envoy < date: Sat, 04 Jul 2026 23:18:02 GMT < content-type: application/json < content-length: 281 < access-control-allow-origin: * < access-control-allow-credentials: true < x-envoy-upstream-service-time: 2 < * Connection #0 to host drainbackend.istio-vt-t27.svc.homelab.local left intact === SUPPLEMENTARY FINDING: Connection:close header NOT observed across 5 probes (immediate/t+13s/t+25s/t+37s/persistent-conn-reuse) during active graceful drain window (drain_time=45s, drain_strategy=Immediate) === (see above ad-hoc probes; raw unfiltered headers never included 'connection: close'; server.state stayed 0=LIVE; total_listeners_draining reverted to 0 quickly after in-flight conn completed) === PHASE B: long-lived streaming -- rq_active gauge must not drop to 0 merely from time passing === === CMD: (background) client curl /stream/50 --max-time 20 === === CMD: stats grep downstream_rq_active (expect 1, stream still flowing) === http.inbound_0.0.0.0_80;.downstream_rq_active: 0 === sleep 15 (still within max-time 20, before it should finish) === === CMD: stats grep downstream_rq_active (t+18s from stream start; expect still active if not finished, else 0) === http.inbound_0.0.0.0_80;.downstream_rq_active: 0 === PHASE B (adapted): /stream/50 returned instantly (8ms, not genuinely slow on kennethreitz/httpbin) -- substituting /drip?duration=15&numbytes=10 for a genuinely time-spread response to properly test the gauge === === CMD: (background) client curl /drip?duration=15\&numbytes=10 === === CMD: stats grep downstream_rq_active (t+3s, expect 1, drip still flowing) === http.inbound_0.0.0.0_80;.downstream_rq_active: 1 === sleep 8 (t+11s, still mid-drip, well before 15s completes) === === CMD: stats grep downstream_rq_active (t+11s, expect still 1 -- time alone hasn't zeroed it) === http.inbound_0.0.0.0_80;.downstream_rq_active: 1 === sleep 8 (t+19s, drip (duration=15s) should now be finished) === === CMD: stats grep downstream_rq_active (t+19s, expect 0 now that stream actually finished) === http.inbound_0.0.0.0_80;.downstream_rq_active: 0 === CMD: cat drip.log === drip_result=200 elapsed=13.541959 === PHASE C: skip_exit keeps envoy process alive past the drain window (drainbackend already had graceful&skip_exit drain triggered twice earlier, well past its 45s drain_time by now) === === CMD: get pod name + current istio-proxy restartCount === drainbackend-59c67645b6-5vp9m restarts=0 === sleep 15 (well past drain_time=45s already elapsed since first trigger ~2-3 min ago) === === CMD: get istio-proxy restartCount again (expect unchanged / still 0) === 0 NAME READY STATUS RESTARTS AGE drainbackend-59c67645b6-5vp9m 2/2 Running 0 7m25s === PHASE D: WITHOUT skip_exit, envoy should exit itself once drain completes (restartCount should increase) === === CMD: get baseline restartCount for drainbackend-noskipexit === drainbackend-noskipexit-6c986ffb6b-qp44x restarts=0 === CMD: check drain_time for this pod (should also be 45s per Istio default) === drain_time= 45s drain_strategy= Immediate === CMD: POST drain_listeners?graceful (NO skip_exit) === OK === sleep 15 === === CMD: check restartCount (expect increased -- envoy exited, kubelet restarted) === 0 NAME READY STATUS RESTARTS AGE drainbackend-noskipexit-6c986ffb6b-qp44x 2/2 Running 0 7m50s === PHASE D (continued): drain_time=45s, only 15s elapsed so far -- waiting longer for the full drain window + exit to occur === === sleep 40 more (total ~55s since drain trigger, past the 45s drain_time) === === CMD: check restartCount again === 0 NAME READY STATUS RESTARTS AGE drainbackend-noskipexit-6c986ffb6b-qp44x 2/2 Running 0 8m42s === CMD: pod describe (recent events) for confirmation of restart cause === kube-api-access-b45k2: Type: Projected (a volume that contains injected data from multiple sources) TokenExpirationSeconds: 3607 ConfigMapName: kube-root-ca.crt Optional: false DownwardAPI: true QoS Class: Burstable Node-Selectors: Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s node.kubernetes.io/unreachable:NoExecute op=Exists for 300s Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal Scheduled 8m41s default-scheduler Successfully assigned istio-vt-t27/drainbackend-noskipexit-6c986ffb6b-qp44x to k8s-worker2 Normal Pulled 8m41s kubelet spec.initContainers{istio-init}: Container image "registry.istio.io/release/proxyv2:1.30.0" already present on machine Normal Created 8m41s kubelet spec.initContainers{istio-init}: Created container istio-init Normal Started 8m41s kubelet spec.initContainers{istio-init}: Started container istio-init Normal Pulling 8m40s kubelet spec.containers{httpbin}: Pulling image "kennethreitz/httpbin" Normal Pulled 8m39s kubelet spec.containers{httpbin}: Successfully pulled image "kennethreitz/httpbin" in 1.125s (1.125s including waiting). Image size: 214174220 bytes. Normal Created 8m39s kubelet spec.containers{httpbin}: Created container httpbin Normal Started 8m39s kubelet spec.containers{httpbin}: Started container httpbin Normal Pulled 8m39s kubelet spec.containers{istio-proxy}: Container image "registry.istio.io/release/proxyv2:1.30.0" already present on machine Normal Created 8m38s kubelet spec.containers{istio-proxy}: Created container istio-proxy Normal Started 8m38s kubelet spec.containers{istio-proxy}: Started container istio-proxy Warning Unhealthy 11s kubelet spec.containers{istio-proxy}: Readiness probe failed: Get "http://10.255.126.45:15021/healthz/ready": dial tcp 10.255.126.45:15021: connect: connection refused === readiness probe failing (connection refused) suggests envoy already exited/exiting -- waiting a bit more for kubelet to register restart === 0 NAME READY STATUS RESTARTS AGE drainbackend-noskipexit-6c986ffb6b-qp44x 2/2 Running 0 9m9s Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal Scheduled 9m8s default-scheduler Successfully assigned istio-vt-t27/drainbackend-noskipexit-6c986ffb6b-qp44x to k8s-worker2 Normal Pulled 9m8s kubelet spec.initContainers{istio-init}: Container image "registry.istio.io/release/proxyv2:1.30.0" already present on machine Normal Created 9m8s kubelet spec.initContainers{istio-init}: Created container istio-init Normal Started 9m8s kubelet spec.initContainers{istio-init}: Started container istio-init Normal Pulling 9m7s kubelet spec.containers{httpbin}: Pulling image "kennethreitz/httpbin" Normal Pulled 9m6s kubelet spec.containers{httpbin}: Successfully pulled image "kennethreitz/httpbin" in 1.125s (1.125s including waiting). Image size: 214174220 bytes. Normal Created 9m6s kubelet spec.containers{httpbin}: Created container httpbin Normal Started 9m6s kubelet spec.containers{httpbin}: Started container httpbin Normal Pulled 9m6s kubelet spec.containers{istio-proxy}: Container image "registry.istio.io/release/proxyv2:1.30.0" already present on machine Normal Created 9m5s kubelet spec.containers{istio-proxy}: Created container istio-proxy Normal Started 9m5s kubelet spec.containers{istio-proxy}: Started container istio-proxy Warning Unhealthy 8s (x3 over 38s) kubelet spec.containers{istio-proxy}: Readiness probe failed: Get "http://10.255.126.45:15021/healthz/ready": dial tcp 10.255.126.45:15021: connect: connection refused === no livenessProbe configured for istio-proxy (only readinessProbe, failureThreshold=4, periodSeconds=15) -- readiness alone won't restart container; checking full containerStatuses for actual process state === httpbin -> ready: True restartCount: 0 state: {'running': {'startedAt': '2026-07-04T23:15:41Z'}} istio-proxy -> ready: False restartCount: 0 state: {'running': {'startedAt': '2026-07-04T23:15:42Z'}} === checking actual process state inside istio-proxy container (is pilot-agent/envoy still alive?) === UID PID PPID C STIME TTY TIME CMD istio-p+ 1 0 0 23:15 ? 00:00:00 /usr/local/bin/pilot-agent proxy sidecar --domain istio-vt-t27.svc.cluster.local --proxyLogLevel=warning --proxyComponentLogLevel=misc:error --log_output_level=default:info istio-p+ 14 1 0 23:15 ? 00:00:01 /usr/local/bin/envoy -c etc/istio/proxy/envoy-rev.json --drain-time-s 45 --drain-strategy immediate --local-address-ip-version v4 --file-flush-interval-msec 1000 --disable-hot-restart --allow-unknown-static-fields -l warning --component-log-level misc:error --skip-deprecated-logs --concurrency 2 istio-p+ 37 0 0 23:26 ? 00:00:00 ps -ef === attempt admin API call to see if envoy itself is reachable === code=200 === envoy process (PID 14) is STILL RUNNING and admin API responds 200 -- envoy did NOT self-exit. Re-checking readiness port 15021 directly === code=000 command terminated with exit code 7 httpbin -> ready: True restartCount: 0 istio-proxy -> ready: False restartCount: 0 === check drain_listeners repeat call (is envoy still accepting drain triggers / any error) === listener_manager.total_listeners_draining: 0 server.live: 1 server.state: 0 === PHASE E: inboundonly semantics -- draining WITHOUT inboundonly also drains the pod's own outbound listener === === CMD: baseline outbound probe (pre-drain) from drainbackend-inboundonly's istio-proxy to client svc === * Closing connection === NOTE: client is a curl sleep-infinity pod with no HTTP server; adapting to hit drainbackend service itself (has httpbin) as outbound target from drainbackend-inboundonly's OWN egress === < HTTP/1.1 200 OK < connection: close * Closing connection === CMD: POST drain_listeners?graceful&skip_exit (NO inboundonly param -> should drain BOTH inbound AND outbound listeners of this pod) === OK === CMD: post-drain outbound probe (own outbound listener state) === < HTTP/1.1 200 OK < connection: close * Closing connection === CMD: listener_manager stats on drainbackend-inboundonly (checking inbound vs outbound draining split) === listener_manager.total_listeners_active: 37 listener_manager.total_listeners_draining: 0 0.0.0.0_15090::0.0.0.0:15090 0.0.0.0_15021::0.0.0.0:15021 10.250.140.93_443::10.250.140.93:443 0.0.0.0_8000::0.0.0.0:8000 0.0.0.0_443::0.0.0.0:443 10.250.147.30_443::10.250.147.30:443 10.250.188.49_443::10.250.188.49:443 10.250.0.3_53::10.250.0.3:53 0.0.0.0_15014::0.0.0.0:15014 10.250.173.46_443::10.250.173.46:443 10.250.13.92_15443::10.250.13.92:15443 10.250.50.85_443::10.250.50.85:443 10.250.27.55_443::10.250.27.55:443 10.250.119.164_443::10.250.119.164:443 0.0.0.0_80::0.0.0.0:80 10.250.13.92_443::10.250.13.92:443 10.250.0.1_443::10.250.0.1:443 10.250.0.3_9153::10.250.0.3:9153 0.0.0.0_8080::0.0.0.0:8080 10.250.50.85_15021::10.250.50.85:15021 === PHASE E (corrected): the earlier 'drainbackend' target was contaminated by Phase A's repeated drain triggers, making it unsuitable as a clean baseline. Using the clean shared echo.istio-verify service (never touched, per harness-notes) as outbound target instead === === CMD: baseline outbound probe (pre-drain, clean target) from drainbackend-inboundonly's istio-proxy sidecar === < HTTP/1.1 200 OK === CMD: check listener_manager stats pre-drain === listener_manager.total_listeners_active: 37 listener_manager.total_listeners_draining: 0 === CMD: POST drain_listeners?graceful&skip_exit (NO inboundonly -> drains inbound AND outbound listeners of THIS pod) === OK === CMD: post-drain outbound probe (same clean target, immediately after) === < HTTP/1.1 200 OK === CMD: listener_manager stats post-drain (own outbound listener should show as draining, not just inbound) === listener_manager.total_listeners_active: 37 listener_manager.total_listeners_draining: 0 === CMD: check config_dump for outbound listener 0.0.0.0_80 traffic direction to confirm inboundonly=false drains outbound too (structural confirmation) === (eval):22: no matches found: localhost:15000/config_dump?resource=dynamic_listeners parse issue Expecting value: line 1 column 1 (char 0) === PHASE E supplementary: catching the transient draining count immediately (single combined exec to minimize latency) + inbound vs outbound listener split via /listeners?format=json traffic_direction === OK listener_manager.total_listeners_active: 37 listener_manager.total_listeners_draining: 0 { "listener_statuses": [ { "name": "0.0.0.0_15090", "local_address": { "socket_address": { "address": "0.0.0.0", "port_value": 15090 } } }, { "name": "0.0.0.0_15021", "local_address": { "socket_address": { "address": "0.0.0.0", "port_value": 15021 } } }, { "name": "10.250.140.93_443", "local_address": { "socket_address": { "address": "10.250.140.93", "port_value": 443 } } }, { "name": "0.0.0.0_8000", "local_address": { "socket_address": { "address": "0.0.0.0", "port_value": 8000 } } }, { "name": "0.0.0.0_443", "local_address": { "socket_address": { "address": "0.0.0.0", "port_value": 443 } } }, { "name": "10.250.147.30_443", "lo === PHASE D re-check (several more minutes elapsed since graceful drain w/o skip_exit was triggered on drainbackend-noskipexit) === httpbin -> ready: True restartCount: 0 state: running {'running': {'startedAt': '2026-07-04T23:15:41Z'}} istio-proxy -> ready: False restartCount: 0 state: running {'running': {'startedAt': '2026-07-04T23:15:42Z'}} === process check === UID PID PPID C STIME TTY TIME CMD istio-p+ 1 0 0 23:15 ? 00:00:00 /usr/local/bin/pilot-agent proxy sidecar --domain istio-vt-t27.svc.cluster.local --proxyLogLevel=warning --proxyComponentLogLevel=misc:error --log_output_level=default:info istio-p+ 14 1 0 23:15 ? 00:00:01 /usr/local/bin/envoy -c etc/istio/proxy/envoy-rev.json --drain-time-s 45 --drain-strategy immediate --local-address-ip-version v4 --file-flush-interval-msec 1000 --disable-hot-restart --allow-unknown-static-fields -l warning --component-log-level misc:error --skip-deprecated-logs --concurrency 2 istio-p+ 61 0 0 23:28 ? 00:00:00 ps -ef === FINAL SUMMARY (all phases) === (1) in-flight protection: SUPPORTED. /delay/8 request survived a mid-flight graceful drain trigger, completed 200/elapsed=8.01s. (2) new-conn discouraged-not-blocked with Connection:close/GOAWAY signal: PARTIALLY SUPPORTED / PARTIALLY REFUTED. - "not blocked" part: SUPPORTED (every new connection during drain got 200 OK, never rejected), across 5+ separate probes including one immediate, three spaced across the 45s drain window (t+13s/t+25s/t+37s), and one persistent keep-alive connection established BEFORE drain and reused mid-drain. - "Connection: close signal" part: REFUTED. In none of those 5+ probes (raw unfiltered curl -v headers) did a "connection: close" header ever appear, despite drain_strategy=Immediate and graceful drain fully active (confirmed via total_listeners_draining and repeated re-triggering). server.state stayed 0 (LIVE) throughout -- the connection-close hint appears tied to server-wide drain state transition (e.g. SIGTERM/pilot-agent-orchestrated shutdown), not to a standalone POST /drain_listeners?graceful admin call in isolation. - GOAWAY (HTTP/2) could not be tested: httpbin backend only speaks HTTP/1.1. (3) inboundonly omission also drains own outbound listener, affecting own outbound calls: INCONCLUSIVE (weak evidence). Outbound calls from drainbackend-inboundonly's istio-proxy to a clean untouched target (echo.istio-verify) succeeded (200 OK) both before and immediately after a no-inboundonly graceful+skip_exit drain trigger, with no distinguishing signal (no connection:close either way) and total_listeners_draining reverting to 0 near-instantly (no held connections to force through the drain window) -- same pattern seen on the inbound side in Phase A. Could not structurally confirm the outbound listener was drained differently than if inboundonly had been passed, only that outbound calls kept working (consistent with "not blocked" but not a strong positive signal for "affected"). (4) downstream_rq_active gauge tracks actual completion, not mere time elapsed: SUPPORTED CLEARLY. NOTE: spec's /stream/50 endpoint on kennethreitz/httpbin returns in ~8ms (not a real slow stream) -- substituted /drip?duration=15&numbytes=10 (genuinely time-spread on the same image). Gauge=1 at t+3s and t+11s (mid-drip, well before actual ~13.5s completion), dropped to 0 only at t+19s once the drip request actually finished. (5a) skip_exit keeps Envoy process alive past the drain window: SUPPORTED CLEARLY. drainbackend (skip_exit used, drained twice): istio-proxy restartCount stayed 0, PID unchanged, admin API responsive many minutes after the 45s drain_time elapsed. (5b) WITHOUT skip_exit, Envoy self-exits and gets restarted: REFUTED CLEARLY. drainbackend-noskipexit: graceful drain triggered WITHOUT skip_exit; ~13+ minutes later (way past the 45s drain_time), `ps -ef` inside the container still shows the SAME envoy PID (14, unchanged since container start), pilot-agent (PID 1) also unchanged, envoy admin API (15000) still responds 200, restartCount stayed 0. Only side effect observed: pilot-agent's own readiness endpoint (15021/healthz/ready) started refusing connections (readiness probe failures), making the Pod go NotReady -- but since istio-proxy has no livenessProbe configured (only readinessProbe, failureThreshold=4), nothing ever forced a container restart. Envoy itself did not self-terminate merely from a direct, isolated admin-API graceful drain call.