=== T56 RE-RUN (attempt 2) — namespace: istio-vt-t56-r2 === === CMD1: apply manifest.yaml (client+echo+svc+port-exhaust-client with pod-level securityContext.sysctls) === (applied above; wait Ready confirmed 2/2 for port-exhaust-client and client, deployment/echo Available) === CORRECTED CMD2: verify pod-wide ip_local_port_range set via securityContext.sysctls (no runtime sysctl -w needed) === -- curl container -- 32768 32787 -- istio-proxy container (shared netns) -- 32768 32787 === CMD3: BEFORE snapshot — outbound cluster stats for echo.istio-vt-t56-r2.svc.cluster.local (Istio registry name; corrected from spec's homelab.local grep) === (no lines above == confirmed empty; investigated further below) === DIAG: /stats plaintext endpoint does NOT expose per-cluster outbound connection counters by default on this cluster (Istio built-in stats inclusion matcher) === -- count of ANY 'cluster.' stat lines other than the istiod xds-grpc bootstrap cluster -- cluster.xds-grpc -- mesh ConfigMap has no custom proxyStatsMatcher override (this is Istio's stock default filtering, not a cluster-specific misconfig) -- 0 === SUBSTITUTE observation channel: /clusters admin endpoint exposes the SAME underlying Envoy cx_connect_fail/cx_total counters (not subject to the stats-inclusion matcher) — used as ground truth for before/after comparison === -- BEFORE snapshot (outbound|80||echo.istio-vt-t56-r2.svc.cluster.local) -- outbound|80||echo.istio-vt-t56-r2.svc.cluster.local::10.255.126.15:8080::cx_active::0 outbound|80||echo.istio-vt-t56-r2.svc.cluster.local::10.255.126.15:8080::cx_connect_fail::0 outbound|80||echo.istio-vt-t56-r2.svc.cluster.local::10.255.126.15:8080::cx_total::0 === CMD3 (spec-literal, cluster.local corrected): /stats grep before === (empty expected per DIAG above) === CMD4 (CORRECTED): 200x short-lived curl (Connection: close) against SHORT name http://echo.istio-vt-t56-r2/ (per corrected rule: k8s service curls use short names, which match Envoy vhost domains regardless of cluster DNS domain; the spec's *.svc.homelab.local literal FQDN would fall through to PassthroughCluster as proven in run 1) === ok=200 fail=0 kubectl -n $NS exec port-exhaust-client -c curl -- sh -c 0.06s user 0.02s system 4% cpu 1.583 total === DIAG: kernel tcp_tw_reuse / tcp_fin_timeout in pod netns (affects whether TIME_WAIT sockets can be immediately reused for NEW outbound connects) === tcp_tw_reuse=2 tcp_fin_timeout=60 === CMD5: TIME_WAIT socket count right after the 200x loop === 14 === DIAG: TIME_WAIT entries breakdown — note the narrowed 20-port budget (32768-32787) is shared POD-WIDE across ALL sidecar traffic (client->envoy loopback :15001, envoy->echo upstream, envoy->istiod xds :15012, envoy admin :15000), not just the echo request path === Recv-Q Send-Q Local Address:Port Peer Address:Port Process 0 0 10.255.194.70:15021 192.168.0.213:39168 0 0 10.255.194.70:15021 192.168.0.213:60662 0 0 127.0.0.1:15001 10.255.194.70:32768 0 0 127.0.0.1:15001 10.255.194.70:32778 0 0 10.255.194.70:15021 192.168.0.213:35658 0 0 127.0.0.1:15001 10.255.194.70:32782 0 0 127.0.0.1:15001 10.255.194.70:32774 0 0 127.0.0.1:15001 10.255.194.70:32776 0 0 127.0.0.1:15001 10.255.194.70:32784 0 0 127.0.0.1:15001 10.255.194.70:32770 0 0 127.0.0.1:15001 10.255.194.70:32786 0 0 127.0.0.1:15001 10.255.194.70:32780 0 0 127.0.0.1:15001 10.255.194.70:32772 === DIAG: sequential single-threaded loop produced 0 failures because tcp_tw_reuse=2 lets the kernel immediately reuse a TIME_WAIT 4-tuple for a NEW connect() to the SAME destination once timestamps allow it — with only 1 connection in flight at a time (sequential loop), 20 ports is never actually contended. Testing whether the underlying mechanism (narrow port range -> exhaustion -> connect fail) can manifest AT ALL on this node under true concurrency (which forces >20 simultaneous non-reusable sockets, regardless of tw_reuse) === outbound|80||echo.istio-vt-t56-r2.svc.cluster.local::10.255.126.15:8080::cx_connect_fail::0 outbound|80||echo.istio-vt-t56-r2.svc.cluster.local::10.255.126.15:8080::cx_total::2 -- launching 60 CONCURRENT short-lived requests (background jobs) against http://echo.istio-vt-t56-r2/ -- 200 0 200 0 200 0 200 0 200 0 200 0 200 0 200 0 200 0 200 0 200 0 200 0 200 0 200 0 200 0 200 0 200 0 200 0 200 0 000 7 000 7 000 7 000 7 CURLFAIL 000 7 CURLFAIL 200 0 200 0 200 0 000 7 CURLFAIL CURLFAIL 000 7 200 0 200 0 000 7 200 0 CURLFAIL CURLFAIL CURLFAIL CURLFAIL 000 7 000 7 000 7 200 0 CURLFAIL 200 0 200 0 200 0 CURLFAIL CURLFAIL 200 0 200 0 200 0 200 0 000 7 200 0 200 0 200 0 CURLFAIL 200 0 200 0 200 0 200 0 200 0 200 0 200 0 200 0 200 0 200 0 200 0 200 0 -- summary -- 48 24 === DIAG burst result: 48 succeeded (200), 12 curl exit=7 (Failed to connect to host) — confirms port+TIME_WAIT exhaustion DOES occur once true concurrency exceeds the 20-port pod-wide budget. (NOTE: initial eyeball count below miscounted as 24 because each curl failure prints TWO lines — "000 7" from -w AND "CURLFAIL" from the shell fallback; corrected count of 12 is verified precisely further below via `grep -c "^000"` / `grep -c "^CURLFAIL"`, each = 12.) === === AFTER burst: /clusters cx_connect_fail / cx_total for outbound|80||echo === outbound|80||echo.istio-vt-t56-r2.svc.cluster.local::10.255.126.15:8080::cx_connect_fail::0 outbound|80||echo.istio-vt-t56-r2.svc.cluster.local::10.255.126.15:8080::cx_total::17 === TIME_WAIT count right after burst === 25 === access log UF count (last 2m) === 0 === sample access log lines showing UF / upstream_reset_before_response_started === === KEY MECHANISTIC FINDING: cx_total only rose 2->17 (delta 15, persistent/keep-alive pooled upstream connections reused across 48 successful HTTP 200s) while cx_connect_fail stayed 0 and NO UF appeared, despite 24/72 concurrent curl attempts failing with exit=7 (connect failed). Checking whether Envoy's access log even RECEIVED those 24 failed requests at all (i.e. did the failures happen client-side on the loopback leg to the sidecar itself, before Envoy's proxy logic ever saw them?) === -- total access log lines in the observation window -- 248 -- total client-side requests attempted so far (200 sequential + 60 concurrent = 260) -- 260 -- if access log line count << 260, the failed connects never reached Envoy's HCM/access-log layer (i.e. failed on the client->sidecar loopback leg, sharing the SAME narrowed port pool as Envoy's own upstream conns, health checks (15021), and xds stream (15012)) -- === CORRECTED CMD5/CMD7 (final): TIME_WAIT count + UF/response-flag scan across whole observation window === -- burst breakdown (recount): 48 succeeded (200), 12 genuine curl connect failures (exit=7), confirmed by 2 lines/failure (000+CURLFAIL) -- 48 success / 12 fail out of 60 concurrent attempts -- access log total lines vs total attempts: 248 logged vs 260 attempted (200 sequential + 60 concurrent) => exactly the 12 failed attempts are MISSING from Envoy's access log entirely -- -- scan for ANY non-'-' response flag (UF, URX, UO, UH, etc.) in full observation window -- [2026-07-05T03:09:14.617Z] "GET / HTTP/1.1" 200 - via_upstream - "-" 0 761 12 12 "-" "curl/8.14.1" "da67597a-4939-434d-bd42-5c87a6f74242" "echo.istio-vt-t56-r2" "10.255.126.15:8080" outbound|80||echo.istio-vt-t56-r2.svc.cluster.local 10.255.194.70:32774 10.250.222.223:80 10.255.194.70:32772 - default [2026-07-05T03:09:14.637Z] "GET / HTTP/1.1" 200 - via_upstream - "-" 0 761 2 2 "-" "curl/8.14.1" "6de6d141-b0c0-4b37-b9e9-3c9d4d0bd30f" "echo.istio-vt-t56-r2" "10.255.126.15:8080" outbound|80||echo.istio-vt-t56-r2.svc.cluster.local 10.255.194.70:32774 10.250.222.223:80 10.255.194.70:32774 - default [2026-07-05T03:09:14.646Z] "GET / HTTP/1.1" 200 - via_upstream - "-" 0 761 2 2 "-" "curl/8.14.1" "4abb4787-2e32-4540-87b8-ed5d729bdfa3" "echo.istio-vt-t56-r2" "10.255.126.15:8080" outbound|80||echo.istio-vt-t56-r2.svc.cluster.local 10.255.194.70:32774 10.250.222.223:80 10.255.194.70:32770 - default [2026-07-05T03:09:14.654Z] "GET / HTTP/1.1" 200 - via_upstream - "-" 0 761 2 2 "-" "curl/8.14.1" "79355d01-0048-4be9-b700-f20521df5101" "echo.istio-vt-t56-r2" "10.255.126.15:8080" outbound|80||echo.istio-vt-t56-r2.svc.cluster.local 10.255.194.70:32774 10.250.222.223:80 10.255.194.70:32778 - default [2026-07-05T03:09:14.662Z] "GET / HTTP/1.1" 200 - via_upstream - "-" 0 761 2 2 "-" "curl/8.14.1" "4f213bc3-5f43-45a3-a181-ece6e9b99850" "echo.istio-vt-t56-r2" "10.255.126.15:8080" outbound|80||echo.istio-vt-t56-r2.svc.cluster.local 10.255.194.70:32774 10.250.222.223:80 10.255.194.70:32774 - default [2026-07-05T03:09:14.670Z] "GET / HTTP/1.1" 200 - via_upstream - "-" 0 761 2 2 "-" "curl/8.14.1" "fef629de-4f51-42da-a1ed-935c6921ff61" "echo.istio-vt-t56-r2" "10.255.126.15:8080" outbound|80||echo.istio-vt-t56-r2.svc.cluster.local 10.255.194.70:32774 10.250.222.223:80 10.255.194.70:32770 - default [2026-07-05T03:09:14.678Z] "GET / HTTP/1.1" 200 - via_upstream - "-" 0 761 6 6 "-" "curl/8.14.1" "9aad8d07-2022-494e-92a2-08090d128701" "echo.istio-vt-t56-r2" "10.255.126.15:8080" outbound|80||echo.istio-vt-t56-r2.svc.cluster.local 10.255.194.70:32782 10.250.222.223:80 10.255.194.70:32786 - default [2026-07-05T03:09:14.692Z] "GET / HTTP/1.1" 200 - via_upstream - "-" 0 761 2 2 "-" "curl/8.14.1" "4d9411aa-f288-42cb-86c9-ebedebc39e51" "echo.istio-vt-t56-r2" "10.255.126.15:8080" outbound|80||echo.istio-vt-t56-r2.svc.cluster.local 10.255.194.70:32774 10.250.222.223:80 10.255.194.70:32770 - default [2026-07-05T03:09:14.700Z] "GET / HTTP/1.1" 200 - via_upstream - "-" 0 761 2 2 "-" "curl/8.14.1" "e873bf7c-b1a0-45f7-994d-00c578460dbc" "echo.istio-vt-t56-r2" "10.255.126.15:8080" outbound|80||echo.istio-vt-t56-r2.svc.cluster.local 10.255.194.70:32782 10.250.222.223:80 10.255.194.70:32776 - default [2026-07-05T03:09:14.708Z] "GET / HTTP/1.1" 200 - via_upstream - "-" 0 761 1 1 "-" "curl/8.14.1" "4690fa8f-d985-43ed-b8db-7453ad9dc8fd" "echo.istio-vt-t56-r2" "10.255.126.15:8080" outbound|80||echo.istio-vt-t56-r2.svc.cluster.local 10.255.194.70:32774 10.250.222.223:80 10.255.194.70:32780 - default (if the above is empty, ZERO requests carry any non-default response flag anywhere in the window) === FINAL after-stats: /clusters cx_connect_fail / cx_total (post burst) === outbound|80||echo.istio-vt-t56-r2.svc.cluster.local::10.255.126.15:8080::cx_connect_fail::0 outbound|80||echo.istio-vt-t56-r2.svc.cluster.local::10.255.126.15:8080::cx_total::17 === FINAL: CMD7-equivalent — UF count in istio-proxy access log, full observation window === 0