Jaeger: No span received after restarting jaeger-agent-daemonset
This issue happens when your app talks to the jaeger agent through the Kubernetes node's IP and the agent pod has restarted.
JAEGER_AGENT_HOST: fieldRef(v1:status.hostIP)
The issue is related to `conntrack`, the kernel module that tracks connections. In this case, it tracks the UDP flows from the instrumented apps to the jaeger agent. Each entry has a timeout (default: 30 seconds). If the app keeps sending spans before the entry times out, the entry stays tracked. After the agent restarts, that entry still maps the node IP to the old agent pod's IP, so the spans never reach the new pod.
# To list the conntrack entries by a destination port
sudo conntrack -L --orig-port-dst 6831
udp 17 29 src=10.244.3.71 dst=10.0.16.45 sport=36073 dport=6831 [UNREPLIED] src=10.244.3.70 dst=10.244.3.1 sport=6831 dport=36073 mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=2
udp 17 22 src=10.244.3.60 dst=10.0.16.45 sport=51469 dport=6831 [UNREPLIED] src=10.244.3.55 dst=10.244.3.1 sport=6831 dport=51469 mark=0 secctx=system_u:object_r:unlabeled_t:s0 use=1
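To make the listing easier to read, the sketch below pulls out the address each flow is actually forwarded to, which is the first `src=` of the reply half of an entry (the second `src=` on the line). It assumes the agent is reached through a node-IP-to-pod-IP translation, as in the output above; `stale_targets` is a hypothetical helper name, not part of conntrack.

```shell
# Hypothetical helper: read `conntrack -L` output on stdin and print the
# unique reply-side source IPs (the pod IPs the node IP is translated to).
stale_targets() {
  awk '{
    n = 0
    for (i = 1; i <= NF; i++) {
      if ($i ~ /^src=/) {
        n++
        # The second src= belongs to the reply tuple.
        if (n == 2) print substr($i, 5)
      }
    }
  }' | sort -u
}

# Usage (run on the node):
#   sudo conntrack -L -p udp --orig-port-dst 6831 | stale_targets
```

Any address printed here that does not match the current agent pod IP (for example from `kubectl get pods -o wide`) most likely belongs to a stale entry.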
The short-term fix is to delete the entries on each Kubernetes node or to restart the instrumented apps. Both are tedious.
# To delete the conntrack entries by a destination port
conntrack -D -p udp --orig-port-dst 6831
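If the deletion has to be repeated on many nodes, it can be scripted. The following is only a rough sketch under assumptions that are not in the original note: that every node is reachable over SSH under its Kubernetes node name and that `sudo` works there without a prompt. `flush_agent_conntrack` and the `RUNNER` variable are hypothetical names.

```shell
# Hypothetical helper: read node names on stdin and run the conntrack
# delete on each one. RUNNER defaults to ssh; set RUNNER=echo for a dry run.
flush_agent_conntrack() {
  while read -r node; do
    # Skip blank lines.
    [ -n "$node" ] || continue
    # </dev/null keeps the remote command from consuming the node list.
    "${RUNNER:-ssh}" "$node" sudo conntrack -D -p udp --orig-port-dst 6831 </dev/null
  done
}

# Usage:
#   kubectl get nodes -o name | sed 's|^node/||' | flush_agent_conntrack
```

Running it with `RUNNER=echo` first prints the commands instead of executing them, which is a cheap way to check the node list.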
The better way is to add an init container to the jaeger agent, so the entries are deleted every time the agent pod starts:
# Init container that runs on pod startup
initContainers:
  - name: sysctl
    image: docker.io/busybox:1.33
    imagePullPolicy: IfNotPresent
    command:
      - sh
      - -c
      - conntrack -D -p udp --orig-port-dst 6831
    securityContext:
      privileged: true
Relevant issue: jaegertracing/jaeger-operator/issues/1427