Netdata on Kubernetes


  • Staff

    Install Netdata on a Kubernetes cluster

    Monitor a Kubernetes (k8s) cluster with Netdata

Our docs team has recently updated these two comprehensive guides on setting up Netdata on k8s!



  • @zack

    This looks like it’ll be awesome.

    Not sure if this is a bug or an RFC, but I just tried to deploy this to an ARM64 cluster, and the netdata-child pods get stuck: they can’t seem to pull their images for the architecture (output below).
    My immediate thought was that you guys didn’t support ARM yet, but the netdata-parent pod starts up fine…

    Events:
      Type     Reason       Age                  From               Message
      ----     ------       ----                 ----               -------
      Normal   Scheduled    <unknown>            default-scheduler  Successfully assigned default/netdata-child-7c4mc to pi-node4
      Warning  FailedMount  6m47s                kubelet, pi-node4  MountVolume.SetUp failed for volume "config" : failed to sync configmap cache: timed out waiting for the condition
      Normal   Pulling      6m46s                kubelet, pi-node4  Pulling image "netdata/wget"
      Normal   Pulled       6m42s                kubelet, pi-node4  Successfully pulled image "netdata/wget"
      Normal   Created      6m42s                kubelet, pi-node4  Created container init-nodeuid
      Normal   Started      6m42s                kubelet, pi-node4  Started container init-nodeuid
      Warning  Failed       44s                  kubelet, pi-node4  Failed to pull image "netdata/agent-sd:v0.1.0": rpc error: code = NotFound desc = failed to pull and unpack image "docker.io/netdata/agent-sd:v0.1.0": failed to unpack image on snapshotter overlayfs: no match for platform in manifest sha256:25bade1318e4238fdabd278f4590733a0843a0ae2de764e8a50d75d6a88f60e5: not found
      Normal   Pulling      43s (x2 over 6m41s)  kubelet, pi-node4  Pulling image "netdata/netdata:v1.24.0"
      Normal   Created      42s (x2 over 46s)    kubelet, pi-node4  Created container netdata
      Normal   Pulled       42s (x2 over 49s)    kubelet, pi-node4  Successfully pulled image "netdata/netdata:v1.24.0"
      Normal   Started      41s (x2 over 46s)    kubelet, pi-node4  Started container netdata
      Normal   BackOff      40s (x3 over 41s)    kubelet, pi-node4  Back-off pulling image "netdata/agent-sd:v0.1.0"
      Warning  Failed       40s (x3 over 41s)    kubelet, pi-node4  Error: ImagePullBackOff
      Warning  BackOff      31s (x2 over 40s)    kubelet, pi-node4  Back-off restarting failed container
      Normal   Pulling      31s (x2 over 46s)    kubelet, pi-node4  Pulling image "netdata/agent-sd:v0.1.0"
      Warning  Failed       30s (x2 over 44s)    kubelet, pi-node4  Error: ErrImagePull
    


  • Running on aarch64 Ubuntu 16.04

    Tried Docker first:

    user@ubuntu1604-aarch64:~/netdata$ docker run -d --name=netdata \
    >   -p 19999:19999 \
    >   -v netdatalib:/var/lib/netdata \
    >   -v netdatacache:/var/cache/netdata \
    >   -v /etc/passwd:/host/etc/passwd:ro \
    >   -v /etc/group:/host/etc/group:ro \
    >   -v /proc:/host/proc:ro \
    >   -v /sys:/host/sys:ro \         ^C
    user@ubuntu1604-aarch64:~/netdata$ sudo docker run -d --name=netdata \
    >   -p 19999:19999 \
    >   -v netdatalib:/var/lib/netdata \
    >   -v netdatacache:/var/cache/netdata \
    >   -v /etc/passwd:/host/etc/passwd:ro \
    >   -v /etc/group:/host/etc/group:ro \
    >   -v /proc:/host/proc:ro \
    >   -v /sys:/host/sys:ro \
    >   -v /etc/os-release:/host/etc/os-release:ro \
    >   --restart unless-stopped \
    >   --cap-add SYS_PTRACE \
    >   --security-opt apparmor=unconfined \
    >   netdata/netdata
    Unable to find image 'netdata/netdata:latest' locally
    latest: Pulling from netdata/netdata
    4f861a20f507: Pull complete 
    7bb4d159526d: Pull complete 
    e2e87b7a7de9: Pull complete 
    2c8445a10990: Pull complete 
    adbf0b90c51f: Pull complete 
    f7f8b8493280: Pull complete 
    db8649e50b77: Pull complete 
    37e23bb8abd9: Pull complete 
    02279201ef13: Pull complete 
    fabbb19aede9: Pull complete 
    Digest: sha256:eb3b37414ecb87e7b64949826bc56e3f27d24fa0ef2e29e58de1dc4972a534e1
    Status: Downloaded newer image for netdata/netdata:latest
    be6767de06e7b973a8acb39648bf409916a8cf45f09aed4bb3887bd88e39d384
    

    Which netdata images are you pulling, and can you try with `:latest`?
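
    If it helps: assuming the chart’s value keys (`image.tag` and `sd.tag`, per the chart’s values.yaml), something like this should bump both images to latest:

    helm upgrade --install netdata ./netdata-helmchart \
      --set image.tag=latest \
      --set sd.tag=latest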



  • I’m not using Docker; I’m using k3s with the containerd backend.
    I’ve tried deploying the helm chart with the default values.yaml, but modifying the image tags to “latest”:

    replicaCount: 1
    deploymentStrategy:
      type: Recreate
    
    image:
      repository: netdata/netdata
      tag: latest
      pullPolicy: Always
    
    sd:
      repository: netdata/agent-sd
      tag: latest
      pullPolicy: Always
      child:
        enabled: true
        configmap:
          name: netdata-child-sd-config-map
          key: config.yml
          # if 'from' is {} the ConfigMap is not generated
          from:
            file: sdconfig/child.yml
            value: {}
        resources: {}
        # limits:
        #  cpu: 50m
        #  memory: 60Mi
        # requests:
        #  cpu: 50m
        #  memory: 60Mi
    

    These are the commands I ran:

    git clone https://github.com/netdata/helmchart.git netdata-helmchart
    helm install netdata ./netdata-helmchart -f values.yaml

    NAME: netdata
    LAST DEPLOYED: Wed Sep 16 22:15:16 2020
    NAMESPACE: default
    STATUS: deployed
    REVISION: 1
    TEST SUITE: None
    NOTES:
    1. netdata will be available on http://netdata.k8s.local/, on the exposed port of your ingress controller
    
     You can get that port via `kubectl get services`. e.g. in the following example, the http exposed port is 31737, the https one is 30069.
     The hostname netdata.k8s.local will need to be added to /etc/hosts, so that it resolves to the exposed IP. That IP depends on how your cluster is set up:
            - When no load balancer is available (e.g. with minikube), you get the IP shown on `kubectl cluster-info`
            - In a production environment, the command `kubectl get services` will show the IP under the EXTERNAL-IP column
    
    The port can be retrieved in both cases from `kubectl get services`
    
    NAME                                         TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)                      AGE
    exiled-tapir-nginx-ingress-controller        LoadBalancer   10.98.132.169    <pending>     80:31737/TCP,443:30069/TCP   11h
    
    luis@pi-node1:~/k8s/netdata$ kubectl get po
    NAME                              READY   STATUS             RESTARTS   AGE
    netdata-parent-85b8ddd7f8-6m95r   1/1     Running            0          8m2s
    netdata-child-sz6b4               0/2     CrashLoopBackOff   6          8m2s
    
    
    luis@pi-node1:~/k8s/netdata$ kubectl logs netdata-child-sz6b4
    error: a container name must be specified for pod netdata-child-sz6b4, choose one of: [netdata sd] or one of the init containers: [init-nodeuid]
    
    luis@pi-node1:~/k8s/netdata$ uname -a
    Linux pi-node1 5.4.0-1018-raspi #20-Ubuntu SMP Sun Sep 6 05:11:16 UTC 2020 aarch64 aarch64 aarch64 GNU/Linux
    
    

  • Staff

    Warning Failed 44s kubelet, pi-node4 Failed to pull image "netdata/agent-sd:v0.1.0": rpc error: code = NotFound desc = failed to pull and unpack image "docker.io/netdata/agent-sd:v0.1.0": failed to unpack image on snapshotter overlayfs: no match for platform in manifest sha256:25bade1318e4238fdabd278f4590733a0843a0ae2de764e8a50d75d6a88f60e5: not found

    failed to pull and unpack image "docker.io/netdata/agent-sd:v0.1.0": failed to unpack image on snapshotter overlayfs: no match for platform

    Indeed, our netdata/agent-sd image has no linux/arm64 platform support.

    I added it in https://github.com/netdata/agent-service-discovery/commit/26c687f92ee3f15bed39ba463cbf0e58b2654734

    linux/arm64 is in `latest` and `v0.2.1`.
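
    You can double-check which platforms a tag provides with, for example (assuming a Docker CLI with manifest support enabled):

    docker manifest inspect netdata/agent-sd:v0.2.1 | grep architecture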


  • Staff

    Thanks @ilyam8 for this. Luis, if you manage to run this successfully, come back and tell us about it!



  • I’m still getting the same error in three scenarios:

    1. Fresh pull from git and then helm install
    2. Fresh pull from git and then helm install with values.yaml with tag set to “latest”
    3. Fresh pull from git and then helm install with values.yaml with tag set to “v0.2.1” for the sd image (snippet below)
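
    For reference, scenario 3 is just the earlier values.yaml with the sd tag changed:

    sd:
      repository: netdata/agent-sd
      tag: v0.2.1
      pullPolicy: Always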

    Should I be doing something different? 🙂



  • Hi Luis!

    Could you post events from the failing pod (kubectl describe) when you’re using the v0.2.1 image tag?
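
    e.g. (replace with your child pod’s name):

    kubectl describe pod netdata-child-<id>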


  • Staff

    Both `latest` and `v0.2.1` have the linux/arm64 platform:

    https://hub.docker.com/r/netdata/agent-sd/tags

    agent-sd is optional and can be disabled in values.yaml:
    https://github.com/netdata/helmchart/blob/7c0d22bd415daa52ce6d82b6d0c8e12270bac186/values.yaml#L10-L15
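
    For example, a minimal override along these lines (matching the sd.child block shown in the defaults above):

    sd:
      child:
        enabled: false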



  • @Rybue Thanks for the reply!
    OK, so the main container is set to `latest` and the sd one is set to `v0.2.1`.
    Here’s the output:

    luis@pi-node1:~/k8s/netdata$ kubectl get po
    NAME                             READY   STATUS    RESTARTS   AGE
    netdata-parent-cfb988d65-rkz5m   0/1     Running   0          14s
    netdata-child-zq2vl              1/2     Error     1          15s
    luis@pi-node1:~/k8s/netdata$
    luis@pi-node1:~/k8s/netdata$ kubectl describe po netdata-child-zq2vl
    Name:         netdata-child-zq2vl
    Namespace:    default
    Priority:     0
    Node:         pi-node1/192.168.178.81
    Start Time:   Thu, 17 Sep 2020 21:47:31 +0100
    Labels:       app=netdata
                  controller-revision-hash=65778dd95d
                  pod-template-generation=1
                  release=netdata
                  role=child
    Annotations:  checksum/config: dbf27785c04d58fa098895f1e45be1b72b4ea76b283ec2d0d373412977e44329
                  container.apparmor.security.beta.kubernetes.io/netdata: unconfined
    Status:       Running
    IP:           192.168.178.81
    IPs:
      IP:           192.168.178.81
    Controlled By:  DaemonSet/netdata-child
    Init Containers:
      init-nodeuid:
        Container ID:  containerd://6f5ff79976b7eb57caeb397cd4780746fc6f6d7074d4b69cc3f7c805197a8a66
        Image:         netdata/wget
        Image ID:      docker.io/netdata/wget@sha256:44e7a2be59451de7fda0bef7f35caeeb34a5e9c96949b17069ec7b62d7545af2
        Port:          <none>
        Host Port:     <none>
        Command:
          /bin/sh
        Args:
          -c
           TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token); URL="https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT}/api/v1/nodes/${MY_NODE_NAME}"; HEADER="Authorization: Bearer ${TOKEN}";
          DATA=$(wget -q -T 5 --no-check-certificate --header "${HEADER}" -O - "${URL}"); [ -z "${DATA}" ] && exit 1;
          UID=$(echo "${DATA}" | grep -m 1 uid | grep -o ":.*" | tr -d ": \","); [ -z "${UID}" ] && exit 1;
          echo -n "${UID}" > /nodeuid/netdata.public.unique.id;
        State:          Terminated
          Reason:       Completed
          Exit Code:    0
          Started:      Thu, 17 Sep 2020 21:47:35 +0100
          Finished:     Thu, 17 Sep 2020 21:47:35 +0100
        Ready:          True
        Restart Count:  0
        Environment:
          MY_NODE_NAME:   (v1:spec.nodeName)
        Mounts:
          /nodeuid from nodeuid (rw)
          /var/run/secrets/kubernetes.io/serviceaccount from netdata-token-mbdkj (ro)
    Containers:
      netdata:
        Container ID:   containerd://f75510bcd3b0c280208e144d1479d0a23e0128d10c0e16f18afdf8dd35b79504
        Image:          netdata/netdata:latest
        Image ID:       docker.io/netdata/netdata@sha256:06ca7394e515561613324e6700b49deb1bb92de787f9f78bc98b76bc5d2a7462
        Port:           19999/TCP
        Host Port:      19999/TCP
        State:          Terminated
          Reason:       Error
          Exit Code:    1
          Started:      Thu, 17 Sep 2020 21:47:43 +0100
          Finished:     Thu, 17 Sep 2020 21:47:44 +0100
        Last State:     Terminated
          Reason:       Error
          Exit Code:    1
          Started:      Thu, 17 Sep 2020 21:47:39 +0100
          Finished:     Thu, 17 Sep 2020 21:47:40 +0100
        Ready:          False
        Restart Count:  1
        Liveness:       http-get http://:http/api/v1/info delay=0s timeout=1s period=30s #success=1 #failure=3
        Readiness:      http-get http://:http/api/v1/info delay=0s timeout=1s period=30s #success=1 #failure=3
        Environment:
          MY_POD_NAME:                     netdata-child-zq2vl (v1:metadata.name)
          MY_NODE_NAME:                     (v1:spec.nodeName)
          MY_POD_NAMESPACE:                default (v1:metadata.namespace)
          NETDATA_PLUGINS_GOD_WATCH_PATH:  /etc/netdata/go.d/sd/go.d.yml
        Mounts:
          /etc/netdata/go.d.conf from config (rw,path="go.d")
          /etc/netdata/go.d/k8s_kubelet.conf from config (rw,path="kubelet")
          /etc/netdata/go.d/k8s_kubeproxy.conf from config (rw,path="kubeproxy")
          /etc/netdata/go.d/sd/ from sd-shared (rw)
          /etc/netdata/netdata.conf from config (rw,path="netdata")
          /etc/netdata/stream.conf from config (rw,path="stream")
          /host/proc from proc (ro)
          /host/sys from sys (rw)
          /var/lib/netdata/registry/ from nodeuid (rw)
          /var/run/docker.sock from run (rw)
          /var/run/secrets/kubernetes.io/serviceaccount from netdata-token-mbdkj (ro)
      sd:
        Container ID:   containerd://5c652252424cd16b7d37b47b5559b3a00d7ca3c49e71b337ba20ed2a08b26426
        Image:          netdata/agent-sd:v0.2.1
        Image ID:       docker.io/netdata/agent-sd@sha256:31cdb9c2c6b4e87deb075e1c620f8cb03c4ae9627f0c21cfebdbb998f5a325fa
        Port:           <none>
        Host Port:      <none>
        State:          Running
          Started:      Thu, 17 Sep 2020 21:47:40 +0100
        Ready:          True
        Restart Count:  0
        Environment:
          NETDATA_SD_CONFIG_MAP:  netdata-child-sd-config-map:config.yml
          MY_POD_NAMESPACE:       default (v1:metadata.namespace)
          MY_NODE_NAME:            (v1:spec.nodeName)
        Mounts:
          /export/ from sd-shared (rw)
          /var/run/secrets/kubernetes.io/serviceaccount from netdata-token-mbdkj (ro)
    Conditions:
      Type              Status
      Initialized       True
      Ready             False
      ContainersReady   False
      PodScheduled      True
    Volumes:
      proc:
        Type:          HostPath (bare host directory volume)
        Path:          /proc
        HostPathType:
      run:
        Type:          HostPath (bare host directory volume)
        Path:          /var/run/docker.sock
        HostPathType:
      sys:
        Type:          HostPath (bare host directory volume)
        Path:          /sys
        HostPathType:
      config:
        Type:      ConfigMap (a volume populated by a ConfigMap)
        Name:      netdata-conf-child
        Optional:  false
      nodeuid:
        Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
        Medium:
        SizeLimit:  <unset>
      sd-shared:
        Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
        Medium:
        SizeLimit:  <unset>
      netdata-token-mbdkj:
        Type:        Secret (a volume populated by a Secret)
        SecretName:  netdata-token-mbdkj
        Optional:    false
    QoS Class:       BestEffort
    Node-Selectors:  <none>
    Tolerations:     :NoSchedule
                     node.kubernetes.io/disk-pressure:NoSchedule
                     node.kubernetes.io/memory-pressure:NoSchedule
                     node.kubernetes.io/network-unavailable:NoSchedule
                     node.kubernetes.io/not-ready:NoExecute
                     node.kubernetes.io/pid-pressure:NoSchedule
                     node.kubernetes.io/unreachable:NoExecute
                     node.kubernetes.io/unschedulable:NoSchedule
    Events:
      Type     Reason     Age                From               Message
      ----     ------     ----               ----               -------
      Normal   Scheduled  <unknown>          default-scheduler  Successfully assigned default/netdata-child-zq2vl to pi-node1
      Normal   Pulling    25s                kubelet, pi-node1  Pulling image "netdata/wget"
      Normal   Pulled     24s                kubelet, pi-node1  Successfully pulled image "netdata/wget"
      Normal   Created    24s                kubelet, pi-node1  Created container init-nodeuid
      Normal   Started    23s                kubelet, pi-node1  Started container init-nodeuid
      Normal   Pulling    19s                kubelet, pi-node1  Pulling image "netdata/agent-sd:v0.2.1"
      Normal   Started    18s                kubelet, pi-node1  Started container sd
      Normal   Pulled     18s                kubelet, pi-node1  Successfully pulled image "netdata/agent-sd:v0.2.1"
      Normal   Created    18s                kubelet, pi-node1  Created container sd
      Normal   Pulling    17s (x2 over 22s)  kubelet, pi-node1  Pulling image "netdata/netdata:latest"
      Normal   Pulled     16s (x2 over 21s)  kubelet, pi-node1  Successfully pulled image "netdata/netdata:latest"
      Normal   Created    16s (x2 over 20s)  kubelet, pi-node1  Created container netdata
      Normal   Started    15s (x2 over 19s)  kubelet, pi-node1  Started container netdata
      Warning  BackOff    13s                kubelet, pi-node1  Back-off restarting failed container
    

    @ilyam8
    What is the sd container used for? If I disable it, what won’t work?
    It would be good to have a short description on the Docker Hub page 🙂



  • OK, it looks like the problem now is not with pulling the image (the container gets created successfully), but something goes wrong when the container is started.
    Could you post logs from both containers in the netdata-child-zq2vl pod?

    kubectl logs netdata-child-zq2vl -c sd
    kubectl logs netdata-child-zq2vl -c netdata
    


  • Also, the issue may be that the netdata child tries to connect to the parent, but the parent is not actually serving any connections, as we can see from `netdata-parent-cfb988d65-rkz5m 0/1 Running`.
    Looks like the readiness probe is failing there.

    You could also post events from the parent pod 🙂
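
    e.g.:

    kubectl describe pod netdata-parent-cfb988d65-rkz5m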


  • Staff

    It is https://github.com/netdata/agent-service-discovery#service-discovery

    Its purpose is to identify applications running inside the containers and create configuration files that are used by netdata plugins.

    I see now that netdata is the container that is failing to start 😃



  • luis@pi-node1:~/k8s/netdata$ kubectl logs netdata-child-zq2vl -c sd
    {"level":"info","component":"pipeline manager","time":"2020-09-17 20:47:40","message":"instance is started"}
    {"level":"info","component":"k8s config provider","time":"2020-09-17 20:47:40","message":"instance is started"}
    {"level":"info","component":"export manager","time":"2020-09-17 20:47:40","message":"registered: '[file exporter (/export/go.d.yml)]'"}
    {"level":"info","component":"discovery manager","time":"2020-09-17 20:47:40","message":"registered: [k8s discovery manager]"}
    {"level":"info","component":"pipeline manager","time":"2020-09-17 20:47:40","message":"received a new config, starting a new pipeline ('k8s/cmap/default/netdata-child-sd-config-map:config.yml')"}
    {"level":"info","component":"pipeline","time":"2020-09-17 20:47:40","message":"instance is started"}
    {"level":"info","component":"export manager","time":"2020-09-17 20:47:40","message":"instance is started"}
    {"level":"info","component":"discovery manager","time":"2020-09-17 20:47:40","message":"instance is started"}
    {"level":"info","component":"file export","time":"2020-09-17 20:47:40","message":"instance is started"}
    {"level":"info","component":"k8s discovery manager","time":"2020-09-17 20:47:40","message":"registered: [k8s pod discovery]"}
    {"level":"info","component":"k8s pod discovery","time":"2020-09-17 20:47:40","message":"instance is started"}
    {"level":"info","component":"pipeline","time":"2020-09-17 20:47:45","message":"received '8' group(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 20:47:45","message":"processing group 'k8s/pod/kube-system/traefik-758cd5fc85-b9bdt' with 5 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 20:47:45","message":"processing group 'k8s/pod/kube-system/local-path-provisioner-6d59f47c7-96h7q' with 1 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 20:47:45","message":"processing group 'k8s/pod/kube-system/coredns-7944c66d8d-4v9q6' with 3 target(s)"}
    {"level":"info","component":"build manager","time":"2020-09-17 20:47:45","message":"built 1 config(s) for target 'kube-system_coredns-7944c66d8d-4v9q6_coredns_tcp_9153'"}
    {"level":"info","component":"pipeline","time":"2020-09-17 20:47:45","message":"group 'k8s/pod/kube-system/coredns-7944c66d8d-4v9q6': new/stale config(s) 1/0"}
    {"level":"info","component":"pipeline","time":"2020-09-17 20:47:45","message":"processing group 'k8s/pod/kube-system/helm-install-traefik-fsk4c' with 1 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 20:47:45","message":"processing group 'k8s/pod/kube-system/svclb-traefik-tkfnn' with 2 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 20:47:45","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 20:47:45","message":"processing group 'k8s/pod/default/netdata-parent-cfb988d65-rkz5m' with 1 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 20:47:45","message":"processing group 'k8s/pod/kube-system/metrics-server-7566d596c8-82vtg' with 1 target(s)"}
    {"level":"info","component":"file export","time":"2020-09-17 20:47:46","message":"wrote 1 config(s) to '/export/go.d.yml'"}
    {"level":"info","component":"pipeline","time":"2020-09-17 20:48:00","message":"received '1' group(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 20:48:00","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 20:48:05","message":"received '1' group(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 20:48:05","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 20:48:15","message":"received '2' group(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 20:48:15","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 20:48:15","message":"processing group 'k8s/pod/default/netdata-parent-cfb988d65-rkz5m' with 1 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 20:48:30","message":"received '1' group(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 20:48:30","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 20:48:45","message":"received '1' group(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 20:48:45","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 20:49:20","message":"received '1' group(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 20:49:20","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 20:49:25","message":"received '1' group(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 20:49:25","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 20:49:40","message":"received '1' group(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 20:49:40","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 20:50:55","message":"received '1' group(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 20:50:55","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 20:51:10","message":"received '1' group(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 20:51:10","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 20:53:35","message":"received '1' group(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 20:53:35","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 20:53:40","message":"received '1' group(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 20:53:40","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 20:53:45","message":"received '1' group(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 20:53:45","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 20:57:45","message":"received '8' group(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 20:57:45","message":"processing group 'k8s/pod/kube-system/metrics-server-7566d596c8-82vtg' with 1 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 20:57:45","message":"processing group 'k8s/pod/kube-system/traefik-758cd5fc85-b9bdt' with 5 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 20:57:45","message":"processing group 'k8s/pod/kube-system/local-path-provisioner-6d59f47c7-96h7q' with 1 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 20:57:45","message":"processing group 'k8s/pod/kube-system/coredns-7944c66d8d-4v9q6' with 3 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 20:57:45","message":"processing group 'k8s/pod/kube-system/helm-install-traefik-fsk4c' with 1 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 20:57:45","message":"processing group 'k8s/pod/kube-system/svclb-traefik-tkfnn' with 2 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 20:57:45","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 20:57:45","message":"processing group 'k8s/pod/default/netdata-parent-cfb988d65-rkz5m' with 1 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 20:58:45","message":"received '1' group(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 20:58:45","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 20:59:00","message":"received '1' group(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 20:59:00","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 21:03:50","message":"received '1' group(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 21:03:50","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 21:04:05","message":"received '1' group(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 21:04:05","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 21:07:45","message":"received '8' group(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 21:07:45","message":"processing group 'k8s/pod/kube-system/metrics-server-7566d596c8-82vtg' with 1 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 21:07:45","message":"processing group 'k8s/pod/kube-system/traefik-758cd5fc85-b9bdt' with 5 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 21:07:45","message":"processing group 'k8s/pod/kube-system/local-path-provisioner-6d59f47c7-96h7q' with 1 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 21:07:45","message":"processing group 'k8s/pod/kube-system/coredns-7944c66d8d-4v9q6' with 3 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 21:07:45","message":"processing group 'k8s/pod/kube-system/helm-install-traefik-fsk4c' with 1 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 21:07:45","message":"processing group 'k8s/pod/kube-system/svclb-traefik-tkfnn' with 2 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 21:07:45","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 21:07:45","message":"processing group 'k8s/pod/default/netdata-parent-cfb988d65-rkz5m' with 1 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 21:09:00","message":"received '1' group(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 21:09:00","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 21:09:15","message":"received '1' group(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 21:09:15","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 21:14:10","message":"received '1' group(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 21:14:10","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 21:14:15","message":"received '1' group(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 21:14:15","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 21:14:30","message":"received '1' group(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 21:14:30","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 21:17:45","message":"received '8' group(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 21:17:45","message":"processing group 'k8s/pod/kube-system/metrics-server-7566d596c8-82vtg' with 1 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 21:17:45","message":"processing group 'k8s/pod/kube-system/traefik-758cd5fc85-b9bdt' with 5 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 21:17:45","message":"processing group 'k8s/pod/kube-system/local-path-provisioner-6d59f47c7-96h7q' with 1 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 21:17:45","message":"processing group 'k8s/pod/kube-system/coredns-7944c66d8d-4v9q6' with 3 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 21:17:45","message":"processing group 'k8s/pod/kube-system/helm-install-traefik-fsk4c' with 1 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 21:17:45","message":"processing group 'k8s/pod/kube-system/svclb-traefik-tkfnn' with 2 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 21:17:45","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 21:17:45","message":"processing group 'k8s/pod/default/netdata-parent-cfb988d65-rkz5m' with 1 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 21:19:20","message":"received '1' group(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 21:19:20","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 21:19:30","message":"received '1' group(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 21:19:30","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 21:24:30","message":"received '1' group(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 21:24:30","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 21:24:35","message":"received '1' group(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 21:24:35","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 21:24:45","message":"received '1' group(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 21:24:45","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 21:27:45","message":"received '8' group(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 21:27:45","message":"processing group 'k8s/pod/kube-system/svclb-traefik-tkfnn' with 2 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 21:27:45","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 21:27:45","message":"processing group 'k8s/pod/default/netdata-parent-cfb988d65-rkz5m' with 1 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 21:27:45","message":"processing group 'k8s/pod/kube-system/metrics-server-7566d596c8-82vtg' with 1 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 21:27:45","message":"processing group 'k8s/pod/kube-system/traefik-758cd5fc85-b9bdt' with 5 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 21:27:45","message":"processing group 'k8s/pod/kube-system/local-path-provisioner-6d59f47c7-96h7q' with 1 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 21:27:45","message":"processing group 'k8s/pod/kube-system/coredns-7944c66d8d-4v9q6' with 3 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 21:27:45","message":"processing group 'k8s/pod/kube-system/helm-install-traefik-fsk4c' with 1 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 21:29:45","message":"received '1' group(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 21:29:45","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 21:29:55","message":"received '1' group(s)"}
    {"level":"info","component":"pipeline","time":"2020-09-17 21:29:55","message":"processing group 'k8s/pod/default/netdata-child-zq2vl' with 1 target(s)"}
    
    kubectl logs netdata-child-zq2vl -c netdata
    Netdata entrypoint script starting
    2020-09-17 21:29:42: netdata INFO  : MAIN : CONFIG: cannot load cloud config '/var/lib/netdata/cloud.d/cloud.conf'. Running with internal defaults.
    2020-09-17 21:29:42: netdata INFO  : MAIN : Found 0 legacy dbengines, setting multidb diskspace to 256MB
    2020-09-17 21:29:42: netdata INFO  : MAIN : Created file '/var/lib/netdata/dbengine_multihost_size' to store the computed value
    2020-09-17 21:29:42: netdata INFO  : MAIN : Using host prefix directory '/host'
    2020-09-17 21:29:42: netdata INFO  : MAIN : SIGNAL: Not enabling reaper
    2020-09-17 21:29:42: netdata ERROR : MAIN : LISTENER: Invalid listen port 0 given. Defaulting to 19999. (errno 22, Invalid argument)
    2020-09-17 21:29:42: netdata ERROR : MAIN : LISTENER: IPv4 bind() on ip '0.0.0.0' port 19999, socktype 1 failed. (errno 98, Address in use)
    2020-09-17 21:29:42: netdata ERROR : MAIN : LISTENER: Cannot bind to ip '0.0.0.0', port 19999
    2020-09-17 21:29:42: netdata ERROR : MAIN : LISTENER: IPv6 bind() on ip '::' port 19999, socktype 1 failed. (errno 98, Address in use)
    2020-09-17 21:29:42: netdata ERROR : MAIN : LISTENER: Cannot bind to ip '::', port 19999
    2020-09-17 21:29:42: netdata FATAL : MAIN : LISTENER: Cannot listen on any API socket. Exiting... # : Invalid argument
    
    2020-09-17 21:29:42: netdata INFO  : MAIN : EXIT: netdata prepares to exit with code 1...
    2020-09-17 21:29:42: netdata INFO  : MAIN : EXIT: cleaning up the database...
    2020-09-17 21:29:42: netdata INFO  : MAIN : Cleaning up database [0 hosts(s)]...
    2020-09-17 21:29:42: netdata INFO  : MAIN : EXIT: all done - netdata is now exiting - bye bye...
    

    Please note that I am running netdata on the k8s/k3s host node… 😁



  • Yeah, that seems to be the issue. I’m not sure how your Kubernetes is configured, but it looks like the netdata pod is conflicting with another process on the same port.
    You can try to reconfigure your host netdata to run on a different port, to see if that solves the issue 🙂


  • Staff

    Luis, keep us updated! @Rybue, thanks again for chiming in. You are helping a lot in this community 🙂



  • OK, that fixed it. I changed the listen port from 19999 to 19998 on the physical host in /etc/netdata/netdata.conf.
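
    For anyone else hitting this, the edit was along these lines (section and key names as in the stock netdata.conf):

    # /etc/netdata/netdata.conf on the host
    [web]
        default port = 19998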

    Looks good so far!! 😀

    So, I’m getting my head around how this works:

    I’m guessing from my playtime so far that this makes the agent on the host itself redundant, since each child pod looks to be showing all the same information (plus more)… Is that the idea?
    If so, what happens if I hook this up to send stats up to my tenant in Netdata Cloud and then re-deploy the helm chart a few times? Am I going to wind up with a consistent node identity, or will I end up with lots of orphaned nodes with the same name, or a bunch of nodes with the same name but incremented numbers attached to them, etc.?
    Happy to try it ofc, but just curious, as I’ve got my workspaces up there set up nicely now.

    One curious thing though: I spun up another node and added it to the cluster (the child service came up fine with the modified host port) and noticed the “k8s kubelet” and “k8s kubeproxy” menus on the right; those didn’t appear on the original node that was deployed to. Seems a bit odd, given that the first node was, and still is, the only master…

    Is there a way for me to specify certain settings in the values.yaml for the Web UI? For example, I like having my charts always refresh rather than the default of “On Focus”. If I set it in the running UI, then as soon as I switch to a different child node and back, the setting is reverted. Ideally, could we get the config stored in a Persistent Volume or something?

    Also, do you guys have changes planned for representing/navigating the sections on each child node dedicated to specific pods? I ask because I have only circa 8 containers per node and the UI is already rather cluttered: I can imagine a whole lot of scrolling and stuttering of the browser on a production system. I’ve felt like the right-side pane needs a search box; maybe this is the requirement for one?


  • Staff

    Hey @Luis-Johnstone ,

    To force the refresh of the Dashboard, you only need to append the update_always=true argument to the URL:
    http://192.168.1.150:19999/#menu_system_submenu_cpu;theme=slate;help=true;update_always=true

    We intend to offer proper support for Kubernetes, including better visualization optimized for the unique experience Kubernetes offers (e.g. ephemeral nodes). But this is not on the committed roadmap, so we can’t say in good conscience when it’s going to ship, or give more details about it.

    If I understand you correctly: the streaming functionality is intended so that the child nodes replicate their database to the parent, so that the parent can not only offer the same metrics but also apply alarms on them. Depending on your use case, this setup might make sense for you, or you might prefer to have the data live on each child node and access it through Netdata Cloud, leveraging the extra functionality, such as custom dashboards or metric correlations.
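
    Under the hood this is plain netdata streaming; a minimal child-side stream.conf looks roughly like this (the destination and API key below are placeholders, the helm chart generates the real values for you):

    [stream]
        enabled = yes
        destination = netdata-parent:19999
        api key = 11111111-2222-3333-4444-555555555555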

    I hope that helps!

    Keep the feedback coming, we can’t get enough of it 💪

