Hands-on: SpringBoot and Elasticsearch Integration with WebFlux Reactive Programming
Part 3: Using Filebeat to Parse Your SpringBoot Logs to Elasticsearch
1. Preface
This is the final part - using Filebeat to parse your SpringBoot logs to Elasticsearch and analyze them in Kibana.
You can find the previous two sections at the links above.
You might ask, why not use Logstash?
Indeed, both Filebeat and Logstash are Elastic Stack tools for collecting, parsing, and shipping log data. But for an individual developer, Filebeat is much more lightweight than Logstash, and configuration is simpler - one yml file to point it at the log files, pull up a Docker container, and you're done. As for deeper log analysis, I don't have the energy to tinker with that yet. The requirement is just to sync logs from different servers into one ES, so I can view logs from different servers, applications, and APIs in Kibana at a glance, rather than SSH-ing into each machine, cd-ing to the right directory, and waiting for less to scroll its way down to the latest entries.
2. Filebeat Introduction
First, here's the official Filebeat documentation - the source of most information: Filebeat Reference
Filebeat is a lightweight, efficient tool whose main job is to collect log data from various sources (log files, system logs, application output, etc.) and ship it to a target storage or processing system such as Elasticsearch or Logstash.
Yes, Filebeat can also work with Logstash, so adding Logstash later for upgrades is fine.
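(If you do bolt on Logstash later, the change on the Filebeat side is essentially swapping the output block - a rough sketch with a placeholder host:)

output.logstash:
  hosts: ["logstash-host:5044"]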
3. Using Filebeat
Before using it, the first thing we need to do is find the log location. If you've been following the previous two blog posts, in the project demo-springboot-elasticsearch, the log generation path is configured in logback.xml:

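If you don't have the project open, the relevant appender looks roughly like this (a sketch only - the file name and pattern below are placeholders, so check the actual logback.xml in the demo project):

<!-- Sketch of a rolling file appender; the important part is the relative logs/ path -->
<appender name="FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>logs/demo-springboot-elasticsearch.log</file>
    <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
        <fileNamePattern>logs/demo-springboot-elasticsearch.%d{yyyy-MM-dd}.log</fileNamePattern>
        <maxHistory>30</maxHistory>
    </rollingPolicy>
    <encoder>
        <!-- placeholder pattern - keep whatever the project already uses -->
        <pattern>%d{yyyy-MM-dd'T'HH:mm:ss.SSSXXX} %level ${PID:-0} --- [%thread] %logger{40} : %msg%n</pattern>
    </encoder>
</appender>
<root level="INFO">
    <appender-ref ref="FILE"/>
</root>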
By default, it generates in the current project directory, so you need to note the absolute path of the log directory - we'll use it later.

3.1 Prepare http_ca.crt and filebeat.yml Configuration File
Now that we know the log path, we can start preparing to install Filebeat. We'll still use Docker deployment.
http_ca.crt - do you remember? In newer versions of ES, this is needed to establish SSL connections, and it's needed here too. You can copy another one from the ES container - the following command copies http_ca.crt directly to the current directory:
docker cp elasticsearch:/usr/share/elasticsearch/config/certs/http_ca.crt .
Next, download a filebeat.yml template file:
curl -L -O https://raw.githubusercontent.com/elastic/beats/8.9/deploy/docker/filebeat.docker.yml
For those with slow downloads, you can use the content below directly - it's the same:
filebeat.config:
  modules:
    path: ${path.config}/modules.d/*.yml
    reload.enabled: false

filebeat.autodiscover:
  providers:
    - type: docker
      hints.enabled: true

processors:
- add_cloud_metadata: ~

output.elasticsearch:
  hosts: '${ELASTICSEARCH_HOSTS:elasticsearch:9200}'
  username: '${ELASTICSEARCH_USERNAME:}'
  password: '${ELASTICSEARCH_PASSWORD:}'
Then modify this template file according to your environment:
# Don't change this - filebeat has built-in modules to help parse logs like nginx
# Here we don't parse SpringBoot logs through this
filebeat.config:
  modules:
    path: ${path.config}/modules.d/*.yml
    reload.enabled: false

# Log file input source to filebeat
filebeat.inputs:
- type: filestream
  enabled: true
  id: demo-springboot-elasticsearch
  paths:
    - /share/Develop/dockerData/runnable-run/demo-springboot-elasticsearch/logs/*.log
  pipeline: springboot_pipeline
  tags: ["demo-springboot-elasticsearch"]

processors:
- add_cloud_metadata: ~

# Filebeat output configuration - your ES configuration
output.elasticsearch:
  hosts: ['https://ip:9200']
  username: 'elastic'
  password: 'password'
  ssl:
    enabled: true
    # http_ca.crt
    certificate_authorities: "/usr/share/filebeat/http_ca.crt"
  indices:
    - index: "demo-springboot-elasticsearch"
      when.contains:
        tags: "demo-springboot-elasticsearch"

setup.kibana:
  host: "ip:5601"
  username: "elastic"
  password: "password"
Configuration explanations:
- type: filestream - There are many input types, but here we just need Filebeat to read .log files, so filestream is sufficient.
- id: demo-springboot-elasticsearch - A unique ID identifying this input configuration.
- paths - The log file paths to monitor. /share/Develop/dockerData/runnable-run/demo-springboot-elasticsearch/logs/*.log uses the wildcard * to match every .log file in that directory; Filebeat will watch these files for new log data.
- pipeline: springboot_pipeline - Tells ES which pipeline to use to parse the logs once they are pushed to ES. We'll cover this shortly.
- tags: ["demo-springboot-elasticsearch"] - Adds a tag to every log event collected from this input source, which makes it easy to identify the source in later processing.
- output.elasticsearch - How Filebeat sends log data to Elasticsearch.
  - hosts: ['https://ip:9200'] - The Elasticsearch host address and port, using https for a secure connection.
  - username: 'elastic' - Username for connecting to Elasticsearch.
  - password: 'password' - Password for connecting to Elasticsearch.
  - ssl.enabled: true - Enables SSL/TLS to secure the data transmission.
  - certificate_authorities: "/usr/share/filebeat/http_ca.crt" - The CA certificate used to verify the connection with Elasticsearch. We'll mount this certificate via Docker, so the path can stay the same.
  - indices - Index settings for writing log data to Elasticsearch: index: "demo-springboot-elasticsearch" is the index Filebeat writes collected log data to, and when.contains is the condition - only events carrying the tag "demo-springboot-elasticsearch" go into that index.
- setup.kibana - The connection between Filebeat and Kibana.
  - host: "ip:5601" - Kibana's host address and port.
  - username: "elastic" - Username for connecting to Kibana.
  - password: "password" - Password for connecting to Kibana.
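Before moving on, you can also sanity-check the yml with Filebeat's built-in config test. This is just an optional check - it reuses the image and mounts from the run command in section 3.3, so adjust the paths and flags to your environment:

docker run --rm --user=root \
  --volume="$(pwd)/filebeat.yml:/usr/share/filebeat/filebeat.yml:ro" \
  --volume="$(pwd)/http_ca.crt:/usr/share/filebeat/http_ca.crt:ro" \
  docker.elastic.co/beats/filebeat:8.8.0 \
  filebeat test config --strict.perms=false

A companion command, filebeat test output, tries to reach the configured Elasticsearch and reports whether the connection (including TLS) works.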
Someone has already compiled many filebeat templates - see this blog post: filebeat custom index name, filebeat index template
After configuration, don't rush - we're not at the container startup step yet. First go to Kibana to configure the springboot_pipeline mentioned in the filebeat.yml file. This springboot_pipeline tells ES how to parse your logs - you can't just throw logs up there and ignore them.
3.2 Create Pipeline to Parse SpringBoot Logs
Log into Kibana. We can first work out the syntax for parsing SpringBoot logs in Dev Tools (the Grok Debugger).
Of course, you can skip this part and use what I've already written. This section tells you how to troubleshoot if logs are pushed to ES but you can't find the corresponding index, or the index has no data - you need to check if your pipeline parsing is correct.

Fill in some SpringBoot log samples in Sample Data. I know some people use custom log formats:

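If you don't have real logs handy yet, here are two made-up lines in the shape the pattern below expects (the class name is a placeholder, and real Spring Boot output often pads the level with extra spaces - always re-test with lines copied from your actual log file):

2023-08-01T12:00:00.123+08:00 INFO 25 --- [reactor-http-nio-2] c.e.demo.controller.BookController : search books, keyword=spring
2023-08-01T12:00:01.456+08:00 ERROR 25 --- [reactor-http-nio-2] c.e.demo.controller.BookController : search failed, request timed out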
Then write matching syntax in the Grok Pattern below. ES has a page explaining how to use Grok Pattern: https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html
Here's what I use:
%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:loglevel} %{NUMBER:thread} --- \[%{DATA:threadName}\] %{DATA:logger} : %{GREEDYDATA:message}
Taking %{TIMESTAMP_ISO8601:timestamp} as an example: the %{...} wrapper marks one parsing expression, TIMESTAMP_ISO8601 is the matching rule, and timestamp is the name of the field that will be generated.
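Fed the first sample line from above, this pattern splits it into roughly the following fields (in the default Spring Boot layout the number after the level is the process ID):

timestamp  -> 2023-08-01T12:00:00.123+08:00
loglevel   -> INFO
thread     -> 25
threadName -> reactor-http-nio-2
logger     -> c.e.demo.controller.BookController
message    -> search books, keyword=spring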
Then click Simulate below. If matched correctly, the parsed fields will be displayed. If there's a problem, an error will appear in the bottom right corner.
Then go back to the Console page, replace patterns with your Grok Pattern that matches logs - you may need to add \ escape characters:

Use the following code to write springboot_pipeline to ES, then click the run button in the top right corner:
PUT _ingest/pipeline/springboot_pipeline
{
  "description": "Spring Boot Log Pipeline",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": [
          "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:loglevel} %{NUMBER:thread} --- \\[%{DATA:threadName}\\] %{DATA:logger} : %{GREEDYDATA:message}"
        ]
      }
    },
    {
      "date": {
        "field": "timestamp",
        "target_field": "@timestamp",
        "formats": ["yyyy-MM-dd'T'HH:mm:ss.SSSXXX"]
      }
    },
    {
      "remove": {
        "field": "timestamp"
      }
    }
  ]
}
Note the formats in the command above - if the time format doesn't match, ES won't be able to parse it.
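If you want to double-check that the pipeline was stored, the same Console can fetch its definition back:

GET _ingest/pipeline/springboot_pipeline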
Now your ES has this pipeline. The last step is to run the filebeat container.
2023-11-04 17:57:04 Update
After creating this pipeline, if you want to verify it directly, you can run the following command in the same window - if anything is wrong, ES will throw the error straight back. (The example below happens to use a different pipeline of mine, ping_server, with a matching ping log line; substitute your own pipeline name and a sample line from your own logs.)
POST _ingest/pipeline/ping_server/_simulate
{
  "docs": [
    {
      "_source": {
        "title": "Introducing cloud computing",
        "tags": "openstack,k8s",
        "message": "2023-11-04 16:04:14 - 64 bytes from 192.168.193.22: icmp_seq=2 ttl=64 time=49.839 ms "
      }
    }
  ]
}
When executed normally:

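For the springboot_pipeline created above, the equivalent check would look something like this - the log line here is a made-up sample matching the grok pattern, so swap in a line from your own log file:

POST _ingest/pipeline/springboot_pipeline/_simulate
{
  "docs": [
    {
      "_source": {
        "message": "2023-08-01T12:00:00.123+08:00 INFO 25 --- [reactor-http-nio-2] c.e.demo.controller.BookController : search books, keyword=spring"
      }
    }
  ]
}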
3.3 Start Filebeat with Docker
Go back to your server terminal, enter the directory containing filebeat.yml and http_ca.crt, and create a data directory for Filebeat's data - this is where Filebeat keeps its registry of which files and offsets it has already read. Without it, deleting and recreating the container will cause all logs to be re-pushed to ES - if you have a year's worth of data being re-pushed, you'll want to cry...
mkdir data
I put them in the same folder. Then use the following command to start. You need to change /share/Develop/dockerData/runnable-run/ to your log path:
docker run \
--name=filebeat \
--user=root \
--volume="$(pwd)/filebeat.yml:/usr/share/filebeat/filebeat.yml:ro" \
--volume="$(pwd)/http_ca.crt:/usr/share/filebeat/http_ca.crt:ro" \
--volume="$(pwd)/data/:/usr/share/filebeat/data/" \
--volume="/share/Develop/dockerData/runnable-run/:/share/Develop/dockerData/runnable-run/:ro" \
--volume="/var/lib/docker/containers:/var/lib/docker/containers:ro" \
--volume="/var/run/docker.sock:/var/run/docker.sock:ro" \
docker.elastic.co/beats/filebeat:8.8.0 filebeat -e --strict.perms=false
- --user=root - Run the container as root so Filebeat has sufficient permissions to access the required resources and files.
- --volume="$(pwd)/filebeat.yml:/usr/share/filebeat/filebeat.yml:ro" - Mount the filebeat.yml config file from the current directory to /usr/share/filebeat/filebeat.yml in the container, read-only (ro).
- --volume="$(pwd)/data/:/usr/share/filebeat/data/" - Mount Filebeat's data directory so its registry survives container recreation.
- --volume="$(pwd)/http_ca.crt:/usr/share/filebeat/http_ca.crt:ro" - Mount the http_ca.crt certificate file from the current directory to /usr/share/filebeat/http_ca.crt in the container, read-only (ro).
- --volume="/share/Develop/dockerData/runnable-run/:/share/Develop/dockerData/runnable-run/:ro" - Mount the host's /share/Develop/dockerData/runnable-run/ directory to the same path in the container, read-only (ro), so Filebeat can see the log files.
- --volume="/var/lib/docker/containers:/var/lib/docker/containers:ro" - Mount the host's /var/lib/docker/containers directory to the same path in the container, read-only (ro).
- --volume="/var/run/docker.sock:/var/run/docker.sock:ro" - Mount the host's Docker daemon Unix socket to /var/run/docker.sock in the container, read-only (ro). This lets Filebeat talk to the Docker daemon to get container info and logs.
- docker.elastic.co/beats/filebeat:8.8.0 - The Filebeat image name and version.
- filebeat -e --strict.perms=false - The command run when the container starts: filebeat launches the Filebeat program, -e writes Filebeat's own logs to the console (stderr) instead of log files, and --strict.perms=false disables strict permission checks on the config file so Filebeat can read the mounted config in this container setup.
I didn't set -d so you can see logs being read by filebeat and pushed to ES on first startup.
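Once you've confirmed it works, you can recreate the container in the background and only look at its output when needed, roughly like this (reusing the exact docker run command above, just with -d added):

docker rm -f filebeat    # remove the foreground container
# ...run the same docker run command again, adding -d right after "docker run"...
docker logs -f filebeat  # follow filebeat's own output whenever you want to check on it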
If everything is normal, you'll see SpringBoot log content in the output, and in Kibana you'll see the index correctly created with corresponding data:

4. Troubleshooting
If filebeat starts but doesn't push logs to ES, check these:
- Is the log directory correctly mounted into the filebeat container?
- Is the filebeat.yml configuration correct?
- Is springboot_pipeline parsing the logs correctly?
Also use Dev Tools to directly test if your grok pattern is correct.
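Two quick Dev Tools checks also help narrow things down (standard ES APIs, using the index name from filebeat.yml):

GET _cat/indices/demo-springboot-elasticsearch?v

GET demo-springboot-elasticsearch/_search?size=1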
5. Extended Section
After completing the above, you can essentially throw any logs at ES for parsing - as long as they have formatted output, just write your own Grok Pattern.
So the logs from the other containers on the server, along with the API logs I wrote about before, all go up there. Then the fun begins: build visualizations in Kibana's dashboards on the corresponding fields and you have a simple monitoring dashboard. For example, putting nginx logs up there lets you see per-IP request statistics for the blog directly.

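As one illustration (not part of this series' code): if you'd rather push nginx access logs through the same grok-pipeline route instead of Filebeat's built-in nginx module, the standard combined log format can usually be matched with the built-in COMBINEDAPACHELOG pattern:

PUT _ingest/pipeline/nginx_access_pipeline
{
  "description": "nginx access log pipeline (illustrative sketch)",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": ["%{COMBINEDAPACHELOG}"]
      }
    }
  ]
}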
This part requires you to explore how to play with it yourself.
For example, after putting nginx logs up there, I discovered that in the past month, the most visits to my blog came from the Changsha area, and there are quite a few Android devices too.

References
Using Filebeat to Collect Docker Container Logs
Metricbeat: Certificate signed by unknown authority
input logs an error when an existing input is reloaded with the same ID
Parsing logback log files with filebeat and sending them to Elasticsearch