For your reference, below is a list of the articles in this series.
- Introduction to Logstash
- Overview of Logstash plugins
- Shipping Events to Logstash (Part 1) (this article)
- Shipping Events to Logstash (Part 2)
For this blog post, we’ll be sourcing our data from Loghub. Loghub is a valuable resource that curates a collection of system logs, freely available for AI-driven log analytics research.
In this particular project, we’ll focus on downloading the Linux logs from the Loghub repository.
These logs, collected from Linux systems, provide a rich and comprehensive dataset that reflects the activities, errors, and events that occur within these systems.
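If you want to follow along, one way to get the Linux logs is to clone the Loghub repository, which includes small samples of each dataset (the full archives are linked from its README). The directory layout shown below is an assumption, so check the repository itself:

```sh
# Clone the Loghub repository, which ships sample slices of each dataset
git clone https://github.com/logpai/loghub.git

# The Linux sample logs are expected under the Linux/ directory
# (layout may change; the full datasets are linked from the README)
ls loghub/Linux/
```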
Ingest Linux logs into Elasticsearch
We’ll kick off our journey by configuring Logstash and loading our log data into the Elasticsearch cluster, specifically under the `linux` index.
Start the Elasticsearch cluster
```sh
cd elasticsearch-7.15.0
bin/elasticsearch   # start a single-node cluster in the foreground
```
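Once the node has started (it can take a few seconds), a quick check that it is responding on the default port:

```sh
# Returns basic cluster information if Elasticsearch is up
curl http://localhost:9200
```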
Create a file `logstash-7.15.0/pipelines/es_data.conf` with an `input`, a `filter`, and an `output` section, each of which is explained below.
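A sketch of what `es_data.conf` might look like, pieced together from the walkthrough that follows. The file path and the CSV column names are assumptions: point `path` at your copy of the Linux CSV and match `columns` to its header row.

```conf
input {
  file {
    # Absolute path to the CSV built from the Loghub Linux logs (placeholder)
    path => "/absolute/path/to/Linux_2k.log_structured.csv"
    # Read from the beginning of the file instead of only tailing new lines
    start_position => "beginning"
    # Flush the current read position to the sincedb file every 60 seconds
    sincedb_write_interval => 60
  }
}

filter {
  csv {
    # Comma-delimited input; column names are assumed, adjust to your header
    separator => ","
    columns => ["LineId", "Month", "Date", "Time", "Level", "Component", "PID", "Content"]
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "linux"
  }
  # Also print each event to the console for debugging
  stdout {
    codec => rubydebug
  }
}
```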
Input Configuration:
- The `input` section defines the source of data that Logstash should read. In this case, it’s a file input.
- `path`: Specifies the path to the input file, which is a CSV file containing log data.
- `start_position`: Sets the position from which Logstash should start reading the file. “beginning” indicates it should start from the beginning of the file.
- `sincedb_write_interval`: Defines how often Logstash should write to the sincedb, a file that keeps track of the current position in the input file. In this case, it’s set to write every 60 seconds.
Filter Configuration:
- The `filter` section is where data transformation and parsing take place.
- The `csv` filter plugin is used to parse the CSV data. It specifies the delimiter (`,`) and the expected columns in the CSV.
- The `columns` option lists the names of the columns in the CSV file. Logstash will use these names to parse and structure the data.
Output Configuration:
- The `output` section defines where Logstash should send the processed data. In this configuration, it’s set to send data to both Elasticsearch and the standard output (stdout).
- The `elasticsearch` output plugin is used to send the data to an Elasticsearch cluster.
- `hosts`: Specifies the Elasticsearch cluster’s address. In this case, it’s set to `http://localhost:9200`.
- `index`: Sets the index name in Elasticsearch where the data will be stored. Here, it’s named “linux”.
- The `stdout` output plugin is used for debugging and displays the data in a human-readable format using the `rubydebug` codec.
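Before running the pipeline, you can optionally ask Logstash to validate the configuration; the `--config.test_and_exit` flag parses the config and reports any errors without starting the pipeline:

```sh
# Check the pipeline configuration for syntax errors, then exit
bin/logstash -f pipelines/es_data.conf --config.test_and_exit
```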
Run the pipeline
```sh
# Run from the logstash-7.15.0 directory
bin/logstash -f pipelines/es_data.conf
```
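Logstash’s file input keeps tailing the file, so the process stays running after the initial load; you can query Elasticsearch from another terminal. As a quick sanity check, count the documents in the index:

```sh
# Number of documents indexed so far in the "linux" index
curl -X GET "http://localhost:9200/linux/_count"
```

You can then pull back a single document to inspect how the CSV columns were mapped: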
```sh
curl -X GET "http://localhost:9200/linux/_search?size=1"
```
Output:
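The response follows the standard Elasticsearch search shape. The example below is illustrative only: all values are placeholders, and the `_source` fields will mirror your CSV columns plus the fields Logstash adds, such as `@timestamp` and `message`.

```json
{
  "took": 5,
  "timed_out": false,
  "_shards": { "total": 1, "successful": 1, "skipped": 0, "failed": 0 },
  "hits": {
    "total": { "value": 2000, "relation": "eq" },
    "max_score": 1.0,
    "hits": [
      {
        "_index": "linux",
        "_type": "_doc",
        "_id": "<generated id>",
        "_score": 1.0,
        "_source": {
          "@timestamp": "...",
          "message": "<the raw CSV line>",
          "Month": "...",
          "Level": "...",
          "Content": "..."
        }
      }
    ]
  }
}
```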
To view your index in Kibana
Kibana is a browser-based user interface that can be used to search, analyze and visualize the data stored in Elasticsearch indices.
```sh
curl -O https://artifacts.elastic.co/downloads/kibana/kibana-7.15.0-darwin-x86_64.tar.gz
```
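After the download finishes, extract the archive and start Kibana. These are the standard tarball steps for the macOS x86_64 build downloaded above; adjust the file name for your platform:

```sh
# Unpack the Kibana 7.15.0 archive and launch it in the foreground
tar -xzf kibana-7.15.0-darwin-x86_64.tar.gz
cd kibana-7.15.0-darwin-x86_64
bin/kibana
```

By default Kibana serves its UI at http://localhost:5601; once it is up, create an index pattern for the `linux` index (Stack Management > Index Patterns) to search and visualize the ingested logs.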
What’s Next?
We will:
- Explore data transformation techniques to prepare the logs for analysis.
- Store the transformed data in Elasticsearch, enabling rapid querying and analysis.