TWYTech World by Yashrajsinh

Spring Boot Observability

Yashrajsinh
8 min read · Intermediate

Running a Spring Boot application in production without observability is like driving at night without headlights. You cannot see problems until you crash into them. Spring Boot Actuator provides the foundation for production monitoring by exposing health checks, metrics, and environment information through HTTP endpoints. Combined with Micrometer for metrics, OpenTelemetry for distributed tracing, and structured logging, you get complete visibility into your application behavior. This guide shows you how to configure each layer of the observability stack and connect them into a cohesive monitoring system.

What You Will Learn

This guide covers the complete observability stack for Spring Boot applications. You will learn how to configure Actuator endpoints for health checks and readiness probes, how to expose and customize Micrometer metrics for Prometheus or other monitoring systems, how to implement distributed tracing with OpenTelemetry and Micrometer Tracing, how to set up structured logging with correlation IDs, and how to build custom health indicators and metrics for your domain. By the end you will have a production-ready observability setup that integrates with Spring Boot REST API applications and scales across microservice architectures.

Prerequisites

  • Working knowledge of Spring Boot REST API development including dependency injection and configuration properties
  • Understanding of Spring Boot Auto-Configuration and how starters work
  • Familiarity with Docker for running monitoring infrastructure locally
  • Basic understanding of HTTP endpoints and JSON responses
  • Experience with application deployment and the difference between development and production environments

Concept Overview

Observability in software systems rests on three pillars: metrics, traces, and logs. Metrics are numerical measurements collected over time, like request count, response latency, and memory usage. Traces follow a single request as it flows through multiple services, showing where time is spent and where failures occur. Logs provide detailed event records for debugging specific issues. Spring Boot Actuator and its ecosystem provide tools for all three pillars.

Spring Boot Actuator is a module that adds production-ready features to your application. It exposes endpoints for health checks, environment properties, bean listings, thread dumps, and more. These endpoints are consumed by orchestration platforms like Kubernetes for liveness and readiness probes, by monitoring systems for health dashboards, and by operations teams for debugging.

Micrometer is the metrics facade that Spring Boot uses internally. It provides a vendor-neutral API for recording metrics and ships with registry implementations for Prometheus, Datadog, CloudWatch, and dozens of other monitoring systems. When you add the Micrometer Prometheus registry (micrometer-registry-prometheus) to your classpath and include prometheus in the exposed endpoints, Spring Boot metrics are automatically published in Prometheus format at /actuator/prometheus.

Distributed tracing with Micrometer Tracing, the successor to Spring Cloud Sleuth, instruments your application to propagate trace context across service boundaries. HTTP server calls, and client calls made through Spring Boot's auto-configured builders, are instrumented out of the box; messaging and database access can be covered with additional configuration or libraries. Each request gets a unique trace ID that connects all operations across services, making it possible to reconstruct the full request path in tools like Jaeger or Zipkin.

Step-by-Step Explanation

This section walks through each layer of the stack in order: Actuator endpoints and health checks, Micrometer metrics, distributed tracing, structured logging, and custom Actuator endpoints. Each step builds on the previous one and follows Spring Boot conventions.

Configuring Actuator Endpoints

For security, Actuator does not expose most endpoints over HTTP by default; in current Spring Boot versions only the health endpoint is available over the web out of the box. You need to explicitly expose the other endpoints you want. The health endpoint is the most critical because orchestration platforms use it to determine whether your application is alive and ready to receive traffic.

// application.yml configuration for Actuator
// management:
//   endpoints:
//     web:
//       exposure:
//         include: health,info,metrics,prometheus,env,loggers
//   endpoint:
//     health:
//       show-details: when-authorized
//       probes:
//         enabled: true
//   health:
//     diskspace:
//       enabled: true
//     db:
//       enabled: true
//   info:
//     env:
//       enabled: true
 
// Custom health indicator for an external dependency
package com.example.health;
 
import java.time.Duration;
 
import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.boot.web.client.RestTemplateBuilder;
import org.springframework.stereotype.Component;
import org.springframework.web.client.RestTemplate;
 
@Component
public class PaymentServiceHealthIndicator implements HealthIndicator {
 
    private final RestTemplate restTemplate;
    private final String paymentServiceUrl;
 
    // Spring Boot auto-configures a RestTemplateBuilder rather than a RestTemplate
    // bean, so build a dedicated client with short timeouts for health checks
    public PaymentServiceHealthIndicator(RestTemplateBuilder restTemplateBuilder,
                                          @Value("${payment.service.url}") String paymentServiceUrl) {
        this.restTemplate = restTemplateBuilder
                .setConnectTimeout(Duration.ofSeconds(1))
                .setReadTimeout(Duration.ofSeconds(2))
                .build();
        this.paymentServiceUrl = paymentServiceUrl;
    }
 
    @Override
    public Health health() {
        try {
            var response = restTemplate.getForEntity(
                    paymentServiceUrl + "/actuator/health", String.class);
            if (response.getStatusCode().is2xxSuccessful()) {
                return Health.up()
                        .withDetail("service", "payment")
                        .withDetail("url", paymentServiceUrl)
                        .build();
            }
            return Health.down()
                    .withDetail("service", "payment")
                    .withDetail("status", response.getStatusCode().value())
                    .build();
        } catch (Exception e) {
            // Timeouts, connection failures, and 4xx/5xx responses (which RestTemplate
            // raises as exceptions by default) all report the dependency as DOWN.
            // Health.down(e) records the exception safely even if its message is null.
            return Health.down(e)
                    .withDetail("service", "payment")
                    .build();
        }
    }
}

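The health endpoint aggregates all registered health indicators into a single status. If any indicator reports DOWN, the overall status is DOWN. Kubernetes uses the liveness probe at /actuator/health/liveness to restart unhealthy pods and the readiness probe at /actuator/health/readiness to stop routing traffic to pods that are not ready to serve requests. By default these probe groups reflect only the application's internal liveness and readiness state, so a custom indicator such as PaymentServiceHealthIndicator (registered as paymentService) affects them only if you add it to a group, for example with management.endpoint.health.group.readiness.include=readinessState,paymentService. Keep external dependencies out of the liveness group, or a failing payment service will cause Kubernetes to restart otherwise healthy pods.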
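The aggregation rule is easy to model in plain Java. The sketch below is a simplified stand-in, not Spring Boot's own aggregator: the HealthStatus enum and the class name are hypothetical, and it assumes Spring Boot's default severity order (DOWN, OUT_OF_SERVICE, UP, UNKNOWN) and default HTTP mapping, where DOWN and OUT_OF_SERVICE become 503 and everything else 200. That mapping is what lets a Kubernetes probe judge health from the status code alone.

```java
import java.util.List;

// Hypothetical statuses, listed from most to least severe to mirror
// Spring Boot's default aggregation order (simplified model).
enum HealthStatus { DOWN, OUT_OF_SERVICE, UP, UNKNOWN }

class HealthAggregationSketch {

    // The overall status is the most severe status any indicator reports;
    // with no indicators at all, the result is UNKNOWN.
    static HealthStatus aggregate(List<HealthStatus> statuses) {
        HealthStatus worst = HealthStatus.UNKNOWN;
        for (HealthStatus status : statuses) {
            if (status.ordinal() < worst.ordinal()) {
                worst = status;
            }
        }
        return worst;
    }

    // DOWN and OUT_OF_SERVICE map to 503 so probes can rely on the status code.
    static int httpStatus(HealthStatus status) {
        return switch (status) {
            case DOWN, OUT_OF_SERVICE -> 503;
            default -> 200;
        };
    }
}
```

For example, aggregating UP, DOWN, and UP yields DOWN, which the endpoint reports with HTTP 503.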

Micrometer Metrics Integration

Micrometer automatically instruments Spring MVC controllers, WebClient and RestTemplate calls made through Spring Boot's auto-configured builders, Spring Data repositories, and connection pools such as HikariCP. You get request timers with counts and latencies, tagged by status and outcome so error rates can be derived, plus JVM and pool metrics, without writing any code. For domain-specific metrics, you create custom counters, gauges, timers, and distribution summaries.

package com.example.metrics;
 
import io.micrometer.core.instrument.*;
import org.springframework.stereotype.Component;
 
import java.util.concurrent.atomic.AtomicInteger;
 
@Component
public class ArticleMetrics {
 
    private final MeterRegistry registry;
    private final Counter articleViewCounter;
    private final Counter articleSearchCounter;
    private final Timer articleRenderTimer;
    private final AtomicInteger activeReaders;
    private final DistributionSummary wordCountSummary;
 
    public ArticleMetrics(MeterRegistry registry) {
        this.registry = registry;
        this.articleViewCounter = Counter.builder("articles.views")
                .description("Total number of article page views")
                .tag("type", "pageview")
                .register(registry);
 
        this.articleSearchCounter = Counter.builder("articles.searches")
                .description("Total number of article searches performed")
                .register(registry);
 
        this.articleRenderTimer = Timer.builder("articles.render.duration")
                .description("Time taken to render an article page")
                .publishPercentiles(0.5, 0.95, 0.99)
                .register(registry);
 
        // The gauge samples activeReaders on each scrape; holding a strong
        // reference in this field keeps it from being garbage collected
        this.activeReaders = new AtomicInteger(0);
        Gauge.builder("articles.active.readers", activeReaders, AtomicInteger::get)
                .description("Number of users currently reading articles")
                .register(registry);
 
        this.wordCountSummary = DistributionSummary.builder("articles.word.count")
                .description("Distribution of article word counts")
                .publishPercentiles(0.5, 0.75, 0.95)
                .register(registry);
    }
 
    public void recordView(String category) {
        articleViewCounter.increment();
        // The registry returns the existing counter for the same name and tags,
        // so this lookup is safe on every call. Only the bounded category is used
        // as a tag; tagging by slug would create one time series per article.
        registry.counter("articles.views.by.category", "category", category).increment();
    }
 
    public Timer.Sample startRenderTimer() {
        // Start the sample with the registry's clock
        return Timer.start(registry);
    }
 
    public void stopRenderTimer(Timer.Sample sample) {
        sample.stop(articleRenderTimer);
    }
 
    public void recordSearch() {
        articleSearchCounter.increment();
    }
 
    public void readerStarted() {
        activeReaders.incrementAndGet();
    }
 
    public void readerFinished() {
        activeReaders.decrementAndGet();
    }
 
    public void recordWordCount(int wordCount) {
        wordCountSummary.record(wordCount);
    }
}

Micrometer metrics are exposed at the /actuator/prometheus endpoint in Prometheus exposition format. Prometheus scrapes this endpoint at regular intervals and stores the time series data. You can then build Grafana dashboards that visualize request rates, error percentages, latency percentiles, and custom business metrics.
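To make the exposition format concrete, here is a simplified sketch of how a counter such as articles.views ends up on the scrape page. It assumes Micrometer's Prometheus naming convention of replacing dots with underscores and adding a _total suffix to counters; the class is hypothetical, and it skips label-value escaping, base units, and the other cases the real registry handles.

```java
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

class PrometheusFormatSketch {

    // Simplified naming: dots become underscores, counters get "_total".
    static String counterName(String micrometerName) {
        String name = micrometerName.replace('.', '_');
        return name.endsWith("_total") ? name : name + "_total";
    }

    // Renders one counter in the Prometheus text exposition format.
    // Label values are not escaped here, unlike in a real registry.
    static String renderCounter(String micrometerName, String help,
                                Map<String, String> tags, double value) {
        String name = counterName(micrometerName);
        StringJoiner labels = new StringJoiner(",", "{", "}");
        // Sort tags by key so the output is deterministic
        new TreeMap<>(tags).forEach((k, v) -> labels.add(k + "=\"" + v + "\""));
        return "# HELP " + name + " " + help + "\n"
                + "# TYPE " + name + " counter\n"
                + name + (tags.isEmpty() ? "" : labels.toString()) + " " + value + "\n";
    }
}
```

Rendering articles.views with the tag type=pageview and a value of 42 produces a HELP line, the line # TYPE articles_views_total counter, and the sample articles_views_total{type="pageview"} 42.0.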

Distributed Tracing Setup

Distributed tracing connects operations across service boundaries. When Service A calls Service B, the trace context is propagated through HTTP headers. Both services record spans that are linked by a common trace ID. This allows you to see the complete request flow in a tracing backend like Jaeger or Zipkin.
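Spring Boot's Micrometer Tracing setup propagates context in the W3C Trace Context format by default, using a traceparent header of the form version-traceId-parentId-flags. The sketch below models only that header; the TraceParent record is hypothetical, not a Micrometer or OpenTelemetry type. It shows what "linked by a common trace ID" means on the wire: the downstream call keeps the trace ID and gets a new span ID.

```java
import java.security.SecureRandom;
import java.util.HexFormat;

// Minimal model of a W3C "traceparent" header: 00-<32 hex trace ID>-<16 hex span ID>-<flags>
record TraceParent(String traceId, String spanId, boolean sampled) {

    private static final SecureRandom RANDOM = new SecureRandom();

    // Parses the four dash-separated fields (hex validity is not checked here)
    static TraceParent parse(String header) {
        String[] parts = header.split("-");
        if (parts.length != 4 || parts[1].length() != 32 || parts[2].length() != 16) {
            throw new IllegalArgumentException("Malformed traceparent: " + header);
        }
        boolean sampled = (Integer.parseInt(parts[3], 16) & 0x01) != 0;
        return new TraceParent(parts[1], parts[2], sampled);
    }

    // A downstream call keeps the trace ID but gets a fresh 8-byte span ID,
    // which is what links spans from different services into one trace
    TraceParent childSpan() {
        byte[] bytes = new byte[8];
        RANDOM.nextBytes(bytes);
        return new TraceParent(traceId, HexFormat.of().formatHex(bytes), sampled);
    }

    String toHeader() {
        return "00-" + traceId + "-" + spanId + "-" + (sampled ? "01" : "00");
    }
}
```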

// Dependencies needed in build.gradle or pom.xml:
// io.micrometer:micrometer-tracing-bridge-otel
// io.opentelemetry:opentelemetry-exporter-otlp
// io.micrometer:micrometer-tracing
 
// application.yml tracing configuration:
// management:
//   tracing:
//     sampling:
//       probability: 1.0  # Sample 100% in dev, reduce in production
//   otlp:
//     tracing:
//       endpoint: http://localhost:4318/v1/traces
 
// Custom span for business operations
package com.example.service;
 
import io.micrometer.observation.Observation;
import io.micrometer.observation.ObservationRegistry;
import org.springframework.stereotype.Service;
 
@Service
public class ArticlePublishingService {
 
    private final ObservationRegistry observationRegistry;
    private final ArticleRepository articleRepository;
    private final SearchIndexService searchIndexService;
    private final NotificationService notificationService;
 
    public ArticlePublishingService(ObservationRegistry observationRegistry,
                                     ArticleRepository articleRepository,
                                     SearchIndexService searchIndexService,
                                     NotificationService notificationService) {
        this.observationRegistry = observationRegistry;
        this.articleRepository = articleRepository;
        this.searchIndexService = searchIndexService;
        this.notificationService = notificationService;
    }
 
    public void publishArticle(String slug) {
        Observation observation = Observation.createNotStarted("article.publish", observationRegistry)
                // Slugs are unbounded, so keep them on the span only (high cardinality)
                .highCardinalityKeyValue("article.slug", slug)
                .start();
 
        // Opening a scope makes this observation current, so instrumented calls
        // made inside it (for example observed HTTP clients) become child spans
        try (Observation.Scope scope = observation.openScope()) {
            Article article = articleRepository.findBySlug(slug)
                    .orElseThrow(() -> new ArticleNotFoundException(slug));
 
            // validateArticle is assumed to be a helper elided from this example
            validateArticle(article);
            article.setPublished(true);
            articleRepository.save(article);
 
            // Trace context propagates only if these services use instrumented clients
            searchIndexService.indexArticle(article);
            notificationService.notifySubscribers(article);
 
            observation.lowCardinalityKeyValue("outcome", "success");
        } catch (RuntimeException e) {
            observation.lowCardinalityKeyValue("outcome", "failure");
            observation.error(e);
            throw e;
        } finally {
            observation.stop();
        }
    }
}

The Observation API is the recommended way to create custom spans in Spring Boot 3. It integrates with both metrics and tracing, so a single observation produces both a timer metric and a trace span. This reduces instrumentation code and keeps your metrics and traces consistent. For simple cases, observation.observe(() -> ...) performs the start, scope, error, and stop calls for you.
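The "one observation, several outputs" idea can be illustrated with a toy model. Every name below (ToyHandler, ToyObservation, and the two handlers) is hypothetical and far simpler than Micrometer, where ObservationHandler implementations registered on the ObservationRegistry play these roles; the point is only that a metrics handler and a tracing handler both react to the same start and stop calls.

```java
import java.util.ArrayList;
import java.util.List;

// Toy handler contract (hypothetical, not Micrometer's ObservationHandler)
interface ToyHandler {
    void onStart(String name);
    void onStop(String name, long durationNanos);
}

// One observation notifies every handler, so instrumentation is written once
class ToyObservation {
    private final String name;
    private final List<ToyHandler> handlers;
    private long startNanos;

    ToyObservation(String name, List<ToyHandler> handlers) {
        this.name = name;
        this.handlers = handlers;
    }

    void start() {
        startNanos = System.nanoTime();
        handlers.forEach(h -> h.onStart(name));
    }

    void stop() {
        long elapsed = System.nanoTime() - startNanos;
        handlers.forEach(h -> h.onStop(name, elapsed));
    }
}

// Stands in for a metrics handler: records one timer sample per observation
class ToyTimerHandler implements ToyHandler {
    final List<Long> samples = new ArrayList<>();
    public void onStart(String name) { }
    public void onStop(String name, long durationNanos) { samples.add(durationNanos); }
}

// Stands in for a tracing handler: records span start and end events
class ToySpanHandler implements ToyHandler {
    final List<String> events = new ArrayList<>();
    public void onStart(String name) { events.add("start " + name); }
    public void onStop(String name, long durationNanos) { events.add("end " + name); }
}
```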

Structured Logging with Correlation

Structured logging outputs log events as JSON objects instead of plain text lines. This makes logs machine-parseable and allows log aggregation systems like Elasticsearch or Loki to index and search log fields efficiently. Combined with trace correlation, you can jump from a trace span directly to the relevant log entries.
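For illustration, the sketch below renders one log event as a single JSON line, which is what makes fields like traceId directly searchable. It is a hypothetical helper, not LogstashEncoder's implementation, and the field names are examples only; with the configuration that follows, the encoder produces this kind of output for you.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: turns a map of fields into one JSON object per line
class JsonLogLineSketch {

    static String toJson(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            first = false;
            sb.append('"').append(escape(e.getKey())).append("\":\"")
              .append(escape(e.getValue())).append('"');
        }
        return sb.append('}').toString();
    }

    // Escapes quotes, backslashes, and control characters so the line stays valid JSON
    private static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"' -> out.append("\\\"");
                case '\\' -> out.append("\\\\");
                case '\n' -> out.append("\\n");
                case '\r' -> out.append("\\r");
                case '\t' -> out.append("\\t");
                default -> {
                    if (c < 0x20) {
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
                }
            }
        }
        return out.toString();
    }
}
```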

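// logback-spring.xml configuration for structured JSON logging
// (requires the net.logstash.logback:logstash-logback-encoder dependency)
// <configuration>
//   <appender name="JSON" class="ch.qos.logback.core.ConsoleAppender">
//     <encoder class="net.logstash.logback.encoder.LogstashEncoder">
//       <!-- Once any includeMdcKeyName is set, only the listed MDC keys are written -->
//       <includeMdcKeyName>traceId</includeMdcKeyName>
//       <includeMdcKeyName>spanId</includeMdcKeyName>
//       <includeMdcKeyName>articleSlug</includeMdcKeyName>
//       <includeMdcKeyName>articleCategory</includeMdcKeyName>
//     </encoder>
//   </appender>
//   <root level="INFO">
//     <appender-ref ref="JSON" />
//   </root>
// </configuration>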
 
// Using structured logging in application code
package com.example.service;
 
import static net.logstash.logback.argument.StructuredArguments.kv;
 
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;
import org.springframework.stereotype.Service;
 
@Service
public class ContentValidationService {
 
    private static final Logger log = LoggerFactory.getLogger(ContentValidationService.class);
    private static final int MIN_WORD_COUNT = 1500;
    private static final int MIN_INTERNAL_LINKS = 3;
 
    public ValidationResult validateArticle(Article article) {
        // putCloseable with try-with-resources removes the MDC keys even if validation
        // throws, so they cannot leak into unrelated log lines on a pooled thread
        try (MDC.MDCCloseable slug = MDC.putCloseable("articleSlug", article.getSlug());
             MDC.MDCCloseable category = MDC.putCloseable("articleCategory", article.getCategory())) {
 
            log.info("Starting article validation",
                    kv("wordCount", article.getWordCount()),
                    kv("headingCount", article.getHeadingCount()));
 
            ValidationResult result = new ValidationResult();
 
            if (article.getWordCount() < MIN_WORD_COUNT) {
                log.warn("Article below minimum word count",
                        kv("actual", article.getWordCount()), kv("minimum", MIN_WORD_COUNT));
                result.addViolation("WORD_COUNT_LOW", "Article has " + article.getWordCount()
                        + " words, minimum is " + MIN_WORD_COUNT);
            }
 
            if (article.getInternalLinkCount() < MIN_INTERNAL_LINKS) {
                log.warn("Article has insufficient internal links",
                        kv("actual", article.getInternalLinkCount()), kv("minimum", MIN_INTERNAL_LINKS));
                result.addViolation("LINKS_LOW", "Article has " + article.getInternalLinkCount()
                        + " internal links, minimum is " + MIN_INTERNAL_LINKS);
            }
 
            log.info("Article validation complete",
                    kv("violations", result.getViolationCount()),
                    kv("passed", result.isPassed()));
 
            return result;
        }
    }
}

When Micrometer Tracing is on the classpath, it automatically adds traceId and spanId to the MDC. This means every log entry within a traced request includes the trace identifier. You can search your log aggregation system for a specific traceId to find all log entries related to a single request, even across multiple services.

Custom Actuator Endpoints

Beyond the built-in endpoints, you can create custom Actuator endpoints that expose domain-specific information. This is useful for operations dashboards that need to show application-specific state like cache hit rates, queue depths, or content statistics.

package com.example.actuator;
 
import com.example.service.ContentStatisticsService;
import org.springframework.boot.actuate.endpoint.annotation.Endpoint;
import org.springframework.boot.actuate.endpoint.annotation.ReadOperation;
import org.springframework.boot.actuate.endpoint.annotation.Selector;
import org.springframework.stereotype.Component;
 
import java.util.Map;
 
@Component
@Endpoint(id = "content-stats")
public class ContentStatisticsEndpoint {
 
    private final ContentStatisticsService statisticsService;
 
    public ContentStatisticsEndpoint(ContentStatisticsService statisticsService) {
        this.statisticsService = statisticsService;
    }
 
    @ReadOperation
    public Map<String, Object> contentStats() {
        return Map.of(
                "totalArticles", statisticsService.getTotalArticleCount(),
                "totalWords", statisticsService.getTotalWordCount(),
                "averageWordCount", statisticsService.getAverageWordCount(),
                "articlesByCategory", statisticsService.getArticleCountByCategory(),
                "articlesByLevel", statisticsService.getArticleCountByLevel(),
                "lastPublished", statisticsService.getLastPublishedDate(),
                "validationStatus", statisticsService.getLastValidationStatus()
        );
    }
 
    @ReadOperation
    public Map<String, Object> categoryStats(@Selector String category) {
        return Map.of(
                "category", category,
                "articleCount", statisticsService.getArticleCountForCategory(category),
                "averageWordCount", statisticsService.getAverageWordCountForCategory(category),
                "deepDiveCount", statisticsService.getDeepDiveCountForCategory(category),
                "roadmapExists", statisticsService.hasRoadmap(category)
        );
    }
}

Custom endpoints are accessible at /actuator/content-stats and /actuator/content-stats/{category} once content-stats is added to management.endpoints.web.exposure.include (the exposure list in the configuration earlier in this guide does not include it). They follow the same security rules as built-in endpoints, so you can restrict access using your management security configuration. This keeps sensitive operational data protected while still making it available to authorized monitoring systems.

Real-World Use Cases

Kubernetes deployments rely heavily on Actuator health probes. The liveness probe tells Kubernetes whether the application process is healthy; if it fails, Kubernetes restarts the pod. The readiness probe tells Kubernetes whether the application can handle traffic. During startup, or when a dependency included in the readiness group is down, the readiness probe reports a non-UP status with HTTP 503 and Kubernetes stops routing requests to that pod. This prevents users from hitting unhealthy instances.

Microservice architectures use distributed tracing to debug latency issues. When a user reports slow page loads, you can find the trace for that request and see exactly which service call took the most time. Without tracing, you would need to correlate timestamps across multiple log files from different services, which is error-prone and time-consuming.

Auto-scaling decisions benefit from custom metrics. If you expose a metric for queue depth or active processing jobs, your orchestration platform can scale the number of instances based on actual workload rather than CPU usage alone. This leads to more responsive scaling that matches your application's specific bottlenecks.
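As a rough illustration of workload-based scaling, the sketch below applies the replica formula documented for the Kubernetes Horizontal Pod Autoscaler, desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), to a hypothetical per-pod queue-depth metric. It deliberately ignores the autoscaler's tolerance band and stabilization windows and only clamps the result to a minimum and maximum.

```java
// Hypothetical helper: HPA-style replica calculation for a queue-depth metric
class QueueDepthScalingSketch {

    // desired = ceil(current * observedPerPod / targetPerPod), clamped to [min, max].
    // Tolerance and stabilization windows used by the real autoscaler are omitted.
    static int desiredReplicas(int currentReplicas, double avgQueueDepthPerPod,
                               double targetQueueDepthPerPod, int min, int max) {
        int desired = (int) Math.ceil(
                currentReplicas * (avgQueueDepthPerPod / targetQueueDepthPerPod));
        return Math.max(min, Math.min(max, desired));
    }
}
```

With 2 replicas each averaging 200 queued jobs against a target of 100, the sketch asks for 4 replicas.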

Alerting systems consume Prometheus metrics to detect anomalies. You can set alerts for error rate spikes, latency increases, or health check failures. Combined with structured logging, when an alert fires you can quickly find the relevant log entries using the trace ID from a failing request, which can cut mean time to resolution significantly.
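Error-rate alerts are typically computed from counter increases rather than raw counter values, because counters only ever go up. This sketch mirrors the idea behind a PromQL rate() ratio using two hypothetical scrape snapshots; unlike Prometheus's rate(), it does not handle counter resets after a restart.

```java
// Hypothetical snapshot of two counters taken at one scrape
record CounterSnapshot(double totalRequests, double errorRequests) { }

class ErrorRateAlertSketch {

    // Fraction of requests that failed between two scrapes (0 if no traffic).
    // Counter resets (for example after a restart) are not handled here.
    static double errorRatio(CounterSnapshot earlier, CounterSnapshot later) {
        double requests = later.totalRequests() - earlier.totalRequests();
        double errors = later.errorRequests() - earlier.errorRequests();
        return requests <= 0 ? 0.0 : errors / requests;
    }

    static boolean shouldAlert(CounterSnapshot earlier, CounterSnapshot later, double threshold) {
        return errorRatio(earlier, later) > threshold;
    }
}
```

Going from 1000 requests and 10 errors to 1200 requests and 40 errors gives a ratio of 30 / 200 = 0.15, which would trip a 5 percent threshold.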

Best Practices

Secure your Actuator endpoints in production. Never expose all endpoints to the public internet. Serve them on a separate management port (management.server.port) that is reachable only from your internal network or monitoring infrastructure. At minimum, restrict the env, beans, configprops, and heapdump endpoints, because they can reveal sensitive information, including configuration values such as database passwords and API keys.

Use low-cardinality tags on your metrics. High-cardinality tags such as user IDs or raw request paths can create millions of time series that overwhelm your monitoring system. Use bounded values such as categories, status codes, and method names as tags. If you need per-user insight, aggregate in your application and expose summary statistics instead.
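One way to keep a tag bounded is to allowlist its values and collapse everything else into a single bucket. In this sketch the class and the category names are hypothetical placeholders. Micrometer also offers MeterFilter.maximumAllowableTags for enforcing a cap at the registry level.

```java
import java.util.Locale;
import java.util.Set;

// Caps tag cardinality: unknown values share one "other" bucket, so the
// number of time series stays bounded no matter what input arrives
class BoundedTagSketch {

    // Hypothetical allowlist; in practice this would come from your domain model
    private static final Set<String> KNOWN_CATEGORIES = Set.of("java", "spring", "kubernetes");

    static String categoryTag(String rawCategory) {
        if (rawCategory == null) {
            return "unknown";
        }
        String normalized = rawCategory.trim().toLowerCase(Locale.ROOT);
        return KNOWN_CATEGORIES.contains(normalized) ? normalized : "other";
    }
}
```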

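Set appropriate sampling rates for distributed tracing in production. Tracing every request generates large volumes of data and adds overhead to every operation. A probability between 0.01 and 0.1 (1 to 10 percent), set through management.tracing.sampling.probability, is a common starting point. You can vary sampling per endpoint with a custom sampler, but because this kind of sampling is decided when a trace starts, reliably keeping every failed request usually requires tail-based sampling in a collector such as the OpenTelemetry Collector.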
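Ratio sampling works best when the keep-or-drop decision is derived from the trace ID itself, so every service in a call chain makes the same choice. The sketch below is a simplified illustration in the spirit of OpenTelemetry's TraceIdRatioBased sampler, not its actual code; the class is hypothetical, and it assumes a 32-hex-character W3C trace ID and compares the ID's low 64 bits against a threshold scaled by the probability.

```java
// Hypothetical deterministic ratio sampler keyed on the trace ID
class RatioSamplerSketch {

    private final double probability;
    private final long upperBound;

    RatioSamplerSketch(double probability) {
        // Written this way so NaN is rejected as well
        if (!(probability >= 0.0 && probability <= 1.0)) {
            throw new IllegalArgumentException("probability must be within [0, 1]");
        }
        this.probability = probability;
        this.upperBound = (long) (probability * Long.MAX_VALUE);
    }

    // Same trace ID always gives the same answer, on every service
    boolean shouldSample(String traceId) {
        if (probability >= 1.0) {
            return true;
        }
        long lowBits = Long.parseUnsignedLong(traceId.substring(16), 16);
        return (lowBits & Long.MAX_VALUE) < upperBound;
    }
}
```

new RatioSamplerSketch(0.1) keeps roughly one trace in ten and always returns the same answer for a given trace ID.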

Implement graceful shutdown that works with your health probes. When Kubernetes sends a SIGTERM, your application should immediately report that it is not ready on the readiness probe, finish processing in-flight requests, and then shut down. Spring Boot handles this when you set server.shutdown=graceful and choose an appropriate timeout with spring.lifecycle.timeout-per-shutdown-phase.
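The ordering is the important part, so here it is modeled with plain JDK concurrency utilities. GracefulShutdownSketch is hypothetical and only loosely mirrors what Spring Boot does during a graceful shutdown: readiness is withdrawn first, in-flight work drains up to a deadline, and anything still running after the deadline is interrupted.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

class GracefulShutdownSketch {

    private final AtomicBoolean ready = new AtomicBoolean(true);
    private final ExecutorService workers = Executors.newFixedThreadPool(4);

    // What a readiness probe would report
    boolean isReady() {
        return ready.get();
    }

    // Submitting after shutdown() throws RejectedExecutionException
    void submit(Runnable request) {
        workers.submit(request);
    }

    // Returns true if all in-flight work finished before the timeout
    boolean shutdown(long timeout, TimeUnit unit) throws InterruptedException {
        ready.set(false);      // 1. readiness fails, so no new traffic is routed here
        workers.shutdown();    // 2. refuse new work, keep running submitted tasks
        boolean drained = workers.awaitTermination(timeout, unit); // 3. wait for in-flight work
        if (!drained) {
            workers.shutdownNow(); // 4. deadline passed: interrupt what is left
        }
        return drained;
    }
}
```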

Monitor your monitoring. If your metrics pipeline or tracing backend goes down, you lose visibility at the worst possible time. Set up alerts on the monitoring infrastructure itself and have a fallback plan like direct log access for when the observability stack is unavailable.

Common Mistakes

Exposing Actuator endpoints without authentication is a security vulnerability. The env endpoint can reveal database credentials, API keys, and other secrets. The heapdump endpoint can expose sensitive data from application memory. Always secure Actuator endpoints with authentication or network-level access controls in production.

Creating metrics with unbounded tag values causes cardinality explosion. A metric tagged with request.uri that includes path parameters like /users/12345 creates a new time series for every user. This fills your monitoring system storage and slows down queries. Always normalize paths and use bounded tag values.
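Spring MVC's built-in http.server.requests metric already tags requests with the matched route template (for example /users/{id}), so the risk mostly comes from hand-rolled metrics or client calls built from concatenated URLs. For those cases, a normalizer along these lines can help. The class is hypothetical, and it assumes that purely numeric segments and UUIDs are the only identifiers in your paths; you would adapt the patterns to your own URL scheme.

```java
import java.util.regex.Pattern;

// Collapses ID-like path segments into "{id}" so /users/12345 and
// /users/67890 share one tag value
class UriTagNormalizerSketch {

    private static final Pattern NUMERIC_SEGMENT = Pattern.compile("\\d+");
    private static final Pattern UUID_SEGMENT = Pattern.compile(
            "[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}");

    static String normalize(String path) {
        // Query strings are unbounded too, so drop them entirely
        int query = path.indexOf('?');
        String withoutQuery = query >= 0 ? path.substring(0, query) : path;
        String[] segments = withoutQuery.split("/", -1);
        for (int i = 0; i < segments.length; i++) {
            if (NUMERIC_SEGMENT.matcher(segments[i]).matches()
                    || UUID_SEGMENT.matcher(segments[i]).matches()) {
                segments[i] = "{id}";
            }
        }
        return String.join("/", segments);
    }
}
```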

Ignoring the overhead of tracing instrumentation can impact performance. Each span creation allocates objects and adds context propagation overhead. In hot paths that execute millions of times per second, this overhead becomes significant. Use sampling to reduce the volume and avoid instrumenting tight loops or utility methods.

Not configuring health indicator timeouts leads to cascading failures. If a health indicator makes an HTTP call to a dependency and that dependency is slow, the health check itself becomes slow. This can cause the orchestration platform to mark your application as unhealthy even though it is functioning correctly for user requests. Set aggressive timeouts on health indicator checks.
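Besides setting connect and read timeouts on the HTTP client a health indicator uses, you can put an upper bound on the whole check. This plain-Java sketch (the class is hypothetical) returns DOWN when the check fails or does not finish in time. Note that completeOnTimeout only stops waiting; it does not cancel the underlying call, which is why client-level timeouts remain the first line of defense.

```java
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

class TimeBoundedCheckSketch {

    // Runs the check asynchronously; a timeout or an exception both yield DOWN.
    // The slow call itself keeps running in the background after the timeout.
    static String check(BooleanSupplier dependencyCheck, Duration timeout) {
        return CompletableFuture.supplyAsync(dependencyCheck::getAsBoolean)
                .completeOnTimeout(false, timeout.toMillis(), TimeUnit.MILLISECONDS)
                .exceptionally(ex -> false)
                .thenApply(up -> up ? "UP" : "DOWN")
                .join();
    }
}
```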

Logging at DEBUG level in production generates enormous log volumes that increase costs and make it harder to find important events. Use INFO as the default level, and when you need to debug a specific issue, raise the level for the relevant logger at runtime through the Actuator loggers endpoint, without restarting the application.
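Changing a level at runtime is a POST to /actuator/loggers/{name} with a JSON body containing configuredLevel. The sketch below only builds that request with the JDK's HttpClient API; the class, base URL, and logger name are placeholders, and it assumes the loggers endpoint is exposed (as in the configuration earlier) and that any authentication your management endpoints require is added.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Builds the Actuator request that sets a logger's configured level
class LoggerLevelRequestSketch {

    static HttpRequest setLevel(String baseUrl, String loggerName, String level) {
        String body = "{\"configuredLevel\":\"" + level + "\"}";
        return HttpRequest.newBuilder(URI.create(baseUrl + "/actuator/loggers/" + loggerName))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }
}
```

Sending it with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.discarding()) applies the change; posting a null configuredLevel later resets the logger to its inherited level.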

Summary

Spring Boot Actuator and the observability ecosystem provide everything you need to operate applications confidently in production. Actuator endpoints give orchestration platforms the health information they need for automated recovery. Micrometer metrics feed monitoring dashboards and alerting systems with both framework-level and custom business metrics. Distributed tracing connects operations across service boundaries so you can debug latency and failures efficiently. Structured logging with trace correlation makes log analysis fast and precise. The key to success is configuring each layer appropriately for your environment: secure endpoints in production, use low-cardinality metric tags, sample traces at a sustainable rate, and structure logs for machine consumption. Together these tools transform your application from a black box into a transparent system you can observe, understand, and improve continuously.
