Operational Excellence - Monitoring and Alerting: Set up Splunk and xMatters to Enhance the Application Experience

Anil Gudigar
Published in DevOps.dev
6 min read · Mar 23, 2024

The previous blog focused on strategies for enhancing infrastructure monitoring for resilience.

Here, the focus is on enhancing the application experience for consumers and making it easier for developers to troubleshoot issues.

Monitoring and alerting are crucial aspects of maintaining an enhanced application experience. Splunk and xMatters are powerful tools that can be combined to achieve comprehensive monitoring, alerting, and incident response capabilities.

1. Data Collection with Splunk:

  • Use Splunk to collect and index log data from your applications, servers, network devices, mobile apps, web apps, and other relevant sources, using a plugin or SDK to send the data to a Splunk index.
  • Create an index for each application, e.g. mywebindex or mymobileappindex.
  • Configure Splunk to ingest data in real-time or at regular intervals to ensure timely analysis.

Configure Splunk Inputs:

a. TCP/UDP Input:

splunk add tcp 1514 -sourcetype <sourcetype> -index <index_name>
splunk add udp 1514 -sourcetype <sourcetype> -index <index_name>

b. File Input:

splunk add monitor /path/to/logfile.log -sourcetype <sourcetype> -index <index_name>

c. HTTP Event Collector (HEC):

splunk http-event-collector create <hec_name> -uri <hec_url> -token <hec_token>

Enable Data Collection:

splunk enable listen <port_number>

Verify Configuration:

splunk list monitor
splunk list tcp

Restart Splunk (if necessary):

splunk restart

Replace placeholders like <sourcetype>, <index_name>, <hec_name>, <hec_url>, and <hec_token> with appropriate values based on your setup.
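
For example, assuming an index named mywebindex and the standard access_combined source type for web access logs, the file and TCP inputs might look like this (adjust paths and names for your environment):

splunk add monitor /var/log/nginx/access.log -sourcetype access_combined -index mywebindex
splunk add tcp 1514 -sourcetype syslog -index mywebindex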

Creating a New Index

In Splunk Web, go to Settings > Indexes > New Index and set the index name (e.g. mywebindex).
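
The same can be done from the CLI; for example, using the index names mentioned earlier:

splunk add index mywebindex
splunk add index mymobileappindex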

2. Monitoring and Analysis:

  • Set up dashboards and visualizations in Splunk to monitor key performance indicators (KPIs) such as response times, error rates, throughput, etc.
  • Utilize Splunk’s search capabilities to identify trends, anomalies, and potential issues in your application environment.
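
For example, a dashboard panel tracking average response time might use a timechart like the one below (a sketch that assumes a response_time field has been extracted from your logs):

index=mywebindex | timechart span=5m avg(response_time) as avg_response_time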

Search on the index:

Typical searches include a plain search on the index, an aggregation query that counts the number of events, and a count query that breaks down API requests by status code (2xx, 4xx, 5xx).
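
A minimal sketch of these searches, assuming an index named mywebindex and a status field extracted from the events:

index=mywebindex
index=mywebindex | stats count
index=mywebindex | stats count by status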

Monitoring HTTP status codes:

Monitoring HTTP status codes allows us to track the success, failure, or error rates of requests. By leveraging the Splunk query mentioned above, we can analyze the number of successful, failed, or error responses generated by our API or application.
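
To group the counts by status class rather than individual codes, an eval/case expression can be used (again assuming a numeric status field):

index=mywebindex | eval status_class=case(status>=500, "5xx", status>=400, "4xx", status>=300, "3xx", status>=200, "2xx") | stats count by status_class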

Monitoring custom errors:

We can log custom errors from our server or application to the Splunk index, then search those indexes for the custom errors and attach monitoring and alerts to the results.
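
As a sketch, assuming the application logs events with an app_logs source type and an error_code field:

index=mymobileappindex sourcetype=app_logs error_code=* | stats count by error_code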

3. Alerting with Splunk:

  • Configure alerts in Splunk based on predefined thresholds or patterns that indicate potential issues or deviations from expected behaviour.
  • Use Splunk’s alerting mechanisms such as email notifications, webhook integrations, or custom scripts to trigger notifications when alerts are triggered.

Set up alerts in Splunk to trigger actions based on predefined conditions.

Alerts in Splunk are saved searches with an alert condition and one or more alert actions. They can be created in Splunk Web (Save As > Alert), in savedsearches.conf (shown below), or via the REST API, for example:

curl -k -u admin:<password> https://localhost:8089/servicesNS/admin/search/saved/searches -d name=<alert_name> --data-urlencode search="<search_query>"

To set up alerts in Splunk for HTTP status codes in the 5xx range (server errors) occurring more than 300 times, you can use the following Splunk search query and alert configuration:

Search Query:

index=<your_index> status>=500 | stats count by status | where count > 300

This query searches for events with HTTP status codes greater than or equal to 500, calculates the count of each status code, and keeps only the status codes whose count exceeds 300.

Alert Configuration (using savedsearches.conf):

[http_5xx_errors]
search = index=<your_index> status>=500 | stats count by status | where count > 300
enableSched = 1
cron_schedule = */5 * * * *
counttype = number of events
relation = greater than
quantity = 0
action.email = 1
action.email.to = your@email.com
action.email.subject = Alert: HTTP 5xx Errors Exceeding Threshold
action.email.message = The count of HTTP 5xx errors has exceeded the threshold of 300. Please investigate.

This configuration creates a scheduled saved search named http_5xx_errors that runs every five minutes. Whenever the search returns results (i.e., any HTTP 5xx status code occurring more than 300 times), it triggers an email alert to the specified recipient(s).

Remember to replace <your_index> with the actual index containing your HTTP logs, and adjust the schedule, search time range, and email settings according to your environment.

You can apply this configuration by saving it as savedsearches.conf in your Splunk configuration directory (typically $SPLUNK_HOME/etc/apps/<app>/local/) and restarting or reloading Splunk. After that, Splunk will monitor for HTTP 5xx errors and trigger alerts accordingly.
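
The same saved search can also call xMatters directly through Splunk's built-in webhook alert action. As a sketch, the following two lines could be added to the stanza above, where the trigger URL comes from the xMatters inbound integration described in the next section:

action.webhook = 1
action.webhook.param.url = <xmatters_trigger_url>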

4. Integration with xMatters:

xMatters Configuration:

Configure an inbound integration in xMatters to receive alerts from Splunk.

Define appropriate notification flows and recipients within xMatters.

In xMatters, this is done by building a flow in Flow Designer with an HTTP trigger (or by using a pre-built Splunk trigger if available). xMatters then generates a trigger URL, which is what Splunk will post alerts to.

Configure the xMatters Actionable Alerts app

  1. Configure the app as outlined in the “How to install the xMatters integration in Splunk” section of the xMatters installation instructions, entering the trigger URL in the Inbound Integration URL field.
  2. Create the searches you want to trigger a request to xMatters and save them as alerts, selecting the app as the alert action and setting the recipients and priority to pass to xMatters. See “How to use this integration” for more details.

Configure a generic webhook

https://help.xmatters.com/ondemand/flowdesigner/splunk-steps.htm
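
As a rough sketch of what a webhook delivery looks like, JSON like the following can be posted to the trigger URL (the property names here are assumptions and must match whatever your xMatters flow expects):

curl -X POST "<xmatters_trigger_url>" -H "Content-Type: application/json" -d '{"summary": "HTTP 5xx errors exceeded threshold", "priority": "HIGH"}'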

5. Incident Response and Collaboration:

  • Use xMatters to coordinate incident response efforts by notifying the appropriate teams and stakeholders via SMS, voice calls, mobile push notifications, or collaboration tools like Slack.
  • Facilitate real-time communication and collaboration among responders using xMatters’ incident response platform, allowing teams to work together to resolve issues quickly and effectively.
  • Integrate incident management processes with other ITSM tools or ticketing systems for seamless workflow automation.

Integration Workflow:

Define the workflow for handling alerts and incidents across Splunk, xMatters, and ServiceNow.

At a high level, the flow looks like this:

Splunk Alert -> xMatters Notification -> ServiceNow Incident Creation

Test the end-to-end integration to ensure alerts are properly triggered, notifications are sent, and incidents are created.
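
A simple way to exercise the whole pipeline is to index a synthetic 5xx event via the HTTP Event Collector and watch it flow through (a sketch, assuming the mywebindex index, an app_logs source type, and a temporarily lowered alert threshold):

curl -k "https://<splunk_host>:8088/services/collector/event" -H "Authorization: Splunk <hec_token>" -d '{"event": {"status": 503, "message": "synthetic test error"}, "index": "mywebindex", "sourcetype": "app_logs"}'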

By configuring each platform as described, you can establish seamless integration between Splunk, xMatters, and ServiceNow for monitoring, alerting, and incident management in your mobile, web, or REST applications. This ensures that critical issues are promptly detected, communicated, and resolved, thereby enhancing the overall application experience.
