IANN Monitor OpenShift Installation
This installation guide provides detailed, step-by-step instructions for setting up the IANN (Intelligent Artificial Neural Network) platform and its associated components. The purpose of this guide is to ensure a smooth and error-free installation process, enabling users to quickly configure the environment and start leveraging the system’s capabilities.
IANN (Intelligent Artificial Neural Network) is Pragma Edge’s AI-powered unified platform designed to bring intelligence, automation, and predictive analytics into business operations. It combines file tracking, monitoring, and AI-driven insights to ensure end-to-end visibility, operational efficiency, and proactive issue resolution.
IANN plays a key role in driving digital transformation across industries by turning traditional data exchanges into intelligent, insight-driven processes.
IANN is built as a modular system with three primary components:
This installation guide is designed for a wide range of technical users involved in deploying, maintaining, or supporting IANN solutions. The intended audience includes:
This guide serves as a comprehensive manual for deploying and configuring the IANN platform in various environments (Linux, Windows, OpenShift). It covers the following:
Each section includes detailed prerequisites, component-level configuration, and post-deployment validation steps to ensure successful setup and operation of IANN in production and non-production environments.
This document provides a detailed, step-by-step guide for installing and configuring IANN Monitor 6.4. It ensures the seamless deployment of all core components, including Elasticsearch, Kibana, UI modules, and IANN Monitor agents.
By following this guide, you will be able to:
1. Install and configure the IANN Monitor server components
2. Set up and integrate all necessary client agents
3. Enable essential alerting and monitoring capabilities
Before proceeding, please review the Prerequisites section to verify that your environment satisfies all requirements for a successful installation.
1.1 Context Diagram of IANN Monitor:
The IANN Monitor Context Flow illustrates how the core components within the Monitor module interact with each other and with external systems to deliver real-time monitoring and alerting capabilities. It explains the structure of Monitor and the sequence of operations that enable proactive system health tracking and anomaly detection.
IANN Monitor Context Flow Description:
1. User Interaction & Visualization via IANN Monitor UI:
Users interact with the IANN Monitor UI to visualize real-time alerts and periodic weekly reports. The UI acts as the frontend layer and retrieves data directly from Elasticsearch over port 9200. This enables users to monitor system health, application performance, and recent alert history in an intuitive dashboard.
2. UI Functional Components & Notification Channels
The UI is structured into two major functional components:
Once alerts are processed, they are dispatched via integrated outbound channels:
a. Microsoft Teams via channel email ID
b. Slack via webhook URL
c. Email alerts via SMTP
d. Incident management tools such as PagerDuty, xMatters, and ServiceNow using HTTP POST APIs
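For instance, the Slack channel above is simply an HTTP POST of a JSON payload to the configured incoming-webhook URL. A generic sketch (the webhook URL below is a placeholder, not an actual IANN endpoint):

```shell
# Hypothetical example: post an alert to a Slack incoming webhook.
# Replace the URL with your workspace's webhook; the payload follows
# Slack's standard incoming-webhook format.
curl -X POST "https://hooks.slack.com/services/T000/B000/XXXX" \
  -H "Content-Type: application/json" \
  -d '{"text": "IANN Monitor alert: Elasticsearch cluster status is RED"}'
```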
3. Elasticsearch – Central Data Processing and Storage Layer
Elasticsearch serves as the core data indexing engine. It receives logs and metrics from all connected clients and stores them for analysis and retrieval. It supports both live queries from the UI and backend alert rule evaluation.
All communication with Elasticsearch occurs over port 9200, providing a single, consistent interface for data ingestion and querying.
4. IANN Monitor Clients – Data Ingestion and Processing Pipeline
The data collection clients are composed of the API Jar and multiple client binaries. These components operate as follows:
a. Establish a connection with the SI Oracle Database using port 1521
b. Collect live performance metrics through application endpoints via port 8180
c. Parse and structure the collected data
d. Forward the formatted data into Elasticsearch over port 9200 for storage and monitoring
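Steps (c) and (d) above can be sketched in miniature: a raw metric sample is wrapped in a structured document before being forwarded to Elasticsearch. The function and field names below are illustrative assumptions, not the actual IANN schema:

```python
# Minimal sketch of the "parse and structure" step of the client pipeline:
# wrap a collected metric in an Elasticsearch document envelope.
# Index naming and field names here are hypothetical.
from datetime import datetime, timezone

def to_es_document(source: str, metric: str, value: float) -> dict:
    """Structure a raw metric sample for indexing over port 9200."""
    now = datetime.now(timezone.utc)
    return {
        "_index": f"iann-monitor-{now:%Y.%m.%d}",
        "_source": {
            "source": source,   # e.g. "si-oracle-db" (collected via port 1521)
            "metric": metric,   # e.g. "active_sessions"
            "value": value,
            "@timestamp": now.isoformat(),
        },
    }

doc = to_es_document("si-oracle-db", "active_sessions", 42.0)
```

In a real client this document would be sent to Elasticsearch with a bulk-index request; the sketch only shows the structuring step.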
5. SI Application Integration & Log Monitoring
The SI Application writes logs into the underlying Linux File System.
a. A log monitoring agent continuously scans and parses new log entries
b. Structured log data is pushed to Elasticsearch
Alerts are triggered based on log-based thresholds or anomaly detection rules
This document provides a comprehensive installation and configuration guide for IANN Monitor version 6.4 on OpenShift environments. It outlines the end-to-end process for deploying both server and client components, ensuring full observability across infrastructure, including Sterling Integrator, and connected systems.
The IANN Monitor platform comprises:
Before proceeding with the installation of the IANN application in an OpenShift cluster, ensure the following prerequisites are in place:
1. Access to the OpenShift Cluster: Verify that you have the necessary permissions to access and manage the target OpenShift cluster.
2. IANN Helm Package: Ensure you have the Helm package required for deploying the IANN application.
3. Container Registry Access: Obtain the credentials needed to access the container registry that hosts the IANN application images.
4. Access to the IANN Namespace: Confirm that you have the appropriate permissions to create and manage resources within the designated namespace in the OpenShift cluster.
1.1 Platform Supported Model and Delivery
The IANN application can be deployed on the following platforms:
1. Red Hat OpenShift Container Platform 4.14
2. IBM Cloud
3. AWS Cloud
4. Azure Cloud
1.2 Versions
1. IANN-agent-6.4
2. IANN-UI-6.4
1.3 Download and Transfer the Helm Package
Start by downloading the IANN Helm package. Transfer the package to a Linux backend that has access to the OpenShift cluster where the deployment will take place. After transferring the package, extract it using the following command:
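The extraction command itself is missing from the source text. Assuming the package is delivered as a gzipped tarball (the filename below is a placeholder; substitute the actual file you downloaded), extraction would look like:

```shell
# Hypothetical package name; replace with the actual Helm package file.
tar -xzvf iann-monitor-6.4-helm.tgz
```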
1.4 Create a Namespace in OpenShift
Before making any modifications to the Helm charts, it is essential to create a dedicated namespace in the OpenShift cluster for the IANN application. Execute the following command to create the namespace:
OpenShift:
oc create namespace <namespace-name>
Example:
oc create namespace iann-clientname
Kubernetes:
kubectl create namespace <namespace-name>
Example:
kubectl create namespace iann-clientname
1.5 Create an Image Pull Secret
Once the namespace has been created, the next step is to set up an image pull secret. This secret contains the necessary credentials to authenticate with the container registry and pull the required IANN images. Use the following command to create the secret:
OpenShift:
oc create secret docker-registry <secret_name> \
  --docker-server=<registry-server> \
  --docker-username=<username> \
  --docker-password=<password> \
  --docker-email=<email>
Example:
oc create secret docker-registry test-secret \
  --docker-server=my-private-registry.com \
  --docker-username=my-username \
  --docker-password=my-password \
  --docker-email=my-email@example.com
Kubernetes:
kubectl create secret docker-registry <secret_name> \
  --docker-server=<registry-server> \
  --docker-username=<username> \
  --docker-password=<password> \
  --docker-email=<email>
Example:
kubectl create secret docker-registry test-secret \
  --docker-server=my-private-registry.com \
  --docker-username=my-username \
  --docker-password=my-password \
  --docker-email=my-email@example.com
By completing these steps, you ensure that the OpenShift cluster is properly configured to authenticate with the container registry during the deployment process.
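As a quick sanity check, you can confirm the secret exists before deploying (using the secret and namespace names you chose above):

```shell
# OpenShift
oc get secret test-secret -n <namespace>
# Kubernetes
kubectl get secret test-secret -n <namespace>
```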
1.6 Exposing Services in OpenShift
After deploying the IANN Monitor components using Helm, the following services must be exposed for external access (UI, Kibana, and API endpoints).
OpenShift Route Exposure
Use oc expose to create routes for services:
Expose IANN Monitor UI
oc expose svc/iann-ui-service -n <namespace>
Expose Kibana
oc expose svc/kibana -n <namespace>
Expose API Jar (optional)
oc expose svc/iann-api-service -n <namespace>
To get the generated URLs:
oc get routes -n <namespace>
This will list all exposed endpoints. These routes can be shared with users to access dashboards and APIs.
Note:
1. Ensure the OpenShift cluster allows external routes.
2. TLS can be enabled by setting the route's TLS termination type (edge, passthrough, or reencrypt), depending on security requirements.
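For example, an edge-terminated route for the UI service can be created directly with `oc create route edge` (a sketch; the route name is arbitrary and the service/namespace names are the ones used above):

```shell
oc create route edge iann-ui-secure \
  --service=iann-ui-service \
  -n <namespace>
```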
1.7 Troubleshooting and Recovery Scenarios
Common scenarios and recovery steps are listed below:
1. Pods Not Running:
Check Pod Status: Run this command to see the status of all pods in your namespace:
oc get pods -n <namespace>
Describe Problematic Pod:
If any pod shows an error, run this command to view its details:
oc describe pod <pod-name>
2. Helm Install Fails:
Uninstall the failed release (helm uninstall <release-name> -n <namespace>) and then reinstall. Check helm history for rollback options.
3. UI Not Loading:
Validate that the route is exposed using the command below, and ensure the UI pod is healthy.
oc get route -n <namespace>
4. Elasticsearch Unhealthy:
Use the command below to check the cluster status. Restart the pod if needed.
curl -X GET <es-url>:9200/_cluster/health
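If security is enabled on the cluster, pass credentials with -u; the "status" field in the response is what matters (green is healthy, yellow may be acceptable for single-node setups, red needs attention). A sketch, with placeholders for the URL, password, and pod name:

```shell
# Check cluster health (pretty-printed); inspect the "status" field.
curl -s -u elastic:<password> "<es-url>:9200/_cluster/health?pretty"

# If the pod must be restarted, delete it and let its controller recreate it:
oc delete pod <elasticsearch-pod-name> -n <namespace>
```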
To further customize the deployment, additional changes can be made to the Helm charts. Begin by opening the values.yaml file in a text editor of your choice on the Linux backend.
1. In the root path of the Helm chart there is a file named “app-secret.yaml”. Add the passwords here and keep this file secret.
2. Provide the values that need to be kept in the values.yaml file.
3. The encrypt-elastic-password and encrypt-queueWatcher-password fields must be encrypted using the provided encryption utility. Follow the steps below to generate the encrypted values:
Steps to Encrypt Passwords:
Step 1: Download the Encryption Tool
Obtain the encryptInput.zip file from your designated S3 bucket or Nexus repository.
Step 2: Unzip the Encryption Tool
Extract the contents using the following command:
unzip encryptInput.zip
Step 3: Run the Encryption Tool
Start the encryption program using:
./encryptInput/encryptInput
Step 4: Encrypt Your Password
When prompted with:
Enter password:
Example:
Enter password: mysecretpassword
The tool will output an encrypted string like: EQHGuVfEHaxJQLfiXSgunZfYfjJqErrPBrmQTdTSuhNUpxG6to8YQOwp8yGqFIg2
Use this encrypted string as the value in the values.yaml file.
apiVersion: v1
kind: Secret
metadata:
  name: agent-secret
type: Opaque
stringData:
  DB_PASS:
  encrypt-elastic-password:
  encrypt-queueWatcher-password:
OpenShift:
1. To apply the changes to secret in the cluster, run the following command:
oc apply -f app-secret.yaml
Output: secret/app-secret created
2. This means the secret was successfully created or updated in the cluster using the settings defined in the app-secret.yaml file.
Kubernetes:
1. To apply the changes to secret in the cluster, run the following command
kubectl apply -f app-secret.yaml
Output: secret/app-secret created
2. This means the secret was successfully created or updated in the cluster using the settings defined in the app-secret.yaml file.
2.2 Service Account Configuration Section
Provide the service account name and set rbac to true so that the required permissions are applied to the service account.
runAsUser: Sets a custom user ID (UID) for the container.
fsGroup: Sets a custom group ID (GID) for file system access.
serviceAccount:
  create: true
  name: testing  # This will create a new ServiceAccount named "testing"
rbac:
  create: true   # This enables Role-Based Access Control (RBAC) for the ServiceAccount
security:
  runAsUser: 1001          # The container will run as user ID 1001
  supplementalGroup: 2002  # Additional group ID the container will belong to
  fsGroup: 2001            # Group ID used for file system access
2.3 Image Pull Secret
imagePullSecrets: Refers to the secret used for pulling images from the container registry. Replace the “test-secret” with your actual secret name.
imagePullSecrets:
  name: test-secret
2.4 Log Configuration Section
For storing logs, provide the storage class name and the required storage size.
2.5 API Configuration
1. Give the agent image name and tag.
2. Give the hostname for the Swagger URL. The default service port is 8080; you can change it as required.
api:
  enabled: true
  image:
    name: myregistry.com/my-api  # Container image repository
    tag: "v1.2.3"                # Image tag
    pullPolicy: Always
  hostname: api.acmecorp.example.com  # Hostname for the Swagger URL
  service:
    port: 8080
2.6 API Pod Resource Configuration
Give the CPU and memory resource limits for the API pod.
resources:
  limits:
    cpu: "500m"      # CPU limit for the API pod
    memory: "512Mi"  # Memory limit for the API pod
  requests:
    cpu: "250m"
    memory: "256Mi"
2.7 Database Details
1. Provide DB details of the Sterling Integrator.
DB:
  db_type:               # Database type: DB2, Oracle, or MSSQL
  ssl_connection: false  # Set to true if you are using an SSL connection between the application servers and the database
  db_port:               # Database port
  db_host:               # Database host
  db_name:               # Database name
  db_user:               # Database username
  db_password:           # Name of the secret containing the DB password
2.8 Elastic Search Configuration
1. Provide the Elasticsearch details such as URL, port, username, and secret name which contains the password. Additionally, specify the index name, which must be unique.
elasticsearch:
  index: test
  url: https://<IP>:<PORT>/  # Elasticsearch URL
  port:                      # Elasticsearch port
  username: elastic          # Elasticsearch username
  password:                  # Name of the secret containing the encrypted password
  use_ssl: false
  os: linux
1. To enable deployment of the appdata pod, set ‘enabled’ to ‘true’ and specify the Beat configurations you want the appdata pod to monitor.
2. jarvis_username: Username to authenticate with the API.
3. jarvis_password: Password for the Jarvis account.
4. api_host: Base URL of the API endpoint used for integration.
5. db_type: Specifies the DB type
6. scheduled_seconds: Defines a default time interval (300 seconds) for scheduled tasks or operations.
7. Waiting_count_version: Set to true if the SI version is 6.4 (for waiting counts), else false.
8. Halted_count_version: Set to true if the SI version is 6.4, else false.
9. list: A list of directories or file paths related to the application that need to be ignored while checking for mailbox depth.
10. service_list: A list of services that the application will monitor or interact with.
11. mailbox_long_running_morethan_mins: 30 – Tracks mailbox processes that have been running for more than 30 minutes.
12. adapter_list: A list of various adapters (e.g., REST Http Server Adapter, SMTP Send Adapter) used for different integrations or communication protocols
13. Bp_list: A list of business processes (e.g., “TypingService”) that the application monitors or processes specifically.
14. schedulers_list: A list of scheduler jobs or tasks.
appdata:
  enabled: true
  image:
    name:  # Container image repository
  resources:
    limits:
  configurations:
    jarvis_username:
15. archive: Set to true/false to activate/inactivate the tracking of Archive counts.
16. index: Set to true/false to activate/inactivate the tracking of index counts.
17. purge: Set to true/false to activate/inactivate the tracking of purge counts.
18. get_mailbox_ref_query: Set to true/false to activate/inactivate the tracking of Mailbox depth.
19. document_processed: Set to true/false to activate/inactivate the tracking of Documents processed.
20. db_usage: Set to true/false to activate/inactivate the tracking of SI DB Usage.
21. mailbox_long_running: Set to true/false to activate/inactivate the tracking of mailbox long running.
22. run_time_of_service: Set to true/false to activate/inactivate the tracking of Service run time.
23. adapter_status: Set to true/false to activate/inactivate the tracking of Adapter status.
24. adapter_uptime_status: Set to true/false to activate/inactivate the tracking of uptime of adapters.
25. halted_count: Set to true/false to activate/inactivate the tracking of Halted BP count.
26. halting_count: Set to true/false to activate/inactivate the tracking of Halting BP count.
27. interrupted_count: Set to true/false to activate/inactivate the tracking of interrupted BP count.
28. waiting_count: Set to true/false to activate/inactivate the tracking of Waiting BP count.
29. waiting_on_io_count: Set to true/false to activate/inactivate the tracking of Waiting on IO BP count.
30. Purge_count: Set to true/false to activate/inactivate the tracking of Purge Count.
31. app_availability_time: Set to true/false to activate/inactivate the tracking of SI Application uptime
32. schedulers_status: Set to true/false to activate/inactivate the tracking of Schedulers status.
33. Non_index: Set to true/false to activate/inactivate the tracking of non-indexed data.
34. Bp_status: Set to true/false to activate/inactivate the tracking of BP status data.
35. get_mailbox_depth: Set to true/false to activate/inactivate the tracking of mailbox depth data.
36. external_perimeter: Set to true/false to activate/inactivate the tracking of Perimeter server status data.
37. active_count: Set to true/false to activate/inactivate the tracking of Active BP counts data.
NOTE: Each interval below is specified in seconds, and each metric or data point can have its own collection frequency.
38. get_mailbox_depth_seconds: 300 – Tracks mailbox depth every 300 seconds (5 minutes).
39. document_processed_seconds: 300 – Tracks processed documents every 300 seconds (5 minutes).
40. db_usage_seconds: 300 – Tracks database usage every 300 seconds (5 minutes).
41. external_perimeter_seconds: 300 –Tracks external perimeter data every 300 seconds.
42. adapter_status_seconds: 300 – Tracks adapter status every 300 seconds (5 minutes).
43. halted_count_seconds: 1800 – Tracks halted counts every 1800 seconds (30 minutes).
44. halting_count_seconds: 1800 – Tracks halting counts every 1800 seconds.
45. interrupted_count_seconds: 1800 – Tracks interrupted counts every 1800 seconds.
46. waiting_count_seconds: 1800 – Tracks waiting counts every 1800 seconds.
47. waiting_on_io_count_seconds: 1800 – Tracks waiting-on-IO counts every 1800 seconds.
48. active_count_seconds: 1800 – Tracks active counts every 1800 seconds.
49. archive_seconds: 1800 – Tracks archive operations every 1800 seconds.
50. index_seconds: 1800 – Tracks index operations every 1800 seconds.
51. purge_seconds: 1800 – Tracks purge operations every 1800 seconds.
52. app_availability_time_seconds: 300 – Tracks application availability time every 300 seconds.
53. schedulers_status_seconds: 300 –Tracks scheduler status every 300 seconds.
54. run_time_of_service_seconds: 300 – Tracks service run time every 300 seconds.
55. adapter_uptime_status_seconds: 300 – Tracks adapter uptime every 300 seconds.
56. bp_status_seconds: 1800 – Tracks BP status every 1800 seconds.
57. non_index_seconds: 1800 – Tracks non-index data every 1800 seconds.
58. mailbox_long_running_seconds: 1800 – Tracks long-running mailbox processes every 1800 seconds.
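The per-metric interval settings above can be checked mechanically. A minimal sketch, with a subset of the default values listed above, that validates each interval and converts it to a frequency in minutes:

```python
# Sketch: validate appdata collection intervals (in seconds) and report
# each metric's frequency in minutes. Values mirror the defaults above;
# the helper function is illustrative, not part of IANN.
intervals = {
    "get_mailbox_depth_seconds": 300,
    "document_processed_seconds": 300,
    "db_usage_seconds": 300,
    "adapter_status_seconds": 300,
    "halted_count_seconds": 1800,
    "waiting_count_seconds": 1800,
    "bp_status_seconds": 1800,
}

def frequency_minutes(intervals: dict) -> dict:
    """Convert second-based intervals to minutes, rejecting bad values."""
    for name, seconds in intervals.items():
        if not isinstance(seconds, int) or seconds <= 0:
            raise ValueError(f"{name} must be a positive integer of seconds")
    return {name: seconds / 60 for name, seconds in intervals.items()}

freq = frequency_minutes(intervals)
# 300 seconds is a 5-minute frequency; 1800 seconds is a 30-minute frequency.
```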
2.10 PCM Stats Service Configuration
1. enabled: true – Enables the PCM stats service.
2. application_ref_query: Set to true/false to activate/inactivate the tracking of PCM Application activity data.
3. tp_ref_query: Set true/false to activate/inactivate the tracking of PCM Trading Partner activity data.
4. wf_ref_query: Set to true/false to activate/inactivate the tracking of PCM Workflow activity data.
5. application_ref_query_seconds: 300 – Specifies that application reference queries should be tracked every 300 seconds (5 minutes).
6. tp_ref_query_seconds: 300 – Specifies that trading partner reference queries should be tracked every 300 seconds (5 minutes).
7. wf_ref_query_seconds: 300 – Specifies that workflow reference queries should be tracked every 300 seconds (5 minutes).
# Set enabled to true to enable the PCM stats service
pcm_stats:
  enabled: true
  application_ref_query: true  # To monitor any modifications in the application
  tp_ref_query: true           # To monitor the trading partner changes
  wf_ref_query: true           # To monitor the workflow changes
  application_ref_query_seconds: 1800
  tp_ref_query_seconds: 1800
  wf_ref_query_seconds: 1800
2.11 Sterling Reports Pod Configuration
This section configures the deployment of the Sterling Reports pod, which provides detailed reporting and monitoring for the Sterling application. These reports cover various aspects such as certificate status, long-running business processes, and more.
1. enabled: Set this to “true” to deploy the Sterling Reports pod. If set to “false”, the Sterling Reports pod will not be deployed.
2. duplicate_routing: Set this to “false” to disable monitoring of duplicate routing. This can be enabled if duplicate routing in your system needs to be tracked.
3. duplicate_routing_seconds: Define the interval, in seconds, for checking duplicate routing. If set to “60”, the check will run every minute.
4. trusted_certs: Set this to “true” to enable monitoring and reporting of trusted certificates. This helps track certificates that are considered trusted by the system.
5. ca_certs: Set this to “true” to enable monitoring and reporting of Certificate Authority (CA) certificates. These are certificates used to verify the authenticity of other certificates.
6. system_certs: Set this to “true” to enable monitoring and reporting of system certificates. These certificates are used for internal communication and ensuring secure connections.
7. long_running_bp_steps: Set this to “true” to track long-running business process (BP) steps. This will help identify steps that take longer than expected, which could indicate potential issues or inefficiencies.
8. long_running_bp: Set this to “true” to track long-running business processes. This helps identify processes that may be stuck or taking too long to complete.
9. bp_definition_details_for_si: Set this to “true” to enable reporting on business process (BP) definition details specifically for Sterling Integrator (SI). This will provide insights into the structure of BPs in SI.
10. bp_runs_bp_steps: Set this to “true” to enable tracking of BP steps executed during BP runs. This provides granular details about the execution of BP steps.
11. no_execution_per_node: Set this to “true” to monitor nodes with no executions. This can help identify idle nodes or potential issues with node execution.
12. bp_records_per_month: Set this to “true” to track the number of business process records generated per month. This can help with performance analysis and capacity planning.
13. bp_with_large_bp_steps: Set this to “true” to identify business processes that involve large BP steps. These large steps can impact performance and need monitoring.
14. trusted_certs_seconds: Define the interval, in seconds, for checking and reporting on trusted certificates. For example, “1800” means the check will run every 30 minutes.
15. ca_certs_seconds: Define the interval, in seconds, for checking CA certificates. For example, “1800” means the check will run every 30 minutes.
16. system_certs_seconds: Define the interval, in seconds, for checking system certificates. Again, “1800” means the check will run every 30 minutes.
17. long_running_bp_steps_seconds: Define the interval, in seconds, for checking long-running business process steps. For example, “900” means the check will run every 15 minutes.
18. long_running_bp_seconds: Define the interval, in seconds, for checking long-running business processes. Setting this to “1800” means the check will run every 30 minutes.
19. bp_definition_details_for_si_seconds: Determines the interval (in seconds) for collecting Business Process (BP) definition details specifically for Sterling Integrator.
20. bp_runs_bp_steps_seconds: Specifies how often (in seconds) the system gathers execution data of BP steps during BP runs.
21. no_execution_per_node_seconds: Determines the interval (in seconds) for monitoring nodes that have no BP executions. This helps identify inactive or underutilized nodes.
22. bp_records_per_month_seconds: Defines how frequently (in seconds) the number of BP records per month is calculated and reported.
23. bp_with_large_bp_steps_seconds: Sets the interval (in seconds) to identify business processes that contain large BP steps.
24. long_bp_steps_time_seconds: Specifies the threshold time (in seconds) used to flag a single BP step as long-running.
25. long_bp_time_seconds: Defines the threshold time (in seconds) that marks a complete business process as long-running.
26. large_bp_steps: Defines the size or complexity threshold for identifying large BP steps. A value of 3000 indicates that steps exceeding this limit (e.g., by node count or resource usage) are considered large.
# Set enabled to true to deploy the sterling_reports pod and provide the sterling_reports beat configuration details below.
sterling_reports:
  enabled: true
  duplicate_routing: false
  duplicate_routing_seconds: 60
  trusted_certs: true
  ca_certs: true
  system_certs: true
  long_running_bp_steps: true
  long_running_bp: true
  bp_definition_details_for_si: true
  bp_runs_bp_steps: true
  no_execution_per_node: true
  bp_records_per_month: true
  bp_with_large_bp_steps: true
  trusted_certs_seconds: 1800
  ca_certs_seconds: 1800
  system_certs_seconds: 1800
  long_running_bp_steps_seconds: 900
  long_running_bp_seconds: 1800
  bp_definition_details_for_si_seconds: 900
  bp_runs_bp_steps_seconds: 900
  no_execution_per_node_seconds: 900
  bp_records_per_month_seconds: 900
  bp_with_large_bp_steps_seconds: 900
  long_bp_steps_time_seconds: 900
  long_bp_time_seconds: 900
  large_bp_steps: 3000
2.12 Database Healthcheck Configuration
This section configures the deployment of the Database Healthcheck.
1. current_blocked_queries: Set to true/false to activate/inactivate the tracking of currently blocked queries.
2. current_blocked_queries_seconds: Define the interval, in seconds, for checking currently blocked queries. Setting this to “300” means the check will run every 5 minutes.
3. avg_latency: Set to true/false to activate/inactivate the tracking of average latency.
4. avg_latency_seconds: Define the interval, in seconds, for checking average latency. Setting this to “300” means the check will run every 5 minutes.
5. bp_locks_minutes: 30 – Specifies the threshold for business process (BP) locks monitoring; locks older than 30 minutes are flagged.
6. days: 2 – Specifies the number of days for certain data points, like extraction and message age checks (e.g., checks for unextracted messages older than 2 days).
7. over_all_database_size: Set to true/false to activate/inactivate the tracking of the overall database size.
8. database_check: Set to true/false to activate/inactivate the tracking of database status.
9. tablespace_usage: Set to true/false to activate/inactivate the tracking of tablespace usage.
10. active_sessions: Set to true/false to activate/inactivate the tracking of active database sessions.
11. inactive_sessions: Set to true/false to activate/inactivate the tracking of inactive database sessions.
12. total_sessions: Set to true/false to activate/inactivate the tracking of total number of database sessions.
13. current_blocked_sessions: Set to true/false to activate/inactivate the tracking of currently blocked sessions.
14. invalid_object_status: Set to true/false to activate/inactivate the tracking of invalid objects within the database.
15. unusable_indexes: Set to true/false to activate/inactivate the tracking of unusable indexes in the database.
16. database_locks: Set to true/false to activate/inactivate the tracking of database locks.
17. db_response_time: Set to true/false to activate/inactivate the tracking of database response time.
db_healthcheck:
  enabled: true  # Set to true if you want to install the db healthcheck beat
  current_blocked_queries: true
  current_blocked_queries_seconds: 300
  avg_latency: true
  avg_latency_seconds: 300
  bp_locks_minutes: 30
  days: 2
  over_all_database_size: true
  database_check: true
  tablespace_usage: true
  active_sessions: true
  inactive_sessions: true
  total_sessions: true
  current_blocked_sessions: false
  invalid_object_status: false
  unusable_indexes: false
  database_locks: true
  db_response_time: true
18. lifespan: Set to true/false to activate/inactivate the tracking of the lifespan of database components.
19. top_tables: Set to true/false to activate/inactivate the tracking of top tables by usage.
20. mailboxes_with_unextracted_messages: Set to true/false to activate/inactivate the tracking of mailboxes with unextracted messages.
21. mailboxes_with_extracted_messages: Set to true/false to activate/inactivate the tracking of mailboxes with extracted messages.
22. no_of_messages_extracted_older_n_days: Set to true/false to activate/inactivate the tracking of messages extracted older than a specified number of days.
23. mailbox_with_unextracted_messages_older_than_ndays: Set to true/false to activate/inactivate the tracking of mailboxes with unextracted messages older than a specified number of days.
24. purge_locks: Set to true/false to activate/inactivate the tracking of purge locks in the database.
25. cluster_status: Set to true/false to activate/inactivate the tracking of cluster status.
26. redolog_group_status_check: Set to true/false to activate/inactivate the tracking of redo log groups. This parameter applies to Oracle databases.
27. Amount_of_Redo_Generated_per_Hour: Set to true/false to activate/inactivate the tracking of the amount of redo generated per hour. This parameter applies to Oracle databases.
28. redo_generation_per_day: Set to true/false to activate/inactivate the tracking of redo generation per day. This parameter applies to Oracle databases.
29. redo_file_change: Set to true/false to activate/inactivate the tracking of redo file changes. This parameter applies to Oracle databases.
30. cpu_util: Set to true/false to activate/inactivate the tracking of CPU utilization.
31. ram_util: Set to true/false to activate/inactivate the tracking of RAM utilization.
32. db_conn: Set to true/false to activate/inactivate the tracking of database connections.
  lifespan: false
  top_tables: true
  mailboxes_with_unextracted_messages: false
  mailboxes_with_extracted_messages: false
  no_of_messages_extracted_older_n_days: true
  mailbox_with_unextracted_messages_older_than_ndays: true
  purge_locks: false
  cluster_status: false
  redolog_group_status_check: true
  Amount_of_Redo_Generated_per_Hour: true
  redo_generation_per_day: true
  redo_file_change: true
  cpu_util: true
  ram_util: true
  db_conn: true
33. write_latency: Set to true/false to activate/inactivate the tracking of write latency.
34. read_latency: Set to true/false to activate/inactivate the tracking of read latency.
35. read_iops: Set to true/false to activate/inactivate the tracking of read I/O operations per second (IOPS).
36. write_iops: Set to true/false to activate/inactivate the tracking of write I/O operations per second (IOPS).
37. read_throughput: Set to true/false to activate/inactivate the tracking of read throughput (amount of data read).
38. write_throughput: Set to true/false to activate/inactivate the tracking of write throughput (amount of data written).
39. over_all_database_size_seconds: 1800 – Frequency for tracking overall db size
40. database_check_seconds: 1800 – Frequency for performing a database health check
41. tablespace_usage_seconds: 1800 – Frequency for tracking tablespace usage
42. active_sessions_seconds: 1800 – Frequency for monitoring active sessions
43. inactive_sessions_seconds: 1800 – Frequency for monitoring inactive sessions
44. total_sessions_seconds: 1800 – Frequency for tracking total sessions
45. current_blocked_sessions_seconds: 1800 – Frequency for tracking blocked sessions
46. invalid_object_status_seconds: 1800 – Frequency for checking invalid object status
47. unusable_indexes_seconds: 1800 – Frequency for monitoring unusable indexes
48. database_locks_seconds: 1800 – Frequency for tracking database locks
49. db_response_time_seconds: 1800 – Frequency for tracking database response time
50. lifespan_seconds: 1800 – Frequency for tracking lifespan
51. top_tables_seconds: 1800 – Frequency for tracking top tables
write_latency: true
read_latency: true
read_iops: true
write_iops: true
read_throughput: true
write_throughput: true
over_all_database_size_seconds: 1800
database_check_seconds: 1800
tablespace_usage_seconds: 1800
active_sessions_seconds: 1800
inactive_sessions_seconds: 1800
total_sessions_seconds: 1800
current_blocked_sessions_seconds: 1800
invalid_object_status_seconds: 1800
unusable_indexes_seconds: 1800
database_locks_seconds: 1800
db_response_time_seconds: 1800
lifespan_seconds: 1800
top_tables_seconds: 1800
52. mailboxes_with_unextracted_messages_seconds: 1800 – Frequency for tracking mailboxes with unextracted messages
53. mailboxes_with_extracted_messages_seconds: 1800 – Frequency for tracking mailboxes with extracted messages
54. no_of_messages_extracted_older_n_days_seconds: 1800 – Frequency for tracking the number of messages extracted older than specified days
55. mailbox_with_unextracted_messages_older_than_ndays_seconds: 1800 – Frequency for tracking mailboxes with unextracted messages older than specified days
56. purge_locks_seconds: 1800 – Frequency for monitoring purge locks
57. cluster_status_seconds: 120 – Frequency for monitoring cluster status
58. redolog_group_status_check_seconds: 1800 – Frequency for checking redo log group status
59. Amount_of_Redo_Generated_per_Hour_seconds: 1800 – Frequency for tracking amount of redo generated per hour
60. redo_generation_per_day_seconds: 1800 – Frequency for tracking redo generation per day
61. redo_file_change_seconds: 1800 – Frequency for tracking redo file changes
62. cpu_util_seconds: 1800 – Frequency for monitoring CPU utilization
63. ram_util_seconds: 1800 – Frequency for monitoring RAM utilization
64. db_conn_seconds: 1800 – Frequency for tracking database connections.
mailboxes_with_unextracted_messages_seconds: 1800
mailboxes_with_extracted_messages_seconds: 1800
no_of_messages_extracted_older_n_days_seconds: 1800
mailbox_with_unextracted_messages_older_than_ndays_seconds: 1800
purge_locks_seconds: 1800
cluster_status_seconds: 120
redolog_group_status_check_seconds: 1800
Amount_of_Redo_Generated_per_Hour_seconds: 1800
redo_generation_per_day_seconds: 1800
redo_file_change_seconds: 1800
cpu_util_seconds: 1800
ram_util_seconds: 1800
db_conn_seconds: 1800
write_latency_seconds: 1800
read_latency_seconds: 1800
read_iops_seconds: 1800
write_iops_seconds: 1800
read_throughput_seconds: 1800
write_throughput_seconds: 1800
dbname: Database Name
iops_throughput: true
iops_throughput_seconds: 300
blockedDuration: 5
long_running_queries: true
page_life_expectancy: true
buffer_cache_hit_ratio: true
buffer_cache_hit_ratio_seconds: 1800
page_life_expectancy_seconds: 1800
long_running_queries_seconds: 1800
65. dbname: specify the db name
66. iops_throughput: Set to true/false to activate/inactivate the tracking of iops throughput
67. iops_throughput_seconds: Frequency for monitoring iops_throughput
68. blockedDuration: The minimum time (in minutes) a query must have been blocked before it is reported. For example, if set to 5, the agent monitors queries that have been blocked for 5 or more minutes.
69. long_running_queries: Set to true/false to activate/inactivate the tracking of long_running_queries
70. page_life_expectancy: Set to true/false to activate/inactivate the tracking of page_life_expectancy
71. buffer_cache_hit_ratio: Set to true/false to activate/inactivate the tracking of buffer_cache_hit_ratio
72. buffer_cache_hit_ratio_seconds: Frequency for monitoring buffer_cache_hit_ratio
73. page_life_expectancy_seconds: Frequency for monitoring page_life_expectancy
74. long_running_queries_seconds: Frequency for monitoring long_running_queries.
2.13 Queuewatcher Configuration
1. enabled: true – Enables the QueueWatcher service.
2. queue_url: Provide the B2BI QueueWatcher Service URL (HTTP)
3. schedule_queuewatcher_seconds: 60
Defines the schedule interval (in seconds) for the QueueWatcher to check or interact with the queue.
4. schedule_threads_seconds: 300
Defines the schedule interval (in seconds) for thread monitoring related to the queue.
queueWatcher:
  enabled: true
  image:
    name: # Container image repository
  resources:
    limits:
  configuration:
    queue_url: http://0.0.0.0:0000/queueWatch/
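Putting the four parameters above together, a fuller values.yaml sketch might look like the following. The registry path, resource sizes, and host are placeholders, and the placement of the two schedule keys under `configuration` is an assumption based on the fragment above.

```yaml
queueWatcher:
  enabled: true                                     # Enables the QueueWatcher service
  image:
    name: registry.example.com/iann/queuewatcher    # placeholder registry path
  resources:
    limits:
      memory: "512Mi"                               # example limit; size for your load
      cpu: "250m"
  configuration:
    queue_url: http://b2bi.example.com:10080/queueWatch/   # B2BI QueueWatcher Service URL
    schedule_queuewatcher_seconds: 60               # queue check interval
    schedule_threads_seconds: 300                   # thread-monitoring interval
```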
2.14 Heartbeat Pod Configuration
This section configures the deployment of the Heartbeat pod, which monitors the availability and responsiveness of specified IPs and URLs. It helps ensure that key services and endpoints are operational and responding within defined time limits.
General Settings
1. enabled: Set to true to enable heartbeat monitoring.
Image Configuration
1. name: Container image repository.
2. tag: Image version (leave empty for latest).
3. pullPolicy: Always (or IfNotPresent).
Resource Configuration
1. limits: Define CPU and memory limits.
2. requests: Define minimum CPU and memory requests.
Heartbeat Parameters
1. ip: IP address to monitor.
2. urls: List of URLs to monitor (e.g., ["http://example.com"]).
3. schedule_heatbeat_seconds: Interval between checks (default: 60).
4. timeout_seconds: Timeout for each check (default: 10).
heartbeat:
  enabled: true
  image:
    name: # Container image repository
  resources:
    limits:
  configuration:
    ip:
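Combining the settings above gives a sketch like the one below. The IP, URL, image path, and resource values are placeholders, and the nesting of the heartbeat parameters under `configuration` follows the fragment above.

```yaml
heartbeat:
  enabled: true
  image:
    name: registry.example.com/iann/heartbeat   # placeholder registry path
    tag: ""                                     # empty for latest
    pullPolicy: Always
  resources:
    requests:
      memory: "128Mi"
      cpu: "50m"
    limits:
      memory: "256Mi"
      cpu: "100m"
  configuration:
    ip: 10.0.0.25                        # IP address to monitor (placeholder)
    urls: ["http://example.com"]         # URLs to monitor
    schedule_heatbeat_seconds: 60        # interval between checks
    timeout_seconds: 10                  # per-check timeout
```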
2.15 SSP Configuration
This section monitors the SSP engine adapters.
enabled: Set to true to enable SSP.
data: Provide the required data in JSON format. For example, {"Test Engine1 Adapters1":["0.0.0.0",0000]}.
ssp:
  enabled: true
  data: '{"Test Engine1 Adapters1":["0.0.0.0",0000]}'
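The `data` value can list multiple engines, each mapped to a host/port pair. A hedged example with two hypothetical engines and placeholder addresses:

```yaml
ssp:
  enabled: true
  # Each key is an engine/adapter label; each value is [host, port] (placeholders below)
  data: '{"Engine1 Adapters":["10.0.0.11",38400],"Engine2 Adapters":["10.0.0.12",38401]}'
```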
2.16 Systemstats Pod Configuration
This section monitors the health status of all pods and nodes in the OpenShift cluster and sends that data to Elasticsearch.
2.17 Save file and install application
1. After updating all the values in values.yaml, install the Helm chart, which deploys all the resources defined in the templates folder.
2. Save the values.yaml file.
3. To install the helm chart:
helm install <release_name> -f <path of values.yaml> <path/to/helmchart> |
4. To upgrade the helm chart:
helm upgrade <release_name> -f <path of values.yaml> <path/to/helmchart>
5. To roll back the helm chart by giving revision number:
helm rollback <release_name> <revision_number> |
This section outlines the step-by-step validation checks to be performed after deploying the IANN Monitor stack. These checks help ensure that all services, components, and configurations are functioning as expected across OpenShift and Kubernetes environments.
3.1 Check Helm Installation
Run the following command and confirm that the release status is deployed:
helm list -n <namespace> |
3.2 Check All Pods
After deployment, it’s important to verify that all application components are running correctly. Use the following commands to check the status of all pods in the target namespace:
Openshift:
oc get pods -n <namespace> |
Kubernetes:
kubectl get po -n <namespace> |
Review the output and ensure that all pods are in either the Running or Completed state.
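The Running/Completed check can be scripted. The sketch below parses sample `oc get pods` output (pod names are hypothetical); on a live cluster, replace the sample with `status_lines=$(oc get pods -n <namespace> --no-headers)`.

```shell
# Sample output standing in for: oc get pods -n <namespace> --no-headers
status_lines='iann-ui-7d9fc 1/1 Running 0 5m
iann-alerts-5c2ab 0/1 CrashLoopBackOff 3 5m
iann-agent-job-x8k 0/1 Completed 0 10m'

# Count pods whose STATUS column is neither Running nor Completed
not_ready=$(echo "$status_lines" | awk '$3 != "Running" && $3 != "Completed"' | wc -l)
echo "Pods needing attention: $not_ready"
```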
Agents:
3.3 Check Services and Endpoints
Ensure that all essential services are up and properly exposed within the cluster. Use the following commands to list services in the specified namespace:
Openshift:
oc get svc -n <namespace> |
Kubernetes:
kubectl get svc -n <namespace-name> |
Verify that critical services such as API, QueueWatcher, and Elasticsearch are present in the output. Their absence may indicate deployment or configuration issues.
To list all secrets in a specific namespace:
Openshift:
oc get secrets -n <namespace> |
Kubernetes:
kubectl get secret -n <namespace> |
Make sure that both the agent secrets and image pull secrets are listed and available.
To list the agent routes, use the following command:
oc get routes |
3.4 Check PVCs (Storage)
Persistent Volume Claims (PVCs) ensure that pods have the necessary storage attached. To verify the status of PVCs in the specified namespace, run:
Openshift:
oc get pvc -n <namespace> |
Kubernetes:
kubectl get pvc -n <namespace> |
Check that all PVCs show a Bound status, indicating successful allocation of storage. Any other status may require storage configuration review.
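The Bound check can also be scripted. The sketch below parses sample `oc get pvc` output (PVC names are hypothetical); on a live cluster, replace the sample with `pvc_lines=$(oc get pvc -n <namespace> --no-headers)`.

```shell
# Sample output standing in for: oc get pvc -n <namespace> --no-headers
pvc_lines='es-data-pvc Bound pv-0001 50Gi RWO standard 10d
ui-logs-pvc Pending standard 2m'

# Any PVC whose STATUS column is not Bound needs a storage review
unbound=$(echo "$pvc_lines" | awk '$2 != "Bound" {print $1}')
echo "PVCs not Bound: ${unbound:-none}"
```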
3.5 Check Pod Logs
Pod logs help diagnose issues related to application startup, configuration, or runtime errors. Use the following commands to view logs for a specific pod:
Openshift:
oc logs <pod-name> -n <namespace> |
Kubernetes:
kubectl logs <pod-name> -n <namespace> |
Review the logs for any errors, warnings, or crash-related messages that could indicate problems with the application or service.
To open an interactive shell inside a pod, use the following command:
oc exec -it <pod-id> -- /bin/bash |
Backend API Jar:
3.6 Check DB Health Metrics
Confirm database stats (latency, locks, sessions) are visible in the UI or logs.
3.7 Check Heartbeat and QueueWatcher
Confirm that the Heartbeat pod is monitoring the configured IPs and URLs.
Confirm QueueWatcher is pulling queue info.
3.8 Check Systemstats
Verify pod and node data is being collected.
3.9 Check Helm Rollback Option
Run the following command to review the deployment history and identify rollback options for a Helm release
helm history <release-name> -n <namespace> |
Note the revision number in case a rollback is needed.
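The revision to note can be picked out of the STATUS column of `helm history`. The sketch below parses sample output (chart names and dates are hypothetical); on a live cluster, replace the sample with `history=$(helm history <release-name> -n <namespace>)`.

```shell
# Sample output standing in for: helm history <release-name> -n <namespace>
history='REVISION UPDATED STATUS CHART DESCRIPTION
1 2024-01-01 superseded iann-6.4.0 Install complete
2 2024-01-02 deployed iann-6.4.1 Upgrade complete'

# The currently deployed revision; earlier superseded revisions are rollback candidates
current_rev=$(echo "$history" | awk '$3 == "deployed" {print $1}')
echo "Currently deployed revision: $current_rev"
```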
4.1 Download and Transfer the Helm Package
Start by downloading the IANN Helm package. Transfer the package to a Linux backend that has access to the OpenShift cluster where the deployment will take place. After transferring the package, extract it using the following command:
tar -xvf <helm_package_name> |
4.2 Create an Image Pull Secret
Once the namespace has been created, the next step is to set up an image pull secret. This secret
contains the necessary credentials to authenticate with the container registry and pull the required
IANN images. Use the following command to create the secret:
Openshift:
oc create secret docker-registry <secret_name> \
  --docker-server=<your-registry-server> \
  --docker-username=<your-username> \
  --docker-password=<your-password> \
  -n <namespace>
Kubernetes:
kubectl create secret docker-registry <secret_name> \
  --docker-server=<your-registry-server> \
  --docker-username=<your-username> \
  --docker-password=<your-password> \
  -n <namespace>
By completing these steps, you ensure that the OpenShift cluster is properly configured to authenticate with the container registry during the deployment process.
4.3 Helm Chart Changes
To further customize the deployment, additional changes can be made to the Helm charts. Begin by opening the values.yaml file in a text editor of your choice on the Linux backend. Ensure the following:
4.3.1 Service account and RBAC
1. Specify the service account name to be used.
2. Set RBAC to true to enable role-based access control for the permissions applied to the service account.
serviceAccount:
  create: true
  name: "" # Provide the service account name if the service account is already created
rbac:
  create: true
4.3.2 Image and Pull Secret
1. Image: Specify the name of the container image
2. Repository: Indicate the repository where the image is hosted.
3. Tag: Mention the version or tag of the image.
4. ImagePullSecret Name: Use the name of the pull secret once it is configured.
image:
  repository:
  tag: ""
  pullPolicy: Always
replicaCount: 1
imagePullSecrets:
4.3.3 Service Configuration
Default Port: The default port is set to 8080. You can modify this value if needed to suit your application requirements.
service:
  annotations: {}
  ports:
    port: 8080
4.3.4 Route Configuration
In OpenShift, we use a route to access the UI via a DNS name. Please provide the host name to configure the route accordingly.
ui_route:
  host:
  tls:
    termination:
4.3.5 UI Config
Set Accept License to true, and you may keep the file location as the default unless a different location is required.
Accept_licence: true
Python:
  FileLocation: /opt/IANN/default_dashboard/default_dashboard
Jwt:
  Session_expire: '6000000'
For the default user, please provide the password secret. If SAML authentication is enabled, set it to Y.
defaultUser:
  emailId:
  password: # Provide the application secret name
4.3.6 Elasticsearch Configurations
Provide the Elasticsearch details:
1. Specify the Elasticsearch certificate truststore name that has been stored in the secret.
2. For trustStorePassword, provide the name of the secret that contains the password.
To create the truststore secret, run the following command:
oc create secret generic <truststore.jks> --from-file=<truststore.jks> |
Note: Ensure the secret name and the truststore file name are the same.
elasticSearch:
  data:
    userName:
    password: # Provide the application secret name
    hostName:
    port: 9200
    scheme: https
    trustStore: # Please provide the truststore file name, which should be the same as the secret name
    trustStorePassword:
4.3.7 SAML
1. If you want to configure SAML with IANN UI, provide the SAML details in the values.yaml file within the section indicated below.
2. Additionally, place the metadata.xml.tld file in the files folder, which is located in the root directory of the UI Helm chart.
saml:
  enabled: false
  sso_url:
  idp:
    metadata: # Please add the file in the files folder of the helm chart
jwt:
  session_expire: # Minutes
  expiration: 900000
  refresh_token:
    expiration: 604800000
4.3.8 SMTP Configuration
In the values.yaml file, the following SMTP configuration settings are provided to enable email notifications. Please update the parameters with your organization's specific SMTP server details:
Configuration Parameters:
1. host: The hostname of the SMTP server (e.g., smtp.example.com).
2. port: The port number used by the SMTP server (common ports are 25, 465 for SSL, and 587 for TLS); we use 587.
3. username: The username to authenticate with the SMTP server.
4. password: The password for the specified SMTP username.
5. sender_email: The email address that will appear as the sender.
6. sender_name: The name associated with the sender's email address.
7. app_contact_email: The contact email address for application-related inquiries.
8. use_tls: A boolean value indicating whether to use TLS (True/False).
9. use_ssl: A boolean value indicating whether to use SSL (True/False).
10. default_subject: A default subject line for email notifications.
Please ensure you replace placeholder values with the actual SMTP configuration details provided by your email service provider.
spring:
  mail:
    host:
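Expanding the fragment with the ten parameters above gives a sketch like the following. The host, addresses, and secret name are placeholders, and nesting every parameter under `spring.mail` is an assumption based on the fragment.

```yaml
spring:
  mail:
    host: smtp.example.com             # placeholder SMTP host
    port: 587                          # TLS port
    username: iann-notify@example.com  # placeholder account
    password: smtp-app-secret          # application secret name (placeholder)
    sender_email: iann-notify@example.com
    sender_name: IANN Monitor
    app_contact_email: support@example.com
    use_tls: True
    use_ssl: False
    default_subject: IANN Monitor Notification
```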
4.3.9 Logo
LogoName: The file path to the logo image. Ensure that the logo is in a supported format (e.g., PNG, JPEG) and the file is stored at the specified location.
logo:
  logoName: /opt/IANN/logo/IANN.png
4.3.10 Alerts
Configuration Parameters:
1. enabled: A Boolean flag that enables or disables resource allocation for the container. Set to true to enable resource requests and limits.
2. requests: Specifies the minimum resources that Kubernetes will allocate to the container.
3. memory: The amount of memory required (e.g., 800Mi for 800 MiB of memory).
4. cpu: The amount of CPU requested (e.g., 400m for 0.4 CPU cores).
5. limits: Specifies the maximum resources that Kubernetes will allocate to the container.
6. memory: The maximum memory allowed (e.g., 800Mi).
7. cpu: The maximum CPU allowed (e.g., 400m).
Adjust these values based on the expected load and resource availability for your application.
alerts:
  enabled: true
  resources:
    requests:
      memory: ""
      cpu: ""
    limits:
      memory: ""
      cpu: ""
4.3.11 Elasticsearch
1. index: The name of the Elasticsearch index used by the IANN agents.
2. port: The port for Elasticsearch (default is 9200).
3. url: The complete URL for connecting to Elasticsearch, including the protocol (http or https).
4. username: The username used to authenticate with Elasticsearch.
5. password: The application secret for the Elasticsearch password.
6. use_ssl: A boolean value indicating whether to use SSL for the Elasticsearch connection. Set to true if SSL/TLS is enabled, otherwise false.
7. os: The operating system used in the deployment.
elasticsearch:
  index:
  port: 9200
  url:
  username:
  password: # Provide the application secret name
  use_ssl: true
  os:
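Once the Elasticsearch details are filled in, connectivity can be spot-checked against the cluster-health endpoint. The sketch below parses a sample response; on a live system, replace the sample with `health=$(curl -sk -u "<username>:<password>" "https://<hostName>:9200/_cluster/health")`.

```shell
# Sample response standing in for the _cluster/health call noted above
health='{"cluster_name":"iann-es","status":"green","number_of_nodes":3}'

# Extract the status field: green/yellow are usable, red needs attention
status=$(echo "$health" | sed -n 's/.*"status":"\([a-z]*\)".*/\1/p')
echo "Cluster status: $status"
```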
4.3.12 Email
1. port: The port number used for the SMTP connection (default is 587 for TLS).
2. host: The hostname of the SMTP server used to send emails.
3. from: The email address from which notifications will be sent.
4. username: The username used to authenticate with the SMTP server.
5. password: The application secret for the SMTP password.
6. to: The email address where weekly reports will be sent. Ensure this address is correct.
7. timezone: The timezone to be used for scheduling email notifications.
email:
  port: 587
  host:
  from:
  username:
  password: # Provide the application secret name
  to:
  timezone:
4.3.13 Weekly report
In the values.yaml file, configure the weekly report pod service settings, including the resource requests and limits for memory and CPU. These settings help ensure that the weekly report pod operates with the necessary resources allocated for efficient performance.
Configuration Parameters:
1. enabled: A boolean flag to enable or disable the weekly report pod service. Set to true to enable it.
2. resources: Specifies the resource allocation for the pod.
· requests: Defines the minimum resources that should be allocated to the pod.
· memory: The amount of memory required for the pod (e.g., 200Mi for 200 MiB).
· cpu: The amount of CPU requested for the pod (e.g., 100m for 0.1 CPU core).
3. limits: Defines the maximum resources that the pod can use.
· memory: The maximum memory allowed for the pod (e.g., 200Mi).
· cpu: The maximum CPU allowed for the pod (e.g., 100m).
Adjust the resource values according to the expected load and available resources in your environment to ensure optimal performance of the weekly report service.
weekly_report:
  enabled: true
  resources:
    requests:
      memory: ""
      cpu: ""
    limits:
      memory: ""
      cpu: ""
4.3.14 Environment
In the values.yaml file, provide the environment name where the dashboard is deployed, along with the respective prefix used during its creation. Specify the path to the logo image, the name of the PDF file for weekly report generation, and the timezone. Additionally, include the pod metrics index and PVC names used for the namespace.
Configuration Parameters:
1. environment_prefix: The prefix used to identify the environment (e.g., dev, prod).
2. image_path: The full path to the logo image (e.g., /opt/IANN/logo/IANN.png).
3. pdf_name: The name of the weekly report PDF to be generated (e.g., dev_weekly_report.pdf).
4. ram_field: The field used to capture the RAM data (optional, depending on your setup).
5. image_width and image_height: The dimensions for the logo image (in pixels).
6. x_axis and y_axis: The position of the logo on the report (coordinates for placement).
7. database_usage_field: The field that tracks database usage (e.g., percentageUsed).
8. timezone: The timezone used for scheduling (e.g., UTC, Europe/Berlin).
9. pod_metrics_details: The pod metrics index format to specify the pods and namespaces.
Example: _podsystemstats-, _podstatus-:,}
10. pvc_name: The name of the PVC used for the B2BI namespace for monitoring.
Make sure to replace the placeholder values with the actual details for your environment, such as the environment prefix, namespace, pod names, PVC names, and any other necessary fields.
environment:
  environment_prefix:
  image_path:
  pdf_name:
  ram_field:
  image_width:
  image_height:
  x_axis:
  y_axis:
  database_usage_field: percentageUsed
  timezone:
  pod_metrics_details: {""}
  pvc_name:
4.3.15 Save File and Install Application
After updating all the values in values.yaml, install the Helm chart, which deploys all the resources defined in the templates folder.
1. Save the values.yaml file.
2. To install the helm chart:
helm install <release_name> -f <path of values.yaml> <path/to/helmchart> |
3. To upgrade the helm chart:
helm upgrade <release_name> -f <path of values.yaml> <path/to/helmchart> |
4. To roll back the helm chart by giving revision number:
helm rollback <release_name> <revision_number> |
Verify that the Helm release has been successfully deployed by running the following command:
helm list -n <namespace> |
After installation, verify that all application pods are running as expected in the target namespace.
Openshift:
oc get pods -n <namespace> |
Kubernetes:
kubectl get po -n <namespace> |
Ensure that each pod is in a Running or Completed state. Any other status may indicate deployment issues that require investigation.
UI:
5.3 Check Services and Endpoints
Verify that required services are exposed and accessible within the cluster.
Openshift:
oc get svc -n <namespace> |
Kubernetes:
kubectl get svc -n <namespace-name> |
Confirm that services such as the IANN Monitor UI are present.
To list all secrets in a specific namespace:
Openshift:
oc get secrets -n <namespace> |
Kubernetes:
kubectl get secret -n <namespace> |
Ensure agent secrets and image pull secrets are present.
Ensure that all required Persistent Volume Claims (PVCs) have successfully bound to their respective Persistent Volumes.
Openshift:
oc get pvc -n <namespace> |
Kubernetes:
kubectl get pvc -n <namespace> |
Verify that the STATUS for each PVC is Bound. Any other status may indicate a storage provisioning issue.
Inspect pod logs to identify any errors or unexpected behavior during startup or runtime.
Openshift:
oc logs <pod-name> -n <namespace> |
Kubernetes:
kubectl logs <pod-name> -n <namespace> |
Review the log output for errors, warnings, or crash-related messages that may indicate issues with the application.
To open an interactive shell inside a pod, use the following command:
oc exec -it <pod-id> -- /bin/bash |
UI Logs:
Alerts Logs:
Send Mail Logs:
5.6 Access the IANN Monitor UI
To list the UI routes, use the following command:
oc get routes |
Open the IANN Monitor URL and make sure the UI loads and shows data.
Make sure logs are being saved.
If alerts are configured, test that they are working.