Scenario Introduction: Struggling Amid Evolving Threats
Recently, Company A, a leading IT service provider in Japan, was grappling with increasingly sophisticated cyber threats. Operating hundreds of servers, a cloud environment, and tens of thousands of endpoints, the company generates vast amounts of log data, ranging from gigabytes to terabytes daily. Traditionally, security monitoring relied on signature-based rule detection systems and statistical threshold alerts. However, this approach clearly revealed limitations in detecting unknown (Zero-day) attacks, insider threats, and Advanced Persistent Threat (APT) attacks disguised as legitimate activities. Particularly in cloud environments, more complex anomaly patterns emerged, increasing the difficulty of detection.
Company A's SOC team was exposed to severe risks such as potential data breaches, service disruptions, and reputational damage due to these overlooked threats. They urgently needed a fundamental solution that could not only collect more logs but also proactively detect “anomalous behavior” that was difficult to capture with existing methods, enabling rapid response. The ultimate goal was to drastically improve the detection rate for actual threats while reducing false positives, thereby maximizing the efficiency of their limited monitoring personnel.
In this scenario, Company A focused on effectively analyzing the enormous volume of security log data generated in its large-scale environment to identify dynamic and complex anomalous behaviors that were difficult to detect with traditional static rules. Specifically, they were collecting cloud configuration change logs and access patterns through FRIIM CNAPP, but they struggled to integrate these with their existing SIEM system for a unified analytical perspective. This situation presented a common challenge for many companies, and a new approach was critically needed to resolve it.
Challenges: The Quagmire of Chronic Missed Detections and False Positives
The challenges faced by Company A's SOC team were multifaceted. The first was the overwhelming log volume. With billions of events occurring daily, it was nearly impossible for humans to analyze all this data directly or to find meaningful threats using only static rules. Simply identifying meaningful signals in logs at this volume was the first hurdle.
The second was the advancement of detection-evasion techniques. Attackers employ a range of mutated malware and leverage Living-off-the-Land (LotL) techniques that exploit legitimate system utilities to bypass signature-based detection. This makes it difficult for traditional rule-based SIEM systems to catch new or mutated threats. Overlooking subtle anomalies, such as a new service using a specific port or unusual user access at certain times, leads to delayed responses.
The third was the balance between false positives and false negatives. Applying rules too strictly increases false positives, where normal activities are mistaken for threats, wearing down SOC staff. Conversely, relaxing rules can mean missing actual threats, with potentially catastrophic consequences. Frequent false positives in particular lead to alert fatigue within the SOC team, causing important warnings to be overlooked.
The fourth was insider threats and cloud-environment complexity. Anomalous data access or privilege misuse by insiders is even harder to detect than external attacks. Furthermore, cloud environments scale up and down dynamically, and new architectures such as containers and serverless exhibit complex behavioral patterns that are difficult to predict with traditional security models. In such environments, the ability to detect behavior “out of the ordinary,” rather than fixed patterns, is essential. A new paradigm was urgently needed to resolve these chronic issues.
Technology Selection Process: Adopting Machine Learning-based Anomaly Detection
To overcome the limitations of existing methods, Company A's SOC team explored various alternative technologies. Initially, they attempted to refine their rule-based system, but this ultimately encountered the limits of human resources and issues with response speed to new threats. Statistical threshold analysis tended to exhibit high false positive rates and low effectiveness.
At this point, a crucial decision was required. They concluded that a technology capable of autonomously learning patterns from massive data without human intervention and detecting anomalies deviating from existing patterns was necessary. This led to Machine Learning (ML)-based anomaly detection emerging as a strong candidate.
ML Model Comparative Analysis
ML models can be broadly categorized into Supervised Learning and Unsupervised Learning. In the threat detection domain, supervised learning models (e.g., classification models) that pre-define and train for specific attack types are useful, but as Company A's primary goal was to find “unknown anomalies,” they focused more on unsupervised learning models.
Below is a comparison table of the main unsupervised learning models considered:
| Model Type | Advantages | Disadvantages | Suitable Scenarios |
|---|---|---|---|
| Isolation Forest | Fast learning and detection speed, high outlier detection accuracy, low memory usage | Potential performance degradation with high data dimensionality, weak with high-density clustered data | Anomalous login attempts in large-volume log data, network traffic anomalies |
| One-Class SVM | Strong modeling for normal data, clear boundary setting for outliers | Long training time, difficult parameter tuning, not suitable for large-volume data | Changes in specific user behavior patterns, anomalous resource usage on specific servers |
| Autoencoder (Deep Learning) | Capable of learning complex nonlinear patterns, strong with high-dimensional data, easy to apply to time-series data | Requires large amounts of training data, long training time, difficult model interpretation | Cloud resource usage patterns, long-term system metric changes, network session analysis |
Company A particularly focused on Isolation Forest and Autoencoder-based time-series analysis models. Isolation Forest was deemed effective for efficient outlier detection from large-volume logs, while Autoencoder was considered suitable for learning anomalous patterns in complex time-series data such as cloud resource usage and network traffic.
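The core idea behind the Autoencoder track can be illustrated without a deep-learning framework. The sketch below is a minimal, hypothetical illustration (not Company A's production model): scikit-learn's `MLPRegressor` is trained to reconstruct its own input through a narrow hidden layer, and the per-sample reconstruction error serves as the anomaly score.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def train_reconstruction_model(X_normal, bottleneck=2, seed=42):
    """Autoencoder-style model: reconstruct the input through a narrow hidden layer."""
    model = MLPRegressor(hidden_layer_sizes=(bottleneck,), activation="identity",
                         solver="lbfgs", max_iter=2000, random_state=seed)
    model.fit(X_normal, X_normal)  # target equals input: learn to reconstruct
    return model

def reconstruction_error(model, X):
    """Per-sample mean squared reconstruction error; higher means more anomalous."""
    X_hat = model.predict(X)
    return np.mean((np.asarray(X) - X_hat) ** 2, axis=1)
```

A real deployment would use a deeper network in a framework such as TensorFlow or PyTorch, with sequence windows for time-series data, but the detection principle is the same: samples the model cannot reconstruct well are flagged as anomalous.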
Decision-Making Process and Selection Criteria
The main criteria for technology selection were as follows:
- Detection Accuracy and False Positive Rate: Can it accurately detect actual threats and minimize false positives?
- Scalability: Can it process tens of terabytes of log data and provide near real-time analysis?
- Operational Ease: Is it easy to integrate with existing Seekurity SIEM and SOAR systems, and are model management and retraining efficient?
- Interpretability: Can SOC personnel understand and respond to the basis of detected anomalies?
Considering these criteria holistically, Company A opted for a two-track strategy: building an initial model based on Isolation Forest while introducing an Autoencoder-based time-series analysis model over the longer term. Isolation Forest was advantageous for initial deployment thanks to its rapid detection and operational ease, while the Autoencoder was deemed suitable for the advanced stage of detecting more complex patterns. Neglecting practical applicability and scalability at this stage would only have delayed effective responses later.
Implementation Process: Building a Practical Machine Learning Pipeline
The implementation of the anomaly detection system proceeded in several stages, from building the data pipeline to model deployment and operation.
Step 1: Data Collection and Preprocessing
First, all of Company A's security log data was aggregated into Seekurity SIEM. Seekurity SIEM demonstrated excellent performance in collecting and normalizing logs from various sources (servers, network devices, cloud resources, endpoints, etc.). Specifically, it was integrated with cloud configuration change and access logs collected via FRIIM CNAPP, expanding the scope of analytical data.
The collected log data was utilized as input for machine learning models after the following preprocessing steps:
- Data Normalization: Standardizing data in various formats such as IP addresses, user IDs, and process paths.
- Missing Value Handling: Imputing or removing missing data using appropriate methods (e.g., substituting the mean or mode).
- Feature Engineering: Extracting features useful for anomaly detection from raw log data. For example, time-series features such as login attempts, failed attempts, access times, data transfer volume, and command execution frequency per user/IP were generated.
- Data Scaling: Applying Min-Max Scaling or Standard Scaling when feature values have a large range, to reduce their impact on model training.
Below is an example of extracting important features from network flow logs (NetFlow):
```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

def preprocess_netflow_logs(df):
    # Generate time-based features
    df['timestamp'] = pd.to_datetime(df['timestamp'])
    df['hour_of_day'] = df['timestamp'].dt.hour
    df['day_of_week'] = df['timestamp'].dt.dayofweek

    # Compute network session duration
    df['start_time'] = pd.to_datetime(df['start_time'])
    df['end_time'] = pd.to_datetime(df['end_time'])
    df['session_duration'] = (df['end_time'] - df['start_time']).dt.total_seconds()

    # Aggregate features: connections per source IP, destination-port usage counts, etc.
    df['src_ip_conn_count'] = df.groupby('src_ip')['src_ip'].transform('count')
    df['dst_port_usage_count'] = df.groupby('dst_port')['dst_port'].transform('count')

    # Select the required features and scale them
    features = ['session_duration', 'src_port', 'dst_port', 'bytes_transferred',
                'packets_transferred', 'src_ip_conn_count', 'dst_port_usage_count',
                'hour_of_day', 'day_of_week']

    # Convert categorical data to numeric (e.g., one-hot encoding)
    # df = pd.get_dummies(df, columns=['protocol'], drop_first=True)

    # Handle missing values (e.g., fill with 0)
    df[features] = df[features].fillna(0)

    scaler = StandardScaler()
    df_scaled = scaler.fit_transform(df[features])
    return pd.DataFrame(df_scaled, columns=features)

# Example usage:
# df_raw = pd.read_csv('netflow_logs.csv')
# df_processed = preprocess_netflow_logs(df_raw)
```
Such feature engineering must be done very carefully as it directly impacts model performance.
Step 2: Model Selection and Training
Based on the preprocessed data, an Isolation Forest model was trained. This model excels at efficiently isolating anomalies in large datasets, and training was parallelized across all available CPU cores (`n_jobs=-1` in scikit-learn) to keep training times short at Company A's data volumes.
```python
from sklearn.ensemble import IsolationForest

def train_anomaly_detection_model(X_train, contamination_rate=0.01):
    # Initialize and fit the Isolation Forest model
    # contamination: estimated proportion of anomalies in the dataset
    model = IsolationForest(contamination=contamination_rate,
                            random_state=42, n_jobs=-1)
    model.fit(X_train)
    return model

def detect_anomalies(model, X_new):
    # Compute anomaly scores for new data
    # decision_function returns lower scores for more anomalous samples
    anomaly_scores = model.decision_function(X_new)
    # predict applies the threshold implied by `contamination`:
    # Isolation Forest labels anomalies as -1 and normal samples as 1
    predictions = model.predict(X_new)
    return anomaly_scores, predictions

# Example usage:
# model = train_anomaly_detection_model(df_processed)
# anomaly_scores, predictions = detect_anomalies(model, df_new_processed)
# print("Anomaly predictions (1: normal, -1: anomaly):")
# print(predictions)
```
During the training process, the contamination parameter, an estimated ratio of anomalies in the dataset, was initially set to 0.01 (1%) and then continuously adjusted in the actual operational environment. After model training, performance was evaluated using validation data, and parameter tuning was iteratively performed to reduce false positives and missed detections.
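The tuning loop described above can be sketched as follows, assuming a small labeled validation set is available; the candidate values and scoring logic are illustrative, not Company A's actual configuration:

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import precision_score, recall_score

def tune_contamination(X_train, X_val, y_val, candidates=(0.005, 0.01, 0.02, 0.05)):
    """Pick the contamination value that maximizes F1 on a labeled validation set.
    y_val uses Isolation Forest's convention: 1 = normal, -1 = anomaly."""
    best = None
    for c in candidates:
        model = IsolationForest(contamination=c, random_state=42, n_jobs=-1)
        model.fit(X_train)
        pred = model.predict(X_val)
        p = precision_score(y_val, pred, pos_label=-1, zero_division=0)
        r = recall_score(y_val, pred, pos_label=-1, zero_division=0)
        f1 = 0.0 if p + r == 0 else 2 * p * r / (p + r)
        if best is None or f1 > best[1]:
            best = (c, f1)
    return best  # (best_contamination, best_f1)
```

In practice the validation labels come from analyst-confirmed incidents, so this loop doubles as the feedback mechanism between the SOC team and the model.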
Step 3: Deployment and Integration
The trained model was deployed in conjunction with Seekurity SIEM to analyze incoming log data in real time; the first anomaly was detected almost immediately after deployment (T+0). Seekurity SIEM was configured to forward high-risk alerts generated from the machine learning model's analysis results to Seekurity SOAR, which executes automated response playbooks based on these alerts.
- If a login anomaly is detected: Temporary lock of the affected account, notification to the user, and sending a warning message to administrators.
- If anomalous file download/execution is detected: Endpoint isolation (Seekurity XDR integration), forwarding the file to KYRA AI Sandbox for detailed analysis.
- If a cloud configuration change anomaly is detected (FRIIM CNAPP integration): Automatic rollback of changes, enhanced monitoring of related account activities.
Such integration provides speed and efficiency across the entire Cyber Kill Chain, from detection through analysis to response. KYRA AI Sandbox, in particular, performs dynamic analysis on new malware strains and suspicious files, reducing false positives and supporting accurate decision-making. Building this automated response layer was a pivotal design decision.
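Seekurity SOAR's playbook format is product-specific, but the dispatch logic behind this kind of automation can be sketched in a few lines. The category and action names below are hypothetical placeholders, not the product's actual API:

```python
# Hypothetical response actions; in a real SOAR these would call product APIs.
def lock_account(alert):
    return f"locked account {alert['user']}"

def isolate_endpoint(alert):
    return f"isolated endpoint {alert['host']}"

def rollback_config(alert):
    return f"rolled back change {alert['change_id']}"

# Map alert categories to an ordered list of response actions.
PLAYBOOKS = {
    "login_anomaly": [lock_account],
    "malicious_file": [isolate_endpoint],
    "cloud_config_anomaly": [rollback_config],
}

def run_playbook(alert):
    """Execute every action registered for the alert's category."""
    actions = PLAYBOOKS.get(alert["category"], [])
    return [action(alert) for action in actions]
```

The table-driven design keeps response logic declarative: adding a new response means registering a function, not editing the dispatcher.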
Step 4: Continuous Monitoring and Retraining
Even after the model is deployed, continuous monitoring and feedback loops are indispensable. Within minutes of an alert (T+5 minutes in this scenario), SOC analysts review it in Seekurity SIEM, determine whether it represents an actual threat, and feed that judgment into the model's performance evaluation. For false positives or missed detections, the relevant data is collected and the model is periodically retrained to improve its performance.
This process was automated in the manner of a CI/CD pipeline, enabling rapid adaptation to new threat patterns. Model performance metrics (precision, recall, F1-score) are visualized in real time on the Seekurity SIEM dashboard, allowing the operations team to monitor them constantly. Without continuous model improvement, detection and response capabilities would steadily fall behind the evolving threat landscape.
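One simple retraining trigger, sketched here purely as an illustration (the statistic and the threshold are assumptions, not the actual production criteria), is to compare the anomaly-score distribution of recent data against the training-time baseline and flag drift:

```python
import numpy as np

def needs_retraining(ref_scores, new_scores, shift_threshold=0.5):
    """Flag drift when the mean anomaly score of recent data moves more than
    `shift_threshold` reference standard deviations from the training baseline."""
    ref_mean, ref_std = np.mean(ref_scores), np.std(ref_scores)
    shift = abs(np.mean(new_scores) - ref_mean) / max(ref_std, 1e-12)
    return bool(shift > shift_threshold)
```

A production pipeline would combine a check like this with the analyst-labeled precision/recall trend before kicking off an automated retraining job.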
Results and Achievements: Significant Improvement in SOC Operational Efficiency
Following the introduction of the machine learning-based anomaly detection system, Company A's SOC operations made remarkable progress.
Quantitative Achievements
- 40% Reduction in False Positive Rate: Compared to the existing rule-based system, the average number of false positive alerts per day was reduced by 40% after ML model implementation. This significantly alleviated alert fatigue among SOC personnel, creating an environment where they could focus on actual threats.
- 25% Reduction in Missed Detection Rate (Estimated): More of the subtle anomalies that had previously been difficult to detect, such as insider threats and LotL attacks, were now identified as actual threats. This helped prevent potential damage proactively.
- 30% Reduction in Mean Time To Detect (MTTD): By the ML model identifying anomalies in real-time and forwarding automated alerts to Seekurity SOAR, the time taken for threat detection was reduced by over 30% on average.
Qualitative Achievements
- Improved SOC Staff Work Efficiency: The reduction in false positives meant SOC analysts spent less time analyzing low-priority alerts, allowing them to focus on strategic tasks such as advanced threat analysis and hunting.
- Enhanced Proactive Threat Response Capability: With improved detection capabilities for unknown threats, Company A could respond more proactively to threats and minimize potential damage. New malware could be analyzed more quickly through KYRA AI Sandbox, enabling the formulation of response strategies.
- Ensured Cloud Security Visibility: Through integration with FRIIM CNAPP, ML-based detection of anomalous access and configuration changes in cloud resources significantly enhanced cloud environment security visibility and control.
Comparison Before and After Implementation
| Metric | Before ML Implementation (Rule-based) | After ML Implementation (ML-based) | Improvement Effect |
|---|---|---|---|
| Average False Positives (Daily) | Approx. 500 cases | Approx. 300 cases | 40% Reduction |
| Missed Detection Rate (Estimated) | High | Low | 25% Reduction |
| Average Detection Time (minutes) | 20 min | 14 min | 30% Reduction |
| SOC Staff Utilization | Focused on false positive analysis | Focused on threat analysis/hunting | Improved efficiency |
Lessons Learned and Reflection: Key Factors for Successful Implementation
During the process of implementing the machine learning-based anomaly detection system, several important lessons were learned.
- Importance of Data Quality: No matter how good the ML model is, the principle of “Garbage In, Garbage Out” still holds. Ensuring data integrity and quality during the initial data collection and preprocessing stages is a central determinant of model performance. Incomplete or contaminated data hinders model training and leads to false positives.
- Human Role Still Crucial: While ML models are powerful tools, the final judgment and complex threat analysis still belong to SOC experts. Human insight plays a decisive role in analyzing anomalies detected by the model, determining if they are actual threats, and improving model performance through feedback loops. It is necessary to build a human-in-the-loop process through Seekurity SIEM/SOAR.
- Continuous Retraining and Updates: The threat landscape is constantly evolving, so models must be continuously retrained and updated to adapt. A static model will inevitably degrade in performance over time. Building an automated retraining pipeline is key.
- Phased Implementation Strategy: Rather than attempting to build a perfect model from the outset, a strategy of starting small with core data sources and gradually expanding is effective. This reduces trial and error and helps build a stable system.
An unexpected side benefit was that the ML model's detection results revealed blind spots in the existing security rules, which could then be fixed, raising the overall completeness of the security policies. By feeding the latest threat intelligence analyzed through KYRA AI Sandbox back into the model's training data, a virtuous cycle emerged that strengthened the entire security ecosystem.
Application Guide: Machine Learning-based Anomaly Detection Roadmap
Here are practical guidelines for organizations looking to implement a machine learning-based anomaly detection system in similar environments:
Step 1: Goal Setting and Data Preparation
- Clearly define the security problems you most want to solve (e.g., specific types of missed threats, high false positive rates).
- Leverage Seekurity SIEM to aggregate and normalize the necessary log data. In particular, if the goal is to detect anomalous access patterns in a specific cloud environment, integrating the relevant logs through FRIIM CNAPP is an essential prerequisite.
- Ensure sufficient quantity and quality of data required for model training. It is crucial to build a dataset that includes past threat incidents.
Step 2: Initial Model Building and Validation
- It is effective to start with an unsupervised learning model like Isolation Forest, which is relatively easy to implement and has proven performance.
- Train the model based on the prepared data and validate its performance by running false positive and missed detection tests under conditions similar to the real environment.
- In the initial phase, focus on having SOC personnel manually review detected anomalies to increase model reliability. We recommend leveraging KYRA AI Sandbox to perform detailed analysis on suspicious files or processes and use this as model feedback data.
Step 3: System Integration and Automation
- Integrate the validated model into Seekurity SIEM and deploy it to detect anomalies in real time.
- Build Seekurity SOAR playbooks for automated responses to detected anomalies. This will significantly improve the speed of initial responses such as blocking suspicious IPs, locking user accounts, and isolating endpoints.
- Automate the model retraining and deployment process to build a CI/CD pipeline that can quickly adapt to changes in the threat environment.
This phased implementation roadmap minimizes initial investment costs and risks while incrementally advancing SOC detection and response capabilities. Successfully implementing machine learning models goes beyond a mere technical upgrade: it builds a lasting advantage in responding to a constantly changing threat environment. To counter continuously evolving cyber threats, a machine learning-based anomaly detection system should be established proactively.

