Deduplicate Operator: Identify Unique Values

Use the Deduplicate operator to identify unique inputs and minimize noise by managing unique and duplicate values differently.

Updated over a month ago

The Deduplicate operator determines whether an expression's value is unique within a specified time window across executions of the same workflow, so that you can handle unique and duplicate values differently. Use it to reduce noise, handle recurring events or previously seen scenarios efficiently, and avoid building complex deduplication logic by hand.

Key applications of the Deduplicate operator include (see use case examples):

  • Filter Duplicate Trigger Events: Handle alert bursts and group related events, such as repeated failed login attempts.

  • Deduplication Based on Enrichment Results: Identify duplicates after enrichment or by taking additional contextual data into account. For example, determine whether a phishing alert is unique after processing the email attachments and content.

  • Suppress Notifications or Actions: Avoid excessive notifications by contacting users only for unique events. Similarly, enrich each IOC only once within a time window and retrieve the cached enrichment results for duplicates.

How to Use

  1. Add the Operator: Drag the Deduplicate operator onto the canvas.

  2. Input Field: Specify the expression from the workflow context that you want to check for uniqueness.

    • To define multiple expressions, click Add Expression. These expressions are evaluated using an AND condition, meaning all specified values must be unique.

    • If the input expression cannot be evaluated (e.g., the object lacks the specified property), the event is routed to the Unique branch by default.

  3. Time Range: Define how far back the operator should check for matching values during evaluation.

    • The maximum time range is 31 days.

    • After the specified period, the uniqueness count resets.

  4. Number of Executions: Define how many occurrences of the same input within the time range are treated as unique.

    • For example, setting this to 2 means the first two identical inputs are treated as unique, while the third is marked as a duplicate.

    • The default is 1, and the maximum allowed is 1000.

  5. Branches: Unique and duplicate inputs are handled in separate branches.

    • Inputs identified as Unique are processed by steps placed in the Unique branch.

    • Inputs identified as Duplicate are processed by steps in the Duplicate branch.
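
The settings above can be pictured with a small in-memory model. This is only a sketch, not the product's implementation: the `Deduplicator` class and its `check` method are hypothetical names, multiple expressions are treated as one combined key (one reading of the AND condition), an expression that cannot be evaluated is modelled as `None`, and the time window is assumed to open at the first occurrence of a value and reset once the time range has elapsed.

```python
from datetime import datetime, timedelta

MAX_TIME_RANGE = timedelta(days=31)  # documented maximum time range
MAX_EXECUTIONS = 1000                # documented maximum number of executions


class Deduplicator:
    """Hypothetical model of the operator's settings (illustration only)."""

    def __init__(self, time_range, executions=1):
        if not timedelta(0) < time_range <= MAX_TIME_RANGE:
            raise ValueError("time range must be positive and at most 31 days")
        if not 1 <= executions <= MAX_EXECUTIONS:
            raise ValueError("number of executions must be between 1 and 1000")
        self.time_range = time_range
        self.executions = executions
        self._windows = {}  # key -> (window_start, count)

    def check(self, values, now):
        """Return "unique" or "duplicate" for a tuple of evaluated expression values."""
        # An expression that could not be evaluated (None here) defaults to Unique.
        if any(v is None for v in values):
            return "unique"
        key = tuple(values)
        start, count = self._windows.get(key, (now, 0))
        if now - start >= self.time_range:
            start, count = now, 0  # assumption: the count resets after the period
        count += 1
        self._windows[key] = (start, count)
        return "unique" if count <= self.executions else "duplicate"


dedup = Deduplicator(time_range=timedelta(minutes=30), executions=2)
t0 = datetime(2024, 1, 1, 12, 0)
results = [dedup.check(("10.0.0.1",), t0 + timedelta(minutes=m)) for m in (0, 5, 10, 40)]
print(results)  # ['unique', 'unique', 'duplicate', 'unique']
```

With Number of Executions set to 2, the first two identical inputs go to the Unique branch and the third goes to the Duplicate branch. The fourth arrives after the 30-minute time range has elapsed, so it counts as unique again.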

Basic Example

Configure the operator with the following parameters:

  • Input: $.event.ip_address

  • Time range: 2 days.

  • Number of executions: 3.

Workflow Behavior:

  1. First Day:

    • Two events with the IP address 147.32.192.50 are received.

    • Both are processed through the Unique branch.

  2. Second Day:

    • The first event with 147.32.192.50 passes through the Unique branch.

    • Subsequent events with the same IP address are sent to the Duplicate branch.

  3. Third Day:

    • The 2-day time range has elapsed, so the count resets and 147.32.192.50 can pass through the Unique branch up to three more times within the next 2-day period.
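
The three days above can be traced with a short simulation. It is a sketch under stated assumptions: the `classify` helper is hypothetical, the timestamps are invented for illustration, and the 2-day window is assumed to open at the first event for an IP address and reset once it has elapsed.

```python
from datetime import datetime, timedelta

TIME_RANGE = timedelta(days=2)  # Time range from the example
EXECUTIONS = 3                  # Number of executions from the example


def classify(events):
    """Label (timestamp, ip) pairs as "unique" or "duplicate"."""
    windows = {}  # ip -> (window_start, count)
    labels = []
    for ts, ip in events:
        start, count = windows.get(ip, (ts, 0))
        if ts - start >= TIME_RANGE:
            start, count = ts, 0  # assumption: window elapsed, count resets
        count += 1
        windows[ip] = (start, count)
        labels.append("unique" if count <= EXECUTIONS else "duplicate")
    return labels


day1 = datetime(2024, 1, 1, 9, 0)  # illustrative timestamp
ip = "147.32.192.50"
events = [
    (day1, ip),                               # first day, event 1
    (day1 + timedelta(hours=3), ip),          # first day, event 2
    (day1 + timedelta(days=1), ip),           # second day, event 1
    (day1 + timedelta(days=1, hours=2), ip),  # second day, event 2
    (day1 + timedelta(days=2, hours=1), ip),  # third day, after the reset
]
labels = classify(events)
print(labels)  # ['unique', 'unique', 'unique', 'duplicate', 'unique']
```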

Use Case Examples

The following examples demonstrate key use cases for the Deduplicate operator.

Filter Duplicate Trigger Events

Use the Deduplicate operator to create tickets or cases only for unique trigger events, reducing noise and improving efficiency in incident tracking.

Example: Managing Failed Login Attempts

  1. A failed login attempt creates a case only if the combination of client ID and IP address is unique within a 30-minute timeframe.

  2. During this time, additional login failures with the same client ID and IP address are routed to the Duplicate branch.

  3. These duplicates are attached to the existing case, preventing the creation of unnecessary tickets while ensuring recurring issues are properly logged and managed.
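
A minimal sketch of this routing, assuming events are plain dictionaries with hypothetical `client_id`, `ip_address`, and `time` fields, and that a case is just a list of attached events. The real workflow would call a case management step instead.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=30)


def route_failed_logins(events):
    """Group failed logins into cases keyed by (client_id, ip_address)."""
    open_cases = {}  # (client_id, ip_address) -> (window_start, case)
    cases = []
    for event in events:
        key = (event["client_id"], event["ip_address"])
        entry = open_cases.get(key)
        if entry is None or event["time"] - entry[0] >= WINDOW:
            # Unique branch: open a new case for this combination.
            case = {"key": key, "events": [event]}
            cases.append(case)
            open_cases[key] = (event["time"], case)
        else:
            # Duplicate branch: attach the event to the existing case.
            entry[1]["events"].append(event)
    return cases


t0 = datetime(2024, 1, 1, 8, 0)  # illustrative timestamp
events = [
    {"client_id": "c1", "ip_address": "10.0.0.5", "time": t0},
    {"client_id": "c1", "ip_address": "10.0.0.5", "time": t0 + timedelta(minutes=10)},
    {"client_id": "c2", "ip_address": "10.0.0.5", "time": t0 + timedelta(minutes=12)},
    {"client_id": "c1", "ip_address": "10.0.0.5", "time": t0 + timedelta(minutes=45)},
]
cases = route_failed_logins(events)
print([(case["key"], len(case["events"])) for case in cases])
```

The second event shares both values with the first and is attached to its case. The third differs in client ID, so it opens its own case, and the fourth arrives after the 30-minute window and opens a new one.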

Deduplication Based on Enrichment Results

Deduplicate on values from the workflow context, such as enrichment results or other information retrieved earlier in the workflow.

Example: Evaluating Phishing Emails

  1. The workflow is triggered by a user-reported phishing email.

  2. Additional information is parsed from the email attachments and email body.

  3. The parsed data is provided as input to the Deduplicate operator to evaluate whether the event is unique.
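
One way to turn parsed email data into a single value for the Input field is to compute a fingerprint in an earlier step. The sketch below is an assumption about what such a step could look like, not part of the operator: `phishing_fingerprint` is a hypothetical helper, and the choice of subject, URLs, and attachment contents as the identifying fields is illustrative.

```python
import hashlib


def phishing_fingerprint(subject, urls, attachments):
    """Build a stable fingerprint from parsed email data.

    subject: str, urls: iterable of str, attachments: iterable of bytes.
    """
    digest = hashlib.sha256()
    digest.update(subject.strip().lower().encode("utf-8"))
    # Sort and de-duplicate so ordering differences do not change the result.
    for url in sorted(set(urls)):
        digest.update(b"\0url:" + url.encode("utf-8"))
    for file_hash in sorted({hashlib.sha256(data).hexdigest() for data in attachments}):
        digest.update(b"\0file:" + file_hash.encode("ascii"))
    return digest.hexdigest()


first = phishing_fingerprint(
    "Invoice overdue", ["http://a.example/pay", "http://b.example"], [b"%PDF-1.4 fake"]
)
second = phishing_fingerprint(
    "  invoice OVERDUE ", ["http://b.example", "http://a.example/pay"], [b"%PDF-1.4 fake"]
)
print(first == second)  # True
```

Two reports of the same campaign then produce the same value, so the second report would be classified as a duplicate even if the raw trigger events differ.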

Suppress Actions for a Certain Period

Use the Deduplicate operator to limit actions, such as user notifications, to a single execution within a defined time period, minimizing unnecessary repetition.

Example: Managing Password Expiration Notifications

  1. The workflow is triggered hourly for users with passwords about to expire.

  2. The Deduplicate operator ensures each user receives a notification only once within a 24-hour window.

  3. The first event within the time window triggers a notification to the user.

  4. Subsequent events for the same user during the window are routed to the Duplicate branch and dropped, avoiding redundant notifications.
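
The effect of the steps above can be sketched by simulating two days of hourly runs. The `users_to_notify` helper and the user name are hypothetical, and the sketch assumes that an event arriving exactly 24 hours after the first one falls outside the window.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=24)


def users_to_notify(runs):
    """runs: list of (run_time, [user, ...]). Returns sent notifications as (run_time, user)."""
    window_start = {}  # user -> time of the event that opened the current window
    sent = []
    for run_time, users in runs:
        for user in users:
            started = window_start.get(user)
            if started is None or run_time - started >= WINDOW:
                # Unique branch: notify the user and open a new window.
                window_start[user] = run_time
                sent.append((run_time, user))
            # Duplicate branch: no steps follow, so the event is dropped.
    return sent


start = datetime(2024, 1, 1, 0, 0)  # illustrative timestamp
runs = [(start + timedelta(hours=h), ["alice"]) for h in range(48)]
sent = users_to_notify(runs)
print(len(sent))  # 2
```

Across 48 hourly runs, the user is notified twice: once at the first run and once when the next 24-hour window opens.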

Deduplication in a Nested Workflow

Because uniqueness is tracked across executions of the same workflow, you can place the Deduplicate operator in a nested workflow and call that nested workflow from multiple parent workflows to track unique values across all of them.

Example: IP Address Enrichment

  1. The workflow receives an IP address as input.

  2. The Deduplicate operator checks if the IP is unique within the previous 24-hour window.

    • If unique, the workflow enriches the IP address.

    • If a duplicate, the workflow retrieves the cached enrichment results.

  3. This workflow can serve as a nested workflow, ensuring IPs are enriched only once per day across all parent workflows, while duplicates reuse the existing results.
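
The enrich-once pattern can be sketched as a function shared by several callers. Everything here is illustrative: `make_enricher` is a hypothetical helper, the `lookup` callable stands in for the real enrichment step, and the in-memory dictionary stands in for wherever the workflow stores cached results.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=24)


def make_enricher(lookup):
    """Return an enrich(ip, now) function that calls `lookup` once per IP per window."""
    cache = {}  # ip -> (enriched_at, result)

    def enrich(ip, now):
        cached = cache.get(ip)
        if cached is not None and now - cached[0] < WINDOW:
            # Duplicate branch: reuse the cached enrichment result.
            return "duplicate", cached[1]
        # Unique branch: enrich the IP address and cache the result.
        result = lookup(ip)
        cache[ip] = (now, result)
        return "unique", result

    return enrich


calls = []


def fake_lookup(ip):
    """Stand-in for a real enrichment service; records each call."""
    calls.append(ip)
    return {"ip": ip, "reputation": "unknown"}


enrich = make_enricher(fake_lookup)
t0 = datetime(2024, 1, 1, 10, 0)  # illustrative timestamp
first = enrich("147.32.192.50", t0)                        # called from parent workflow A
second = enrich("147.32.192.50", t0 + timedelta(hours=5))  # called from parent workflow B
print(first[0], second[0], len(calls))  # unique duplicate 1
```

Both callers receive the same enrichment result, but the lookup runs only once within the 24-hour window.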
