Deploy and Manage Self-Hosted Step Runners in Torq: A Comprehensive Guide
Learn how to set up, deploy, and manage Torq's self-hosted step runners for secure workflow execution in private networks.


Integrating a self-hosted step runner within Torq provides a powerful solution for executing workflow steps that require access to components located in private VPCs or on-premises data centers. This guide outlines the process of configuring, deploying, and managing self-hosted step runners, ensuring secure and efficient operation within your organization's infrastructure.

Configuring a Self-Hosted Step Runner

  1. Navigate to Integrations: Go to Integrations > Runner in Torq, select Step Runner, and click Add.

  2. Naming Your Step Runner: Enter a descriptive name that reflects the runner's type (Kubernetes or Docker) and its deployment environment.

  3. Select Deployment Type: Choose between Kubernetes or Docker, based on your infrastructure.

  4. Installation Command: Click Add, and Torq will generate a YAML deployment configuration file and an associated install command.

    The runner install command is valid for 24 hours.


Deploying the Step Runner

  • Execute the Install Command: Run the provided install command in your terminal. This command deploys the step runner according to the generated configuration.

Regenerating an Install Command

If you need to reinstall an unhealthy runner or deploy an additional instance:

  1. Go back to Integrations > Runner, and select your desired Step Runner.

  2. Use the three-dots menu to select Regenerate install command.

  3. Choose the deployment platform (Docker/Kubernetes), copy the new install command, and execute it within 24 hours.


Deployment Options

After you finish defining a new remote step runner, Torq automatically generates a deployment configuration file in YAML format (the file is downloaded automatically) paired with deployment instructions.

Deploy using Docker

To deploy a Docker runner, run the install command in a terminal. When done, the runner should be ready to use.

Advanced Options

To customize the runner deployment instructions (which shouldn't be required for most use cases), add flags to the automatically generated deployment configuration file or change the default values.

  1. Retrieve the content of the automatically generated file. For example, if the install command is:

    curl -H "Content-Type: application/x-sh" -s -L "https://link.torq.io/***z5v5qBRg21otM8" | sh

    run the same command without the trailing | sh:

    curl -H "Content-Type: application/x-sh" -s -L "https://link.torq.io/***z5v5qBRg21otM8"
  2. Copy the command output and paste it on a fresh line in your terminal or editor.

  3. Add flags or change values according to your needs. These are two examples:

    • Specify a proxy server: -e https_proxy=http://XXXXXXX:PORT. Make sure you replace XXXXXXX:PORT with your actual proxy address and port.

    • Connect to a bridge network: -e DOCKER_NETWORK_ID='XXXXXXXXXXX' -e DOCKER_NETWORK_NAME='XXXXXXXXXX'. A bridge network uses a software bridge that allows containers connected to the same bridge network to communicate while providing isolation from containers that are not connected to that bridge network.

  4. Run the edited deployment configuration file.

  5. Run docker ps to confirm that the service is running.

  6. To retrieve the edited deployment configuration file at any time, run:

    docker inspect --format "$(curl -Ls https://link.torq.io/7Yoh)" $(docker ps | grep spd | awk '{print $1}')

    The retrieved configuration file can be used to deploy the runner on a new machine.
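As an illustration, an edited deployment command might look like the following sketch. The image name, proxy address, and network identifiers below are placeholders, not values from your generated file:

```shell
# Hypothetical edited runner deployment command (all values are placeholders).
docker run -d --restart=always \
  -e https_proxy=http://proxy.internal:3128 \
  -e DOCKER_NETWORK_ID='abc123def456' \
  -e DOCKER_NETWORK_NAME='my-bridge' \
  <runner-image-from-generated-file>
```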

Deploy using Kubernetes

Allow Kubernetes cluster management actions

If you select the Allow Kubernetes cluster management actions option, workflow steps executed using this runner can take actions on the Kubernetes cluster. The auto-generated Kubernetes resources will include a ClusterRole resource allowing Get and List access to resources, such as Pods, Pod Logs, Events, Nodes, Configmaps, Deployments, and Horizontal Pod Autoscalers. Another automatically generated ClusterRoleBinding object will automatically bind this role to all steps executed by the runner.

If you don't select this option, the steps cannot perform any actions on the Kubernetes cluster they are running on. This does not, however, prevent them from accessing external services. If the operations listed above are insufficient and workflow steps are expected to perform additional operations on the cluster, the ClusterRole can be modified and re-applied to the cluster. Alternatively, you can create a dedicated Kubernetes integration to manage specific permissions for specific steps.
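For example, if workflow steps also need to read Secrets, a sketch of such a modification could append a rule like the following to the torq-step ClusterRole before re-applying it with kubectl apply. The secrets rule here is an illustration, not part of the default configuration:

```yaml
# Illustrative additional rule for the torq-step ClusterRole (not in the default).
- apiGroups:
  - ""
  resources:
  - secrets
  verbs:
  - get
  - list
```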

Below is the default configuration for the ClusterRole and ClusterRoleBinding objects that are automatically added to the step runner deployment when the Allow Kubernetes cluster management actions option is selected:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: torq-step
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - pods/log
  - events
  - nodes
  - configmaps
  verbs:
  - get
  - list
- apiGroups:
  - apps
  resources:
  - deployments
  verbs:
  - get
  - list
- apiGroups:
  - autoscaling
  resources:
  - horizontalpodautoscalers
  - horizontalpodautoscalers/status
  verbs:
  - '*'
- apiGroups:
  - metrics.k8s.io
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: torq-step
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: torq-step
subjects:
- kind: ServiceAccount
  name: default
  namespace: torq

Use an Existing Kubernetes Cluster

The configuration YAML file contains resource definitions for all the resources required to run step runner pods and execute steps. To apply the configuration to the cluster, run kubectl apply -f with the downloaded configuration file.

All Torq resources are created inside a dedicated Kubernetes namespace (torq, for the Torq agent itself and for the jobs of the various steps), allowing for simple monitoring and removal by issuing a kubectl delete namespace command.
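Assuming the default torq namespace, monitoring and cleanup can be sketched as:

```shell
# Inspect everything Torq created in its dedicated namespace.
kubectl get all --namespace=torq

# Remove the runner and all of its step jobs in one command.
kubectl delete namespace torq
```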

Use AWS Elastic Kubernetes Service (EKS)

For Amazon Web Services users not already running a Kubernetes cluster, the easiest way to deploy a self-hosted step runner is to establish a managed Kubernetes cluster using AWS Elastic Kubernetes Service. Creating and managing the cluster takes only the few steps defined below, and hosting a Torq step runner on the cluster is then straightforward.

1. Install AWS CLI, eksctl, kubectl

To simplify creating an EKS cluster, it is recommended to use the AWS CLI and eksctl command-line tools provided by Amazon Web Services. To control the Kubernetes cluster, you should also install and configure the kubectl command-line tool.

This step-by-step guide from Amazon Web Services provides all the information necessary to download and configure the tools.

In order to configure the AWS command-line utility, a set of AWS Credentials is required. This guide from AWS explains how to configure the command-line utility.

Generally, running aws configure and providing the authentication credentials will prepare the system for operation:

aws configure
AWS Access Key ID [********************]:
AWS Secret Access Key [********************]:
Default region name [us-east-2]:
Default output format [None]:

2. Create an EKS Cluster

The eksctl utility is the simplest way to create an EKS cluster. The utility can be used in two modes:

The imperative mode receives all the information about the cluster in the command-line arguments, similar to the following example.

eksctl create cluster \
  --name torq-steps \
  --version 1.17 \
  --region us-east-2 \
  --nodegroup-name linux-nodes \
  --node-type t3.small \
  --nodes 2 \
  --nodes-min 1 \
  --nodes-max 2 \
  --ssh-access \
  --ssh-public-key my-public-key.pub \
  --managed

The infrastructure-as-code mode uses all cluster definitions defined in a YAML file provided to the utility execution. Below is a sample YAML configuration file for the eksctl utility:

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: torq-steps
  region: us-east-2
nodeGroups:
  - name: torq-steps-ng-1
    instanceType: t3.small
    desiredCapacity: 2

To apply this configuration, call eksctl create cluster -f with the configuration file.

Torq requires a minimum of 2 vCPUs and 2GB RAM for the nodes in a Kubernetes cluster to ensure the ability to execute steps during a workflow run.

One of the benefits of using the eksctl utility is that it simplifies later usage of the Kubernetes cluster by creating a kubeconfig file (later used by the kubectl utility). EKS clusters can be created using other means, such as, but not limited to, the AWS Console or a dedicated AWS EKS Terraform module, but then creating the kubeconfig becomes the responsibility of the user.

Running eksctl will update the kubeconfig file (by default, a file named config located under $HOME/.kube) with the details of the newly created cluster and will automatically change its context.
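As a sketch, assuming the cluster name and region used in the examples above, you can verify the context or regenerate the kubeconfig entry later:

```shell
# Confirm kubectl now points at the new EKS cluster.
kubectl config current-context

# Recreate the kubeconfig entry later if needed, using the AWS CLI.
aws eks update-kubeconfig --name torq-steps --region us-east-2
```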

3. Deploy the Step Runner

After the EKS cluster is created, deploy the step runner using the same procedure as in any other Kubernetes cluster.


Use K3S on any server

The fastest and easiest way to deploy a production-grade Kubernetes cluster (for small tasks) on any virtual (or physical) machine is by using the K3S Kubernetes distribution. K3S, originally developed by Rancher, is a CNCF Sandbox project that can be deployed on any x86_64, ARMv7, or ARM64 server within ~30 seconds (according to the developer website).

Step-by-step deployment instructions are available on the K3S site. After deploying a K3S cluster, you can deploy the step runner using the same procedure as in any other Kubernetes cluster.
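As a minimal sketch based on the K3S quick-start (check the K3S site for the current instructions), installation and verification can be as short as:

```shell
# Install K3S (single-node server) using the official install script.
curl -sfL https://get.k3s.io | sh -

# Verify the node is Ready before deploying the Torq runner.
sudo k3s kubectl get nodes
```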

Audit and Troubleshoot Self-Hosted Runners

While a step runner is a simple component that doesn't require any special configuration or treatment, being able to audit its operations and, when required, troubleshoot its activity can help resolve challenging situations. The commands below show how to gain insight into the activity performed by a step runner.

Step execution events

All steps are executed in the Kubernetes namespace called "torq" (Kubernetes is assumed as a default step runner adapter).

Use kubectl to get the list of events that took place in the namespace: kubectl get events --namespace=torq

The output should consist of the following types of events:

  • Pulled: Container image for a specific step was pulled from the container registry

  • Created: Container, based on the pulled image, was created in preparation to execute the step

  • Started: Step container execution started

Additional events can indicate longer processes, such as Pulling, Scheduled, and others.
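To narrow the output to a single event type, kubectl's field selectors can help. For example, this sketch lists only image-pull events and then sorts all events chronologically:

```shell
# Show only "Pulled" events in the torq namespace.
kubectl get events --namespace=torq --field-selector reason=Pulled

# Sort all events chronologically for easier troubleshooting.
kubectl get events --namespace=torq --sort-by=.metadata.creationTimestamp
```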

Find step execution jobs

When workflows are running, steps are initiated by the step runner as jobs in the torq Kubernetes namespace. To view the currently-running jobs, use kubectl get jobs --namespace=torq

Pull step runner logs

The step runner is a Kubernetes Pod running in the torq namespace. To retrieve its logs, first find the Pod name by issuing kubectl get pods --namespace=torq

Then, use the Pod name retrieved to see the detailed logs: kubectl logs --namespace=torq followed by the Pod name.
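The two commands can be sketched together as follows; replace the Pod name placeholder with an actual name from the first command:

```shell
# List the runner Pods in the torq namespace.
kubectl get pods --namespace=torq

# Fetch the logs for a specific Pod by name.
kubectl logs --namespace=torq <runner-pod-name>

# Follow the logs live while reproducing an issue.
kubectl logs --namespace=torq <runner-pod-name> --follow
```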

Locate Runner

If you want to find a runner's IP address, copy and paste this cURL command into an empty spot in your workflow canvas and select to run it from your runner:

curl 'https://api.ipify.org?format=json'
{"ip":"92.178.82.94"}

URLs the runner uses to communicate with Torq

Step runners use the following URLs to communicate with Torq.

  • Used to pull the runner image

  • Used to pull configurations (one time)

  • Used to communicate with the Torq service

  • Used to upload logs

  • Used for authentication

For a full list of the IP addresses that Google uses, refer to Google's published IP address ranges.

Use an External Secrets Manager

Torq provides the Custom Secrets integration, a secure way of managing secrets, such as credentials, keys, etc., that you can use in workflow steps. This is a convenient way to store sensitive data (without it being exposed in the UI) and to be able to reuse it in workflows running across different environments.

In some cases, when executing specific steps inside "closed" environments, you might need to store secrets used by specific steps outside of the Torq environment. Torq steps can implement fetching secrets locally from external secret management solutions, such as (but not limited to) AWS Secrets Manager, Google Cloud Secret Manager, or Azure Key Vault. Integration steps support these solutions where applicable.

The mechanism for using an external secret manager is described below. We will use an ssh-command step as an example to demonstrate the differences between having the SSH certificate stored in the Custom Secrets integration in Torq, in AWS Secrets Manager, and in Google Cloud Secret Manager.

In order for the step to retrieve a secret from an external system, the following tasks should be done ahead of time:

  • A secret (in our example - SSH certificate) should be stored in the external system.

  • If AWS Secrets Manager is used, the following aws-cli command can be used:

aws secretsmanager create-secret --name <secret-name> \
  --description "<description>" \
  --secret-string <secret-value>

If Google Cloud Secret Manager is used, the following gcloud cli command can be used:

gcloud secrets create <secret-name> \
  --data-file="<path-to-secret-file>" \
  --replication-policy=automatic

  1. Define a Service Account (GCP) or a User with Programmatic Access (AWS) and grant these the relevant permissions to access secrets. AWS provides a built-in arn:aws:iam::aws:policy/SecretsManagerReadWrite policy; however, it grants excessive permissions. It is recommended to construct a dedicated policy containing only the secretsmanager:DescribeSecret and secretsmanager:GetSecretValue permissions. Similarly, in GCP the service account would need the secretmanager.secrets.get and secretmanager.versions.access permissions.

  2. Store the Service Account credentials in the Custom Secrets integration and define an access policy for the secret objects. The stored credentials will be provided (in the following steps) to the workflow steps that need to retrieve the actual secrets from the external secret management system, and the IAM policy can restrict access for specific environments (ensuring that, for example, only jobs running in specific locations can get access).

  3. In the workflow, make sure that the service account data is being passed to the relevant steps. (This will allow the steps to assume the role and pull the actual secret, providing it is allowed by the IAM policy).

  4. Use (or implement) steps that support retrieving secrets from the relevant secret managers. The example below demonstrates the ssh-command step in different flavors, using AWS Secrets Manager or Google Cloud Secret Manager to retrieve the actual SSH certificate for the connection:

- id: run_ssh_command_on_remote_host
  name: us-docker.pkg.dev/torq/public/ssh-command-aws-secret:rc0.7
  runner: aws-env
  env:
    SSH_CLIENT: "##{{ .UserName }}@##{{ .ServerAddress }}"
    SSH_COMMAND: |
      uname -a;
      df -k;
      ls -l ~/
    AWS_ACCESS_KEY_ID: '##{{ secret "AWS_SECRET_MANAGER_ROLE_KEY" }}'
    AWS_SECRET_ACCESS_KEY: '##{{ secret "AWS_SECRET_MANAGER_ROLE_SECRET" }}'
    SSH_KEY_SECRET_NAME: 'MySSHKey'
  output_parser:
    name: us-docker.pkg.dev/torq/public/raw-output-parser

As shown in the example, the step expects to receive credentials for the role that'd allow it to retrieve the actual SSH Certificate, stored in a secret named MySSHKey in the AWS Secrets Manager. Naturally, this is just an example, and the secret can be anything, not just an SSH Certificate.

The credentials stored in AWS_SECRET_MANAGER_ROLE_KEY and AWS_SECRET_MANAGER_ROLE_SECRET are the ones for the user account with programmatic access, as defined in step #2 above.
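The retrieval the step performs is roughly equivalent to the following AWS CLI sketch, using the MySSHKey secret name from the example and the same programmatic-access credentials:

```shell
# Fetch the secret value that the step resolves at runtime.
aws secretsmanager get-secret-value \
  --secret-id MySSHKey \
  --query SecretString \
  --output text
```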

- id: run_ssh_command_on_remote_host
  name: us-docker.pkg.dev/torq/public/ssh-command-gcloud-secret:rc0.7
  runner: gcp-env
  env:
    SSH_CLIENT: "##{{ .UserName }}@##{{ .ServerAddress }}"
    SSH_COMMAND: |
      uname -a;
      df -k;
      ls -l ~/
    AUTH_CODE: '##{{ secret "GCP_SECRET_MANAGER_ACCOUNT_TOKEN" }}'
    SSH_KEY_SECRET_NAME: 'MySSHKey'
  output_parser:
    name: us-docker.pkg.dev/torq/public/raw-output-parser

In this example, the step expects to receive a base64-encoded version of the authentication token for a service account in AUTH_CODE environment variable. Then, assuming the role of this service account, the actual SSH Certificate will be retrieved from a secret named MySSHKey.
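The equivalent retrieval in Google Cloud can be sketched with the gcloud CLI, again using the MySSHKey secret name from the example:

```shell
# Access the latest version of the secret the step resolves at runtime.
gcloud secrets versions access latest --secret=MySSHKey
```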
