Integrating a self-hosted step runner within Torq provides a powerful solution for executing workflow steps that require access to components located in private VPCs or on-premises data centers. This guide outlines the process of configuring, deploying, and managing self-hosted step runners, ensuring secure and efficient operation within your organization's infrastructure.
Before deploying a self-hosted step runner, confirm that the machine has sufficient memory and that its time configuration is correct.
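A quick sanity check on a Linux host might look like the following (timedatectl assumes a systemd-based distribution):
free -h               # check available memory
timedatectl status    # confirm the system clock is synchronized via NTP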
Configuring a Self-Hosted Step Runner
Navigate to Integrations: Go to Integrations > Runner in Torq, select Step Runner, and click Add.
Naming Your Step Runner: Enter a descriptive name that reflects the runner's type (Kubernetes or Docker) and its deployment environment.
Select Deployment Type: Choose between Kubernetes and Docker, based on your infrastructure.
Installation Command: Click Add, and Torq will generate a YAML deployment configuration file and an associated install command.
The runner install command is valid for 24 hours.
Deploying the Step Runner
Execute the Install Command: Run the provided install command in your terminal. This command deploys the step runner according to the generated configuration.
Regenerating an Install Command
If you need to reinstall an unhealthy runner or deploy an additional instance:
Go back to Integrations > Runner, and select your desired Step Runner.
Use the three-dot menu and select Regenerate install command.
Choose the deployment platform (Docker/Kubernetes), copy the new install command, and execute it within 24 hours.
Deployment Options
After you finish defining a new remote step runner, Torq generates a YAML deployment configuration file (downloaded automatically) along with deployment instructions.
Deploy using Docker
To deploy a Docker runner, run the install command in a terminal. When done, the runner should be ready to use.
Advanced Options
To customize the runner deployment (not required for most use cases), add flags to the automatically generated deployment configuration file or change its default values.
Retrieve the content of the automatically generated file by running the install command without piping it to sh. For example, for the install command
curl -H "Content-Type: application/x-sh" -s -L "https://link.torq.io/***z5v5qBRg21otM8" | sh
run:
curl -H "Content-Type: application/x-sh" -s -L "https://link.torq.io/***z5v5qBRg21otM8"
Copy the command output and paste it on a fresh line.
Add flags or change values according to your needs. These are two examples:
Specify a proxy server:
-e https_proxy=http://XXXXXXX:PORT
Make sure you replace XXXXXXX:PORT with your actual proxy address and port.
Connect to a bridge network:
-e DOCKER_NETWORK_ID='XXXXXXXXXXX' -e DOCKER_NETWORK_NAME='XXXXXXXXXX'
A bridge network uses a software bridge that allows containers connected to the same bridge network to communicate while providing isolation from containers that are not connected to that bridge network.
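If the target bridge network doesn't exist yet, it can be created and its ID and name looked up with standard Docker commands; the network name my-bridge is only an illustration:
docker network create --driver bridge my-bridge        # create a user-defined bridge network
docker network inspect my-bridge --format '{{.Id}}'    # value for DOCKER_NETWORK_ID
docker network ls                                      # lists network names for DOCKER_NETWORK_NAME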
Run the edited deployment configuration file.
Run docker ps to confirm that the service is running.
To retrieve the edited deployment configuration file at any time, use the command:
docker inspect --format "$(curl -Ls https://link.torq.io/7Yoh)" $(docker ps | grep spd | awk '{print $1}')
The retrieved configuration file can be used to deploy the runner on a new machine.
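For example, a sketch of moving the configuration to a new machine (the file name runner-install.sh and the host new-host are placeholders) might look like this:
docker inspect --format "$(curl -Ls https://link.torq.io/7Yoh)" $(docker ps | grep spd | awk '{print $1}') > runner-install.sh
scp runner-install.sh user@new-host:        # copy the configuration to the new machine
ssh user@new-host 'sh runner-install.sh'    # run it there to deploy the runner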
Deploy using Kubernetes
Allow Kubernetes cluster management actions
If you select the Allow Kubernetes cluster management actions option, workflow steps executed using this runner can take actions on the Kubernetes cluster. The auto-generated Kubernetes resources will include a ClusterRole resource allowing Get and List access to resources, such as Pods, Pod Logs, Events, Nodes, Configmaps, Deployments, and Horizontal Pod Autoscalers. Another automatically generated ClusterRoleBinding object will automatically bind this role to all steps executed by the runner.
If you don't select this option, the steps cannot perform any actions on the Kubernetes cluster they are running on. This does not, however, prevent them from accessing external services. If the operations listed above are insufficient, and workflow steps are expected to perform additional operations on the cluster, the ClusterRole can be modified and re-applied to the cluster (see the sketch after the default configuration below). Alternatively, you can create a dedicated Kubernetes integration to manage specific permissions for specific steps.
Below is the default configuration for the ClusterRole and ClusterRoleBinding objects that are automatically added to the step runner deployment when the Allow Kubernetes cluster management actions option is selected:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: torq-step
rules:
  - apiGroups:
      - ""
    resources:
      - pods
      - pods/log
      - events
      - nodes
      - configmaps
    verbs:
      - get
      - list
  - apiGroups:
      - apps/v1
    resources:
      - deployments
    verbs:
      - get
      - list
  - apiGroups:
      - autoscaling
    resources:
      - horizontalpodautoscalers
      - horizontalpodautoscalers/status
    verbs:
      - '*'
  - apiGroups:
      - metrics.k8s.io
    resources:
      - pods
      - nodes
    verbs:
      - get
      - list
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: torq-step
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: torq-step
subjects:
  - kind: ServiceAccount
    name: default
    namespace: torq
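If the operations listed above are insufficient for your workflow steps, one possible approach is to modify the ClusterRole and re-apply it, then verify the resulting permissions; the file name torq-runner.yaml is a placeholder for the downloaded configuration:
kubectl edit clusterrole torq-step      # adjust the rules interactively
kubectl apply -f torq-runner.yaml       # or edit the downloaded file and re-apply it
kubectl auth can-i get pods --namespace=torq --as=system:serviceaccount:torq:default    # verify effective permissions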
Use an Existing Kubernetes Cluster
The configuration YAML file contains definitions for all the resources required to run step runner pods and execute steps. To apply the configuration to the cluster, run kubectl apply -f followed by the path to the configuration file.
All Torq resources are created inside a dedicated Kubernetes namespace (torq, used by the Torq agent itself and by the jobs of the various steps), allowing for simple monitoring and removal by issuing a kubectl delete namespace command.
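Assuming the downloaded configuration file is named torq-runner.yaml (a placeholder), applying and later removing the deployment might look like this:
kubectl apply -f torq-runner.yaml       # create the runner resources
kubectl get pods --namespace=torq       # confirm the runner pod is running
kubectl delete namespace torq           # remove the runner and its namespaced resources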
Use AWS Elastic Kubernetes Service (EKS)
For Amazon Web Services users who don't already run a Kubernetes cluster, the easiest way to deploy a self-hosted step runner is to create a managed cluster using AWS Elastic Kubernetes Service (EKS). The cluster can be created and managed in a few simple steps, described below; hosting a Torq step runner on it is then straightforward.
1. Install the AWS CLI, eksctl, and kubectl
To simplify creating an EKS cluster, it is recommended to use the AWS CLI and eksctl command-line tools provided by Amazon Web Services. To control the Kubernetes cluster, install and configure the kubectl command-line tool.
This step-by-step guide from Amazon Web Services provides all the information necessary to download and configure the tools.
To configure the AWS command-line utility, a set of AWS credentials is required. This guide from AWS explains how to configure the utility.
Generally, running aws configure and providing the authentication credentials will prepare the system for operation:
aws configure
AWS Access Key ID [********************]:
AWS Secret Access Key [********************]:
Default region name [us-east-2]:
Default output format [None]:
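Optionally, you can confirm that the credentials were picked up correctly by querying the caller identity:
aws sts get-caller-identity    # prints the account, user ID, and ARN the CLI is authenticated as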
2. Create an EKS Cluster
The eksctl utility is the simplest way to create an EKS cluster. It can be used in two modes:
The imperative mode receives all the information about the cluster as command-line arguments, as in the following example:
eksctl create cluster --name torq-steps --version 1.17 --region us-east-2 --nodegroup-name linux-nodes --node-type t3.small --nodes 2 --nodes-min 1 --nodes-max 2 --ssh-access --ssh-public-key my-public-key.pub --managed
The infrastructure-as-code mode reads the cluster definition from a YAML file provided to the utility. Below is a sample YAML configuration file for eksctl:
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: torq-steps
  region: us-east-2
nodeGroups:
  - name: torq-steps-ng-1
    instanceType: t3.small
    desiredCapacity: 2
To apply this configuration, call eksctl create cluster -f followed by the path to the file.
Torq requires a minimum of 2 vCPUs and 2GB RAM for the nodes in a Kubernetes cluster to ensure the ability to execute steps during a workflow run.
One of the benefits of using the eksctl utility is that it simplifies later use of the Kubernetes cluster by creating a kubeconfig file (used by the kubectl utility). EKS clusters can be created by other means, such as, but not limited to, the AWS Console or a dedicated AWS EKS Terraform module, but then creating the kubeconfig becomes the user's responsibility.
Running eksctl will update the kubeconfig file (by default a file named config located under $HOME/.kube) with the details of the newly created cluster and will automatically switch its context.
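To confirm that kubectl now points at the new cluster, run:
kubectl config current-context    # should reference the newly created EKS cluster
kubectl get nodes                 # the node group instances should report a Ready status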
3. Deploy the Step Runner
After the EKS cluster is created, deploy the step runner using the same procedure as for any other Kubernetes cluster.
Use K3S on any server
The fastest and easiest way to deploy a production-grade Kubernetes cluster (for small tasks) on any virtual or physical machine is to use the K3S Kubernetes distribution. K3S, originally developed by Rancher, is a CNCF Sandbox project that can be deployed on any x86_64, ARMv7, or ARM64 server within ~30 seconds (according to the developer website).
Step-by-step deployment instructions are available on the K3S site. After deploying the K3S cluster, you can deploy the step runner using the same procedure as for any other Kubernetes cluster.
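For reference, the single-command installation documented on the K3S site, followed by a quick node check using the bundled kubectl, looks like this:
curl -sfL https://get.k3s.io | sh -     # install K3S as a service
sudo k3s kubectl get nodes              # the node should report Ready shortly after installation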
Audit and Troubleshoot Self-Hosted Runners
While a step runner is a simple component that doesn't require any special configuration or treatment, being able to audit its operations and, when required, troubleshoot its activity can help resolve challenging situations. The commands below show how to gain insight into the activity performed by a step runner.
Step execution events
All steps are executed in the Kubernetes namespace called "torq" (Kubernetes is assumed as the default step runner adapter).
Use kubectl to get the list of events that took place in the namespace: kubectl get events --namespace=torq
The output should consist of the following types of events:
Pulled: Container image for a specific step was pulled from the container registry
Created: Container, based on the pulled image, was created in preparation to execute the step
Started: Step container execution started
Additional events can indicate longer processes, such as Pulling, Scheduled, and others.
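To narrow the output down, the standard kubectl sorting and filtering flags can help, for example:
kubectl get events --namespace=torq --sort-by=.metadata.creationTimestamp    # chronological order
kubectl get events --namespace=torq --field-selector reason=Pulled           # only completed image pulls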
Find step execution jobs
When workflows are running, steps are initiated by the step runner as jobs in the torq Kubernetes namespace. To view the currently-running jobs, use kubectl get jobs --namespace=torq
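To inspect a specific job, describe it and list the pods it created (replace <job-name> with a name from the previous command):
kubectl describe job <job-name> --namespace=torq              # job details and related events
kubectl get pods --namespace=torq -l job-name=<job-name>      # pods created by that job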
Pull step runner logs
The step runner is a Kubernetes Pod running in the "torq" namespace. To retrieve its logs, first find the Pod name by issuing kubectl get pods --namespace=torq
Then, using the retrieved Pod name, view the detailed logs with kubectl logs <pod-name> --namespace=torq
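The two commands can also be combined into a single line to follow the logs live; the grep pattern spd mirrors the Docker example above and may need to be adjusted to your actual pod name:
kubectl logs --namespace=torq --follow $(kubectl get pods --namespace=torq -o name | grep spd)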
Locate Runner
If you want to find a runner's public IP address, copy and paste this cURL command into an empty spot in your workflow canvas and select to run it from your runner:
curl 'https://api.ipify.org?format=json'
{"ip":"92.178.82.94"}
URLs the runner uses to communicate with Torq
Step runners use the following URLs to communicate with Torq.
URL | Purpose
 | Used to pull the runner image
 | Used to pull configurations (one time)
 | Used to communicate with the Torq service
 | Used to upload logs
 | Used for authentication
For a full list of IP addresses that Google uses, see the following links:
Use an External Secrets Manager
Torq provides the Custom Secrets integration, a secure way of managing secrets, such as credentials, keys, etc., that you can use in workflow steps. This is a convenient way to store sensitive data (without it being exposed in the UI) and to be able to reuse it in workflows running across different environments.
In some cases, when executing specific steps inside "closed" environments, you might need to store secrets used by specific steps outside of the Torq environment. Torq steps can implement fetching secrets locally from external secret management solutions, such as (but not limited to) AWS Secrets Manager, Google Cloud Secret Manager, or Azure Key Vault. Integration steps support these solutions where applicable.
The mechanism for using an external secret manager is described below. We will use an ssh-command step as an example to demonstrate the differences between having the SSH certificate stored in the Torq Custom Secrets integration, in AWS Secrets Manager, and in Google Cloud Secret Manager.
In order for the step to retrieve a secret from an external system, the following tasks should be done ahead of time:
1. A secret (in our example, an SSH certificate) should be stored in the external system.
If AWS Secrets Manager is used, the following aws-cli command can be used (replace the placeholders with your secret name, description, and value):
aws secretsmanager create-secret --name <secret-name> --description "<description>" --secret-string <secret-value>
If Google Cloud Secret Manager is used, the following gcloud CLI command can be used:
gcloud secrets create <secret-name> --data-file="<path-to-secret-file>" --replication-policy=automatic
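Optionally, the same CLI can also grant a service account access to the newly created secret, which relates to the permissions discussed in the next task; the service account address below is purely illustrative:
gcloud secrets add-iam-policy-binding <secret-name> --member="serviceAccount:torq-steps@my-project.iam.gserviceaccount.com" --role="roles/secretmanager.secretAccessor"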
2. Define a Service Account (GCP) or a User with Programmatic Access (AWS) and grant it the relevant permissions to access secrets. AWS provides a built-in arn:aws:iam::aws:policy/SecretsManagerReadWrite policy; however, it allows excessive permissions. It is recommended to construct a dedicated policy containing only the secretsmanager:DescribeSecret and secretsmanager:GetSecretValue permissions. Similarly, in GCP the service account needs the secretmanager.secrets.get and secretmanager.secrets.access permissions.
3. Store the Service Account credentials in the Custom Secrets integration and define an access policy for the secret objects. This way, the service account credentials are provided (in the following steps) to the workflow steps that need to retrieve the actual secrets from the external secret management system, and the IAM policy can restrict access for specific environments (ensuring that, for example, only jobs running in specific locations can get access).
4. In the workflow, make sure that the service account data is passed to the relevant steps. (This allows the steps to assume the role and pull the actual secret, provided it is allowed by the IAM policy.)
5. Use (or implement) steps that support retrieving secrets from the relevant secret managers. The example below demonstrates the ssh-command step in different flavors, using AWS Secrets Manager or Google Cloud Secret Manager to retrieve the actual SSH certificate for the connection:
- id: run_ssh_command_on_remote_host
  name: us-docker.pkg.dev/torq/public/ssh-command-aws-secret:rc0.7
  runner: aws-env
  env:
    SSH_CLIENT: "##{{ .UserName }}@##{{ .ServerAddress }}"
    SSH_COMMAND: |
      uname -a;
      df -k;
      ls -l ~/
    AWS_ACCESS_KEY_ID: '##{{ secret "AWS_SECRET_MANAGER_ROLE_KEY" }}'
    AWS_SECRET_ACCESS_KEY: '##{{ secret "AWS_SECRET_MANAGER_ROLE_SECRET" }}'
    SSH_KEY_SECRET_NAME: 'MySSHKey'
  output_parser:
    name: us-docker.pkg.dev/torq/public/raw-output-parser
As shown in the example, the step expects to receive credentials for a role that allows it to retrieve the actual SSH certificate, stored in a secret named MySSHKey in AWS Secrets Manager. Naturally, this is just an example, and the secret can be anything, not just an SSH certificate.
The credentials stored in AWS_SECRET_MANAGER_ROLE_KEY and AWS_SECRET_MANAGER_ROLE_SECRET are the ones for the user account with programmatic access, as defined in step #2 above.
- id: run_ssh_command_on_remote_host
  name: us-docker.pkg.dev/torq/public/ssh-command-gcloud-secret:rc0.7
  runner: gcp-env
  env:
    SSH_CLIENT: "##{{ .UserName }}@##{{ .ServerAddress }}"
    SSH_COMMAND: |
      uname -a;
      df -k;
      ls -l ~/
    AUTH_CODE: '##{{ secret "GCP_SECRET_MANAGER_ACCOUNT_TOKEN" }}'
    SSH_KEY_SECRET_NAME: 'MySSHKey'
  output_parser:
    name: us-docker.pkg.dev/torq/public/raw-output-parser
In this example, the step expects to receive a base64-encoded version of the service account's authentication token in the AUTH_CODE environment variable. Then, assuming the identity of this service account, the step retrieves the actual SSH certificate from a secret named MySSHKey.
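For reference, a manual retrieval of such a secret with the vendor CLIs (not necessarily what the step images do internally) would look roughly like this:
aws secretsmanager get-secret-value --secret-id MySSHKey --query SecretString --output text    # AWS Secrets Manager
gcloud secrets versions access latest --secret=MySSHKey                                        # Google Cloud Secret Manager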