Convox allows you to scale your application’s concurrency, memory allocation, and the resources available in the underlying Rack.
Setting initial defaults
You can specify initial default values for a service in the convox.yml:
services:
  web:
    scale:
      count: 4
      cpu: 512
      memory: 1024
These values are used only on the first deployment. Later changes to them in convox.yml are ignored; scale through the CLI or an autoscaling configuration instead.
Scaling an application
$ convox scale
NAME   DESIRED  RUNNING  MEMORY
web    2        1        256
redis  1        1        256
Changing the number of processes for a service
$ convox scale web --count=4
NAME  DESIRED  RUNNING  MEMORY
web   2        1        256
Setting new values for CPU or Memory
$ convox scale web --memory=1024 --cpu=512
NAME  DESIRED  RUNNING  MEMORY
web   2        1        1024
Each Rack instance has 1024 CPU units for every CPU core. The cpu value specifies the minimum amount of CPU to reserve for a container. Containers share any unallocated CPU units with the other containers on the instance in the same ratio as their allocated amounts.
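
The sharing rule above can be worked through numerically. The following is a minimal Python sketch, not Convox code: the function name `cpu_under_contention` and the service names are hypothetical, and it assumes every container on the instance is busy at the same time, so that all unallocated units are divided up.

```python
UNITS_PER_CORE = 1024  # CPU units per core, as stated in the text above


def cpu_under_contention(cores, reservations):
    """Estimate each container's CPU units when all containers are busy.

    cores:        number of CPU cores on the instance
    reservations: mapping of container name -> reserved CPU units
    Hypothetical helper for illustration only.
    """
    total_units = cores * UNITS_PER_CORE
    reserved = sum(reservations.values())
    if reserved <= 0:
        raise ValueError("at least one container must reserve CPU")
    if reserved > total_units:
        raise ValueError("reservations exceed the instance's CPU units")

    # Unreserved units are split in the same ratio as the reservations.
    unallocated = total_units - reserved
    return {
        name: units + unallocated * units / reserved
        for name, units in reservations.items()
    }


shares = cpu_under_contention(2, {"web": 512, "worker": 256, "redis": 256})
print(shares)
```

On a 2-core instance (2048 units) with reservations of 512, 256, and 256, the 1024 unallocated units are split 2:1:1, so under full contention web can use up to 1024 units and the other two containers 512 each.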
There are two dimensions of scale on a Convox Rack:
- The number of instances (servers) running that provide capacity to launch containers
- The number of processes (containers) running for each service
Convox can autoscale in both of these dimensions.
Rack-level autoscaling is enabled by default when you install a new Rack. When Rack-level autoscaling is enabled, the number of instances currently running will continually adjust based on the current container workload.
This level of autoscaling can be turned on or off with the Autoscale Rack parameter:
$ convox rack params set Autoscale=Yes
Service-level autoscaling is controlled in the convox.yml:
services:
  web:
    scale:
      count: 1-10
      targets:
        cpu: 70
        memory: 90
        requests: 200
Setting scale targets for a service will cause the service-level autoscaler to adjust the number of running processes for a particular service to try to meet the targets you define.
cpu: Average CPU utilization (%) for all processes
memory: Average Memory utilization (%) for all processes
requests: Requests per minute per process
You can also use custom CloudWatch metrics as a target for the service autoscaler:
services:
  worker:
    scale:
      count: 1-10
      targets:
        custom:
          AWS/SQS/ApproximateNumberOfMessagesVisible:
            aggregate: max
            value: 20
            dimensions:
              QueueName: myqueue
These settings would continually adjust the number of worker processes to keep the maximum number of messages waiting in the queue at 20.
You can define any or all of these targets for each service. The autoscaler will select the maximum number of processes required to meet all of the defined targets.
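
To illustrate how several targets combine into one process count, here is a minimal Python sketch. It is not the Convox autoscaler: the function names `parse_count` and `desired_count` are hypothetical, and the proportional rule (scale the current count by observed/target and round up) is an assumption that this document does not state. Only the "take the maximum across targets" step and the `count` range come from the text above.

```python
import math


def parse_count(value):
    """Turn a count setting such as "1-10" or 4 into (minimum, maximum)."""
    text = str(value)
    if "-" in text:
        low, high = text.split("-", 1)
        return int(low), int(high)
    return int(text), int(text)


def desired_count(current, observed, targets, count):
    """Pick a process count that satisfies every defined target.

    current:  number of processes running now
    observed: metric name -> current average value (e.g. CPU %)
    targets:  metric name -> target value from the scale configuration
    count:    the count setting, e.g. "1-10"
    Assumes a proportional rule per target; illustrative only.
    """
    low, high = parse_count(count)
    needed = [
        math.ceil(current * observed[name] / target)
        for name, target in targets.items()
        if name in observed
    ]
    # The largest requirement wins, so that every target is met.
    want = max(needed, default=current)
    # Stay inside the configured count range.
    return max(low, min(high, want))


targets = {"cpu": 70, "memory": 90, "requests": 200}
observed = {"cpu": 85, "memory": 45, "requests": 350}
print(desired_count(4, observed, targets, "1-10"))  # 7
```

In this example the cpu target alone would call for 5 processes and the memory target for 2, but the requests target needs 7, so 7 is chosen. A result outside the `count` range would be clamped to 1 or 10.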
You can specify a cooldown period (in seconds) for your service-level autoscaling if you wish to limit the effects of continuous scaling activity. Once a scale up/down event has happened, a subsequent scaling event (of the same direction) will wait for the expiry of the cooldown period before being actioned. This can help to stop your service over-aggressively scaling up or down.
You can define one value which will be applied to both scaling up and scaling down:
services:
  web:
    scale:
      cooldown: 120
Or you can specify each value separately for the different scaling directions:
services:
  web:
    scale:
      cooldown:
        down: 180
        up: 45
If you don’t define a value, the cooldown period defaults to 60 seconds.
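
The cooldown behavior described above can be modeled in a few lines. This is a minimal Python sketch under stated assumptions, not Convox's implementation: `parse_cooldown` and `CooldownGate` are hypothetical names, timestamps are plain seconds, and falling back to 60 seconds when only one direction is given is an assumption.

```python
DEFAULT_COOLDOWN = 60  # seconds, the documented default


def parse_cooldown(value=None):
    """Return (up, down) cooldown seconds from a cooldown setting.

    Accepts None (use the default), a single number (both directions),
    or a mapping with "up" and/or "down" keys.
    """
    if value is None:
        return DEFAULT_COOLDOWN, DEFAULT_COOLDOWN
    if isinstance(value, dict):
        # Assumption: a missing direction falls back to the default.
        return (
            int(value.get("up", DEFAULT_COOLDOWN)),
            int(value.get("down", DEFAULT_COOLDOWN)),
        )
    return int(value), int(value)


class CooldownGate:
    """Blocks a scaling event until the cooldown for its direction expires."""

    def __init__(self, cooldown=None):
        up, down = parse_cooldown(cooldown)
        self.period = {"up": up, "down": down}
        self.last_event = {}  # direction -> time of last allowed event

    def allow(self, direction, now):
        """Return True and record the event if it may proceed at time now."""
        last = self.last_event.get(direction)
        if last is not None and now - last < self.period[direction]:
            return False  # still cooling down in this direction
        self.last_event[direction] = now
        return True


gate = CooldownGate({"down": 180, "up": 45})
print(gate.allow("up", 0))     # True: first scale-up
print(gate.allow("up", 30))    # False: inside the 45 s up cooldown
print(gate.allow("down", 30))  # True: the down direction is tracked separately
```

Each direction keeps its own timer, which matches the statement that only a subsequent event in the same direction has to wait.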