Understanding Prometheus Range Vectors
Range Vectors in Prometheus are a bit non-intuitive, unless you’ve thoroughly read and understood the documentation. Who does that? One is supposed to spend time doing it incorrectly, then look for random blog posts like this one to understand how open source software works, right?
Kidding aside, we should probably start by clarifying some definitions.
What’s a Vector?
Since Prometheus is a timeseries database, all data is in the context of some timestamp. The series that maps a timestamp to recorded data is called a timeseries. In Prometheus lingo, a set of related timeseries is called a vector. Let’s discuss an example to illustrate this better.
Assume that http_requests_total is a vector representing the total number of HTTP requests received by a service.
Every timeseries in a vector carries further dimensions called “labels”, which let us tag the data with attributes such as the response code or the handler that served the request.
Some examples are:
# the set of timeseries representing the number of requests with a `200` HTTP response code.
http_requests_total{code="200"}
# the set of timeseries representing the number of requests served by the `/api/v1/query` handler.
http_requests_total{handler="/api/v1/query"}
Thus we have all the granular information related to the number of HTTP requests served, while still having the option of aggregating it if needed.
Syntactically, http_requests_total refers to the entire set of timeseries with that name.
By appending {code="200"} or {handler="/api/v1/query"}, we’re selecting a subset.
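To make the “set of timeseries plus label matchers” idea concrete, here is a minimal Python sketch. It is only an illustration, not how Prometheus stores or selects data: the `series` list and the `select` helper are hypothetical, and only exact-equality matchers are modelled.

```python
# Hypothetical in-memory model: every series is just a dict of labels.
# The metric name is kept under "__name__", the way the HTTP API reports it.
series = [
    {"__name__": "http_requests_total", "code": "200", "handler": "/api/v1/query"},
    {"__name__": "http_requests_total", "code": "500", "handler": "/api/v1/query"},
    {"__name__": "http_requests_total", "code": "200", "handler": "/metrics"},
    {"__name__": "process_cpu_seconds_total"},
]

def select(all_series, name, **matchers):
    """Return the series with the given name whose labels equal every matcher."""
    return [
        s for s in all_series
        if s.get("__name__") == name
        and all(s.get(label) == value for label, value in matchers.items())
    ]

everything = select(series, "http_requests_total")               # whole vector
ok_only = select(series, "http_requests_total", code="200")      # {code="200"}
query_handler = select(series, "http_requests_total", handler="/api/v1/query")

print(len(everything), len(ok_only), len(query_handler))
```

The bare name returns every series of the metric, and each extra matcher narrows the set further.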
Types of Vectors
Prometheus further defines two types of vectors, depending on what the timestamps map to:
- Instant vector - a set of timeseries where every timestamp maps to a single data point at that “instant”.
We can see the single value recorded at the timestamp 1608481001 in the below response.
curl 'http://localhost:9090/api/v1/query' \
--data 'query=http_requests_total{code="200"}' \
--data time=1608481001
{ "metric": {"__name__": "http_requests_total", "code": "200"}, "value": [1608481001, "881"] }
- Range vector - a set of timeseries where every timestamp maps to a “range” of data points, recorded some duration into the past.
These cannot exist without a specified duration called the “range”, which is used to build the list of values for every timestamp.
In the below example, note the list of values accompanied by a timestamp, up to 30s into the past from 1608481001.
curl 'http://localhost:9090/api/v1/query' \
--data 'query=http_requests_total{code="200"}[30s]' \
--data time=1608481001
{
"metric": {"__name__": "http_requests_total", "code": "200"},
"values": [
[1608480978, "863"],
[1608480986, "874"],
[1608480994, "881"]
]
}
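The difference between the two selectors can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Prometheus' implementation: the sample data is hypothetical (similar to the response above), `instant_value` and `range_values` are made-up helper names, the five-minute lookback is assumed to be the default staleness window, and the window's left edge is excluded here even though boundary handling has differed between Prometheus versions.

```python
# Hypothetical samples for one series, as (timestamp, value) pairs.
samples = [(1608480978, 863.0), (1608480986, 874.0), (1608480994, 881.0)]

LOOKBACK = 300  # seconds; assumed default lookback window of 5 minutes

def instant_value(samples, t, lookback=LOOKBACK):
    """Instant selector: the most recent sample at or before t, reported at t."""
    candidates = [v for ts, v in samples if t - lookback < ts <= t]
    # Nothing recent enough means the series is absent from the result.
    return (t, candidates[-1]) if candidates else None

def range_values(samples, t, range_seconds):
    """Range selector: every sample in the window that ends at t."""
    return [(ts, v) for ts, v in samples if t - range_seconds < ts <= t]

print(instant_value(samples, 1608481001))     # one value for this timestamp
print(range_values(samples, 1608481001, 30))  # a list of values for this timestamp
```

One evaluation timestamp yields one value in the first case and a list of values in the second, which is the distinction the definitions above draw.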
With all the definitions in place, we establish two ideas regarding these vector types:
- Instant vectors can be charted; Range vectors cannot. This is because charting something involves displaying a data point on the y-axis for every timestamp on the x-axis. Instant vectors have a single value for every timestamp, while range vectors have many of them. For the purpose of charting a metric, it is undefined [1] how to show multiple data points for a single timestamp in a timeseries.
- Instant vectors can be compared and have arithmetic performed on them; Range vectors cannot. This is also due to the way comparison and arithmetic operators are defined. For every timestamp, if we have multiple values, we don’t know how to add [1] or compare them to another timeseries of a similar nature.
Why do we even need Range Vectors?
We now understand that range vectors can’t be used directly for charting, comparison, or arithmetic. It is therefore only natural to ask why they even exist. The answer is simple: counters. The counter is one of the fundamental types in a monitoring system, apart from gauges and timings. We will try to understand how counters and range vectors interact by continuing our earlier example.
Let’s say we want to find out how many requests our service is handling right now.
Our metric http_requests_total{code="200",handler="/api/v1/query"} is an instant vector with values that represent a monotonically increasing counter [2].
This counter measures the number of requests our service has received in total.
We know that Prometheus has “scraped” this counter at various times in the past, so we could simply start by requesting the counter’s value:
curl 'http://localhost:9090/api/v1/query' \
--data 'query=http_requests_total{code="200",handler="/api/v1/query"}'
{
"metric": {"__name__": "http_requests_total", "code": "200", "handler":"/api/v1/query"},
"value": [1608437313, "881"]
}
But as we see in the response, this gives us the total number of requests ever received, which is not what we’re interested in. We care about the number of requests received over some finite duration into the past, for example the last fifteen minutes. How do we get this number when all we have is an ever-increasing counter?
A better way is to take the current value of the counter and subtract the value of the counter as seen fifteen minutes ago.
That would give us the number of requests that the instance received in that time duration.
To represent this in PromQL, we take the instant vector and append our duration [15m].
This part is called the range selector and it transforms the instant vector into a range vector.
We then use a function like increase, which effectively subtracts the data point at the start of the range from the one at the end [3].
curl 'http://localhost:9090/api/v1/query' \
--data 'query=increase(http_requests_total{code="200",handler="/api/v1/query"}[15m])'
{
"metric": {"code": "200", "handler":"/api/v1/query"},
"value": [1608437313, "18.4"]
}
To describe the query in words: “it is the increase in the total number of requests over the past fifteen minutes”. The response also contains a single number representing the answer, which is what we were expecting. The result is in the form of an instant vector which may now be further charted or aggregated.
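The “subtract the start from the end” idea, together with the restart handling mentioned in the footnotes, can be sketched in Python. This is a deliberately simplified model: `simple_increase` is a hypothetical helper, it does no extrapolation to the window boundaries (which the real increase does), and the sample values are made up.

```python
def simple_increase(values):
    """Increase of a counter across a window of raw sample values.

    Simplified sketch: any drop is treated as a counter reset, and there is
    no extrapolation to the edges of the window.
    """
    if len(values) < 2:
        return None  # at least two samples are needed to measure a change
    total = 0.0
    prev = values[0]
    for v in values[1:]:
        if v < prev:
            total += v         # counter restarted from zero: count the whole new value
        else:
            total += v - prev  # normal case: count the difference
        prev = v
    return total

print(simple_increase([863.0, 874.0, 881.0]))      # 18.0, same as 881 - 863
print(simple_increase([863.0, 874.0, 5.0, 12.0]))  # 23.0, i.e. 11 + 5 + 7
```

Without a reset, the result equals last minus first. With a reset in the window, a plain subtraction would go negative, while the reset-aware version keeps counting.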
Range Vector Functions
Similar to increase(range-vector), the below PromQL functions also operate only on range vectors:
changes(range-vector)
absent_over_time(range-vector)
delta(range-vector)
deriv(range-vector)
holt_winters(range-vector, scalar, scalar)
idelta(range-vector)
irate(range-vector)
predict_linear(range-vector, scalar)
rate(range-vector)
resets(range-vector)
avg_over_time(range-vector)
min_over_time(range-vector)
max_over_time(range-vector)
sum_over_time(range-vector)
count_over_time(range-vector)
quantile_over_time(scalar, range-vector)
stddev_over_time(range-vector)
stdvar_over_time(range-vector)
All of these return instant vectors as a result of their computation. Thus we can conclude that range vectors are useful as input to these functions that operate on a “range” of values.
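As a rough illustration of that “many values in, one value out” shape, the sketch below collapses one window of samples the way the simpler `*_over_time` functions do. The `over_time` helper and the sample window are hypothetical, and Python built-ins stand in for the PromQL functions.

```python
from statistics import mean

def over_time(fn, window):
    """Collapse one series' window of (timestamp, value) samples into one value."""
    return fn([value for _, value in window])

# Hypothetical window of samples for a single series.
window = [(1608480978, 863.0), (1608480986, 874.0), (1608480994, 881.0)]

results = {
    "avg_over_time": over_time(mean, window),
    "min_over_time": over_time(min, window),
    "max_over_time": over_time(max, window),
    "sum_over_time": over_time(sum, window),
    "count_over_time": over_time(len, window),
}

for name, value in results.items():
    print(name, value)
```

Each entry is a single number per series for the evaluation timestamp, which is why the result can be charted, compared, or aggregated like any other instant vector.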
There’s more to range vectors than just the above functions and curls, but we will cover that in another blog post.
Footnotes
[1] Undefined behaviour does not imply the impossibility of defining a way of making these operations work. All it means is that the implementation chooses to avoid supporting this. This could be done to simplify implementation or because there may not be a way to make it work consistently across various use cases.
[2] A monotonically increasing counter’s value never decreases; it either increases or stays the same. Prometheus allows exactly one case where a counter value may decrease, and that is during a target restart. If a counter value drops below a previously recorded value, range vector functions like rate and increase will assume that the target restarted and add the entire new value to what they had counted so far. This is also why we should always rate then sum and not sum then rate.
[3] “Effectively” being the key word here. increase actually also does extrapolation, as the requested duration may not have data points exactly aligned at the “start” and “end” of the range.
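The “rate then sum, not sum then rate” advice from the counter footnote can be seen with two made-up counters, one of which restarts. This is a sketch under assumptions: `simple_increase` is a hypothetical, simplified stand-in for increase (reset-aware, no extrapolation), and both series are assumed to be scraped at the same instants.

```python
def simple_increase(values):
    """Reset-aware increase over raw sample values, without extrapolation."""
    total, prev = 0.0, values[0]
    for v in values[1:]:
        total += v if v < prev else v - prev  # a drop is treated as a restart
        prev = v
    return total

instance_a = [100.0, 110.0, 120.0]  # no restart
instance_b = [50.0, 60.0, 5.0]      # restarted before the last scrape

# Increase per series first, then add: each reset is seen on its own series.
increase_then_sum = simple_increase(instance_a) + simple_increase(instance_b)

# Add the raw counters first: the sum drops from 170 to 125, which looks like
# one big counter restarting, so the whole 125 gets counted.
summed = [a + b for a, b in zip(instance_a, instance_b)]
sum_then_increase = simple_increase(summed)

print(increase_then_sum)  # 35.0
print(sum_then_increase)  # 145.0
```

Only 35 requests happened in the window, but summing first hides which series reset and inflates the answer.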