Window Functions
Window Functions
Windows functions let you perform calculations across a set of rows that are related to the current row. Some of the calculations that you can do are similar to those that can be done with an aggregate function, but a window function doesn't cause rows to be grouped into a single output - the individual rows are still returned.
Standard Window Functions
ClickHouse supports the standard grammar for defining windows and window functions. The table below indicates whether a feature is currently supported.
Feature | Supported? |
---|---|
ad hoc window specification (count(*) over (partition by id order by time desc) ) | ✅ |
expressions involving window functions, e.g. (count(*) over ()) / 2) | ✅ |
WINDOW clause (select ... from table window w as (partition by id) ) | ✅ |
ROWS frame | ✅ |
RANGE frame | ✅ (the default) |
INTERVAL syntax for DateTime RANGE OFFSET frame | ❌ (specify the number of seconds instead (RANGE works with any numeric type).) |
GROUPS frame | ❌ |
Calculating aggregate functions over a frame (sum(value) over (order by time) ) | ✅ (All aggregate functions are supported) |
rank() , dense_rank() , row_number() | ✅ Alias: denseRank() |
percent_rank() | ✅ Efficiently computes the relative standing of a value within a partition in a dataset. This function effectively replaces the more verbose and computationally intensive manual SQL calculation expressed as ifNull((rank() OVER(PARTITION BY x ORDER BY y) - 1) / nullif(count(1) OVER(PARTITION BY x) - 1, 0), 0) Alias: percentRank() |
lag/lead(value, offset) | ❌ You can use one of the following workarounds: 1) any(value) over (.... rows between <offset> preceding and <offset> preceding) , or following for lead 2) lagInFrame/leadInFrame , which are analogous, but respect the window frame. To get behavior identical to lag/lead , use rows between unbounded preceding and unbounded following |
ntile(buckets) | ✅ Specify window like, (partition by x order by y rows between unbounded preceding and unrounded following). |
ClickHouse-specific Window Functions
There is also the following ClickHouse specific window function:
nonNegativeDerivative(metric_column, timestamp_column[, INTERVAL X UNITS])
Finds non-negative derivative for given metric_column
by timestamp_column
.
INTERVAL
can be omitted, default is INTERVAL 1 SECOND
.
The computed value is the following for each row:
0
for 1st row,- for row.
Syntax
aggregate_function (column_name)
OVER ([[PARTITION BY grouping_column] [ORDER BY sorting_column]
[ROWS or RANGE expression_to_bound_rows_withing_the_group]] | [window_name])
FROM table_name
WINDOW window_name as ([[PARTITION BY grouping_column] [ORDER BY sorting_column]])
PARTITION BY
- defines how to break a resultset into groups.ORDER BY
- defines how to order rows inside the group during calculation aggregate_function.ROWS or RANGE
- defines bounds of a frame, aggregate_function is calculated within a frame.WINDOW
- allows multiple expressions to use the same window definition.
PARTITION
┌─────────────────┐ <-- UNBOUNDED PRECEDING (BEGINNING of the PARTITION)
│ │
│ │
│=================│ <-- N PRECEDING <─┐
│ N ROWS │ │ F
│ Before CURRENT │ │ R
│~~~~~~~~~~~~~~~~~│ <-- CURRENT ROW │ A
│ M ROWS │ │ M
│ After CURRENT │ │ E
│=================│ <-- M FOLLOWING <─┘
│ │
│ │
└─────────────────┘ <--- UNBOUNDED FOLLOWING (END of the PARTITION)
Functions
These functions can be used only as a window function.
row_number()
- Number the current row within its partition starting from 1.first_value(x)
- Return the first value evaluated within its ordered frame.last_value(x)
- Return the last value evaluated within its ordered frame.nth_value(x, offset)
- Return the first non-NULL value evaluated against the nth row (offset) in its ordered frame.rank()
- Rank the current row within its partition with gaps.dense_rank()
- Rank the current row within its partition without gaps.lagInFrame(x)
- Return a value evaluated at the row that is at a specified physical offset row before the current row within the ordered frame.leadInFrame(x)
- Return a value evaluated at the row that is offset rows after the current row within the ordered frame.
Examples
Let's have a look at some examples of how window functions can be used.
Numbering rows
CREATE TABLE salaries
(
`team` String,
`player` String,
`salary` UInt32,
`position` String
)
Engine = Memory;
INSERT INTO salaries FORMAT Values
('Port Elizabeth Barbarians', 'Gary Chen', 195000, 'F'),
('New Coreystad Archdukes', 'Charles Juarez', 190000, 'F'),
('Port Elizabeth Barbarians', 'Michael Stanley', 150000, 'D'),
('New Coreystad Archdukes', 'Scott Harrison', 150000, 'D'),
('Port Elizabeth Barbarians', 'Robert George', 195000, 'M');
SELECT
player,
salary,
row_number() OVER (ORDER BY salary ASC) AS row
FROM salaries;
┌─player──────────┬─salary─┬─row─┐
│ Michael Stanley │ 150000 │ 1 │
│ Scott Harrison │ 150000 │ 2 │
│ Charles Juarez │ 190000 │ 3 │
│ Gary Chen │ 195000 │ 4 │
│ Robert George │ 195000 │ 5 │
└─────────────────┴────────┴─────┘
SELECT
player,
salary,
row_number() OVER (ORDER BY salary ASC) AS row,
rank() OVER (ORDER BY salary ASC) AS rank,
dense_rank() OVER (ORDER BY salary ASC) AS denseRank
FROM salaries;
┌─player──────────┬─salary─┬─row─┬─rank─┬─denseRank─┐
│ Michael Stanley │ 150000 │ 1 │ 1 │ 1 │
│ Scott Harrison │ 150000 │ 2 │ 1 │ 1 │
│ Charles Juarez │ 190000 │ 3 │ 3 │ 2 │
│ Gary Chen │ 195000 │ 4 │ 4 │ 3 │
│ Robert George │ 195000 │ 5 │ 4 │ 3 │
└─────────────────┴────────┴─────┴──────┴───────────┘
Aggregation functions
Compare each player's salary to the average for their team.
SELECT
player,
salary,
team,
avg(salary) OVER (PARTITION BY team) AS teamAvg,
salary - teamAvg AS diff
FROM salaries;
┌─player──────────┬─salary─┬─team──────────────────── ──┬─teamAvg─┬───diff─┐
│ Charles Juarez │ 190000 │ New Coreystad Archdukes │ 170000 │ 20000 │
│ Scott Harrison │ 150000 │ New Coreystad Archdukes │ 170000 │ -20000 │
│ Gary Chen │ 195000 │ Port Elizabeth Barbarians │ 180000 │ 15000 │
│ Michael Stanley │ 150000 │ Port Elizabeth Barbarians │ 180000 │ -30000 │
│ Robert George │ 195000 │ Port Elizabeth Barbarians │ 180000 │ 15000 │
└─────────────────┴────────┴───────────────────────────┴─────────┴────────┘
Compare each player's salary to the maximum for their team.
SELECT
player,
salary,
team,
max(salary) OVER (PARTITION BY team) AS teamMax,
salary - teamMax AS diff
FROM salaries;
┌─player──────────┬─salary─┬─team──────────────────────┬─teamMax─┬───diff─┐
│ Charles Juarez │ 190000 │ New Coreystad Archdukes │ 190000 │ 0 │
│ Scott Harrison │ 150000 │ New Coreystad Archdukes │ 190000 │ -40000 │
│ Gary Chen │ 195000 │ Port Elizabeth Barbarians │ 195000 │ 0 │
│ Michael Stanley │ 150000 │ Port Elizabeth Barbarians │ 195000 │ -45000 │
│ Robert George │ 195000 │ Port Elizabeth Barbarians │ 195000 │ 0 │
└─────────────────┴────────┴───────────────────────────┴─────────┴────────┘
Partitioning by column
CREATE TABLE wf_partition
(
`part_key` UInt64,
`value` UInt64,
`order` UInt64
)
ENGINE = Memory;
INSERT INTO wf_partition FORMAT Values
(1,1,1), (1,2,2), (1,3,3), (2,0,0), (3,0,0);
SELECT
part_key,
value,
order,
groupArray(value) OVER (PARTITION BY part_key) AS frame_values
FROM wf_partition
ORDER BY
part_key ASC,
value ASC;
┌─part_key─┬─value─┬─order─┬─frame_values─┐
│ 1 │ 1 │ 1 │ [1,2,3] │ <┐
│ 1 │ 2 │ 2 │ [1,2,3] │ │ 1-st group
│ 1 │ 3 │ 3 │ [1,2,3] │ <┘
│ 2 │ 0 │ 0 │ [0] │ <- 2-nd group
│ 3 │ 0 │ 0 │ [0] │ <- 3-d group
└──────────┴───────┴───────┴──────────────┘
Frame bounding
CREATE TABLE wf_frame
(
`part_key` UInt64,
`value` UInt64,
`order` UInt64
)
ENGINE = Memory;
INSERT INTO wf_frame FORMAT Values
(1,1,1), (1,2,2), (1,3,3), (1,4,4), (1,5,5);
-- Frame is bounded by bounds of a partition (BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
SELECT
part_key,
value,
order,
groupArray(value) OVER (
PARTITION BY part_key
ORDER BY order ASC
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS frame_values
FROM wf_frame
ORDER BY
part_key ASC,
value ASC;
┌─part_key─┬─value─┬─order─┬─frame_values─┐