Reference

Opening reference

Preparing the article, table of contents, and metadata. 본문, 목차, 메타데이터를 준비하고 있습니다.

Loading category cards

Preparing references and filters for this topic. 이 주제의 레퍼런스와 필터를 준비하고 있습니다.

PostgreSQL집계와 분석

aggregate FILTER와 ORDER BY

PostgreSQL aggregate 함수에서 FILTER, DISTINCT, aggregate 내부 ORDER BY를 사용해 조건부 집계와 순서 의존 집계를 안전하게 작성하는 기준입니다.

마지막 수정 2026년 5월 23일

숏컷 코드

sql

SELECT
  count(*) AS total_orders,
  count(*) FILTER (WHERE status = 'paid') AS paid_orders,
  sum(amount) FILTER (WHERE status = 'paid') AS paid_amount,
  array_agg(id ORDER BY created_at DESC) AS recent_order_ids
FROM orders;

sql

SELECT
  customer_id,
  string_agg(DISTINCT status, ', ' ORDER BY status) AS statuses
FROM orders
GROUP BY customer_id;

문법

FILTER는 집계 함수마다 입력 행을 따로 거른다

FILTER (WHERE ...)는 WHERE 절처럼 전체 행을 제거하지 않고, 해당 aggregate 함수에 들어갈 행만 고릅니다. 같은 SELECT 안에서 전체 건수, 결제 건수, 취소 건수를 함께 계산할 때 CASE WHEN을 여러 번 중첩하는 것보다 의도가 분명합니다.

sql

SELECT
  date_trunc('day', created_at) AS day,
  count(*) AS total,
  count(*) FILTER (WHERE status = 'paid') AS paid,
  count(*) FILTER (WHERE status = 'cancelled') AS cancelled
FROM orders
GROUP BY day
ORDER BY day;

FILTER 조건은 aggregate 입력 단계에서 평가됩니다. 그래서 WHERE status = 'paid'를 쓰면 미결제와 취소 행이 전체 집계에서도 빠지고, count(*) FILTER (WHERE status = 'paid')를 쓰면 전체 행은 유지한 채 결제 집계만 분리됩니다.

aggregate 내부 ORDER BY는 입력 순서가 결과에 영향을 줄 때 쓴다

sum, min, max처럼 입력 순서와 무관한 aggregate에는 내부 ORDER BY가 필요 없습니다. 반대로 array_agg, string_agg, jsonb_object_agg처럼 입력 순서가 결과 모양에 영향을 주는 함수는 aggregate 내부에서 순서를 고정해야 합니다.

sql

SELECT
  user_id,
  array_agg(event_name ORDER BY occurred_at, id) AS event_path
FROM events
GROUP BY user_id;

쿼리 맨 끝의 ORDER BY는 결과 행의 정렬이고, aggregate 내부 ORDER BY는 aggregate 함수에 들어가는 값의 정렬입니다. array_agg(event_name)만 쓰면 실행 계획이나 입력 순서 변화에 따라 배열 순서가 바뀔 수 있습니다.

DISTINCT와 ORDER BY를 함께 쓰면 정렬 표현식이 제한된다

array_agg(DISTINCT value ORDER BY value)처럼 중복 제거와 정렬을 함께 쓸 수 있습니다. 다만 DISTINCT가 붙은 aggregate에서 내부 ORDER BY는 DISTINCT 대상 표현식 안에 있는 값만 참조할 수 있습니다. 중복 제거 기준과 정렬 기준이 다르면 먼저 별도 CTE나 서브쿼리에서 대표 행을 고르는 편이 명확합니다.

sql

WITH latest_status AS (
  SELECT DISTINCT ON (customer_id, status)
    customer_id,
    status,
    created_at
  FROM orders
  ORDER BY customer_id, status, created_at DESC
)
SELECT
  customer_id,
  array_agg(status ORDER BY created_at DESC) AS statuses
FROM latest_status
GROUP BY customer_id;

선택 기준

상황	적합한 선택
전체 집계와 조건부 집계를 함께 보여 줌	`FILTER (WHERE ...)`
조건별 합계나 건수를 여러 개 계산	여러 aggregate에 각각 `FILTER` 적용
순서가 중요한 배열, 문자열, JSON 집계	aggregate 내부 `ORDER BY`
중복을 제거한 값 목록	`DISTINCT` aggregate
중복 제거 기준과 정렬 기준이 다름	CTE에서 대표 행 선택 후 aggregate

CASE WHEN은 계산식 자체를 바꿔야 할 때 여전히 유용합니다. 단순히 "어떤 행만 집계할 것인가"가 핵심이면 FILTER가 더 직접적이고, "조건에 따라 집계할 값이 달라지는가"가 핵심이면 CASE가 더 읽기 쉽습니다.

주의할 점

aggregate 내부 ORDER BY와 쿼리 마지막 ORDER BY를 혼동하면 결과 행 순서만 고정되고 배열이나 문자열 내부 순서는 고정되지 않습니다. 순서가 결과 자체의 일부라면 반드시 aggregate 함수 안에 정렬 기준을 넣어야 합니다.

sql

-- 배열 내부 순서는 보장하지 않음
SELECT user_id, array_agg(event_name)
FROM events
GROUP BY user_id
ORDER BY user_id;

-- 배열 내부 순서를 고정
SELECT user_id, array_agg(event_name ORDER BY occurred_at, id)
FROM events
GROUP BY user_id
ORDER BY user_id;

count(column)은 NULL을 세지 않고 count(*)는 행 자체를 셉니다. 조건부 집계에서 NULL 여부까지 의미가 있으면 count(*) FILTER (...)와 count(column) FILTER (...)의 차이를 명확히 구분해야 합니다.

참고 링크

2 sources

Reference

Opening reference

Preparing the article, table of contents, and metadata. 본문, 목차, 메타데이터를 준비하고 있습니다.

PostgreSQL집계와 분석

aggregate FILTER와 ORDER BY

PostgreSQL aggregate 함수에서 FILTER, DISTINCT, aggregate 내부 ORDER BY를 사용해 조건부 집계와 순서 의존 집계를 안전하게 작성하는 기준입니다.

마지막 수정 2026년 5월 23일

숏컷 코드

sql

SELECT
  count(*) AS total_orders,
  count(*) FILTER (WHERE status = 'paid') AS paid_orders,
  sum(amount) FILTER (WHERE status = 'paid') AS paid_amount,
  array_agg(id ORDER BY created_at DESC) AS recent_order_ids
FROM orders;

sql

SELECT
  customer_id,
  string_agg(DISTINCT status, ', ' ORDER BY status) AS statuses
FROM orders
GROUP BY customer_id;

문법

FILTER는 집계 함수마다 입력 행을 따로 거른다

sql

SELECT
  date_trunc('day', created_at) AS day,
  count(*) AS total,
  count(*) FILTER (WHERE status = 'paid') AS paid,
  count(*) FILTER (WHERE status = 'cancelled') AS cancelled
FROM orders
GROUP BY day
ORDER BY day;

aggregate 내부 ORDER BY는 입력 순서가 결과에 영향을 줄 때 쓴다

sql

SELECT
  user_id,
  array_agg(event_name ORDER BY occurred_at, id) AS event_path
FROM events
GROUP BY user_id;

DISTINCT와 ORDER BY를 함께 쓰면 정렬 표현식이 제한된다

sql

WITH latest_status AS (
  SELECT DISTINCT ON (customer_id, status)
    customer_id,
    status,
    created_at
  FROM orders
  ORDER BY customer_id, status, created_at DESC
)
SELECT
  customer_id,
  array_agg(status ORDER BY created_at DESC) AS statuses
FROM latest_status
GROUP BY customer_id;

선택 기준

상황	적합한 선택
전체 집계와 조건부 집계를 함께 보여 줌	`FILTER (WHERE ...)`
조건별 합계나 건수를 여러 개 계산	여러 aggregate에 각각 `FILTER` 적용
순서가 중요한 배열, 문자열, JSON 집계	aggregate 내부 `ORDER BY`
중복을 제거한 값 목록	`DISTINCT` aggregate
중복 제거 기준과 정렬 기준이 다름	CTE에서 대표 행 선택 후 aggregate

주의할 점

sql

-- 배열 내부 순서는 보장하지 않음
SELECT user_id, array_agg(event_name)
FROM events
GROUP BY user_id
ORDER BY user_id;

-- 배열 내부 순서를 고정
SELECT user_id, array_agg(event_name ORDER BY occurred_at, id)
FROM events
GROUP BY user_id
ORDER BY user_id;

참고 링크

2 sources