PostgreSQL的CTEs特性
SQL语句通常不是很容易理解,特别是你阅读别人已经写好的语句。因此,很多人指出我们 应该遵循在其他语言中遵循的原则,像加上注释和功能模块化。我最新注意到一个很多人 都没有使用的Postgres关键特性,也就是@timonk在AWS Re:Invent 大会关于数据仓库服务 Redshift主题演讲时指出的一个特性。这个特性实际上使得SQL兼具了可读性和模块性。 在以前,我回头阅读自己的几个月前的SQL语句,通常很难理解,而现在我可以做到这一点 。
这个特性就是CTEs,也就是公用表表达式,你有可能称做它为WITH 语句。和数据库中视图 一样,它的主要好处就是,它允许你在当前事务中创建临时表。你可以大量使用它,因为 它允许你思路清晰的构建模块,别人很容易就理解你在做什么。
让我们举个简单的例子:
WITH users_tasks AS ( SELECT users.email, array_agg(tasks.name) as task_list, projects.title FROM users, tasks, project WHERE users.id = tasks.user_id projects.title = tasks.project_id GROUP BY users.email, projects.title )
通过这样定义临时表users_tasks
,我就可以在后面加上对临时表的基本查询语句,
像:
SELECT * FROM users_tasks;
有趣的是你可以将它们连在一起。当我知道分配给每个用户的任务量时,也许我想知道在 一个指定的任务上,谁因为对这个任务负责超过了50%而因此造成瓶颈。为了简化,我们 可以使用多种方式,先计算每个任务的总量,然后是每人针对每个任务的负责总量。
total_tasks_per_project AS ( SELECT project_id, count(*) as task_count FROM tasks GROUP BY project_id ), tasks_per_project_per_user AS ( SELECT user_id, project_id, count(*) as task_count FROM tasks GROUP BY user_id, project_id ),
现在我们将组合一下然后发现超过50%的用户:
overloaded_users AS ( SELECT tasks_per_project_per_user.user_id, FROM tasks_per_project_per_user, total_tasks_per_project WHERE tasks_per_project_per_user.task_count > (total_tasks_per_project / 2) )
最终目标,我想获得超负荷工作这的用户和任务的逗号分隔列表。我们只要简单地对
overloaded_users
和users_tasks
的初始列表进行join操作。
放在一起可能有点长,但是可读性强。作为额外帮助,我又在每一层加了注释。
--- Created by Craig Kerstiens 11/18/2013 --- Query highlights users that have over 50% of tasks on a given project --- Gives comma separated list of their tasks and the project --- Initial query to grab project title and tasks per user WITH users_tasks AS ( SELECT users.id as user_id, users.email, array_agg(tasks.name) as task_list, projects.title FROM users, tasks, project WHERE users.id = tasks.user_id projects.title = tasks.project_id GROUP BY users.email, projects.title ), --- Calculates the total tasks per each project total_tasks_per_project AS ( SELECT project_id, count(*) as task_count FROM tasks GROUP BY project_id ), --- Calculates the projects per each user tasks_per_project_per_user AS ( SELECT user_id, project_id, count(*) as task_count FROM tasks GROUP BY user_id, project_id ), --- Gets user ids that have over 50% of tasks assigned overloaded_users AS ( SELECT tasks_per_project_per_user.user_id, FROM tasks_per_project_per_user, total_tasks_per_project WHERE tasks_per_project_per_user.task_count > (total_tasks_per_project / 2) ) SELECT email, task_list, title FROM users_tasks, overloaded_users WHERE users_tasks.user_id = overloaded_users.user_id
CTEs通常不如经过精简优化过的SQL语句性能高。大多数差距小于一倍差距。对我而言, 这种为了可读性作出的折中是毋庸置疑的。Postgres优化器以后肯定会针对这点变的更好 。
多说一句,是的我可以用大约10-15行简短的SQL语句做同样的事情,但是你也许不能很快 地理解它。当你碰到需要保证SQL做正确的事情时,可读性的优势就出来了。SQL语句总是 有个结果,你对此毫无疑问。确保你SQL语句容易推理是保证正确性的关键。