Project 4: Concurrency

In the previous projects, we built the buffer pool that manages all data and index pages between memory and disk, implemented a B+ tree index, and finally wrote the low-level executors for insert/delete/update/select plus a few query-optimization rules. In this part we complete the database's concurrency control: implementing transaction atomicity and transaction scheduling, and ensuring correctness when transactions execute concurrently.

In this project, you will add support for transactions in BusTub by adding a lock manager and then using it for concurrent query execution.

The lock manager will support table locks and tuple locks in five lock modes: intention-shared, intention-exclusive, shared-intention-exclusive, shared, and exclusive.

The lock manager will process lock requests from transactions, grant locks to transactions, and check if locks are released appropriately based on the transaction’s isolation level.

Background

Lock

References (Zhihu articles unless noted):

  • 聊一聊数据库中的锁 ("A look at locks in databases"), by 老刘

  • 一文了解db2的锁 ("Understanding DB2's locks in one article"), by somenzz

  • CMU 15-445 Lecture 14: Two-Phase Locking + Homework 4

  • 【15-445】8.事务(3)串行化 ("Transactions (3): Serializability"), by 只爱睡觉 — this one is excellent

Multiple-granularity locking

Suppose transaction T needs to access a database containing 100 million tuples.

We would like T to lock the entire database with a single request, so we need a mechanism that lets the system define locks at multiple granularities, i.e., data items of various sizes arranged in a hierarchy. The hierarchy can be drawn as a tree:

[figure: the granularity hierarchy, drawn as a tree]

Each node in the tree can be locked individually. When a transaction locks a node, it implicitly locks all of that node's descendants in the same mode.

But this raises a problem. Suppose transaction T wants to lock an entire subtree; it only needs to lock the subtree's root. However, if another transaction already holds a conflicting lock somewhere inside that subtree, T's lock on the root must fail. How do we decide whether the root can be locked? One option is to search the whole tree, but that defeats the purpose of multiple-granularity locking. The alternative is to introduce a new class of lock modes: intention lock modes.

An intention lock on a node means that explicit locking will be done at a lower level of the tree (i.e., at a finer granularity). There are three intention modes:

  • **Intention-Shared (IS)**: shared (S) locks will be taken explicitly at a lower level.
  • **Intention-Exclusive (IX)**: exclusive (X) locks will be taken explicitly at a lower level.
  • **Shared + Intention-Exclusive (SIX)**: the subtree rooted at this node is explicitly locked in shared mode, and exclusive locks will be taken explicitly at a lower level.

Before locking a node, a transaction must acquire the relevant intention locks on every node along the path from the root, following the rules in the figure below:

[figure: rules for acquiring intention locks along the root-to-node path]

With this mechanism, all ancestors of a node must carry intention locks before the node itself is explicitly locked. When a transaction wants to lock a node, it walks the path from the root down to that node, acquiring the appropriate intention lock on each ancestor; if a conflicting lock held by another transaction prevents one of those intention locks from being granted, the explicit lock cannot succeed. For example, to take an S lock on a single tuple, a transaction first takes IS locks on the database and the table, then the S lock on the tuple.

Practical implementation

As the previous section showed, the locking hierarchy comes in several granularities:

[figure: the lock granularity hierarchy]

In bustub, when discussing lock-based concurrency control we usually only consider the table and tuple granularities, i.e., table locks and row locks.

A table lock affects every row in the table equally; a row lock targets only specific rows of the table. Often, even after locking a table, we still need to lock the individual rows being touched.

For table and row locks we can apply the multiple-granularity scheme above. Concretely, table locks include the intention modes plus the traditional S/X modes, while row locks only have S/X modes. To lock a row, a transaction first takes an intention lock on the table to announce that it will read or write some rows, and only takes the actual row lock when it really reads or writes. This lets multiple transactions read and write different rows of the same table simultaneously, which is more efficient than always locking the whole table with a traditional S/X lock.

With intention locks, a transaction must hold the appropriate intention lock on the table before it can take an S/X lock on a row; this is exactly the tree structure above, reduced to two levels.
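As a concrete illustration, here is a minimal sketch of writing one row under this two-level protocol, using the LockManager API introduced later in this post (the helper function and its name are mine, not part of BusTub; error handling is omitted):

```cpp
#include "concurrency/lock_manager.h"
#include "concurrency/transaction.h"

// Hypothetical helper: write one row under hierarchical locking.
void UpdateOneRow(bustub::LockManager *lock_mgr, bustub::Transaction *txn,
                  const bustub::table_oid_t &oid, const bustub::RID &rid) {
  // Declare intent on the table: IX means "I will take X locks on some rows".
  lock_mgr->LockTable(txn, bustub::LockManager::LockMode::INTENTION_EXCLUSIVE, oid);
  // Lock only the row actually being written.
  lock_mgr->LockRow(txn, bustub::LockManager::LockMode::EXCLUSIVE, oid, rid);
  // ... modify the tuple ...
  // Under strict 2PL both locks are held until commit/abort, so no unlock here.
}
```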

Common table lock modes and their meanings:

[table: table lock modes and their meanings]

Their compatibility matrix:

```text
      IS   IX   S    SIX  X
IS    ✓    ✓    ✓    ✓    ✗
IX    ✓    ✓    ✗    ✗    ✗
S     ✓    ✗    ✓    ✗    ✗
SIX   ✓    ✗    ✗    ✗    ✗
X     ✗    ✗    ✗    ✗    ✗
```
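This matrix is easy to encode directly. A sketch of a pairwise compatibility helper (BusTub's handout doesn't mandate this exact helper; the name and signature are mine):

```cpp
auto AreLocksCompatible(LockMode l1, LockMode l2) -> bool {
  using LM = LockMode;
  switch (l1) {
    case LM::INTENTION_SHARED:           // IS conflicts only with X
      return l2 != LM::EXCLUSIVE;
    case LM::INTENTION_EXCLUSIVE:        // IX is compatible with IS/IX
      return l2 == LM::INTENTION_SHARED || l2 == LM::INTENTION_EXCLUSIVE;
    case LM::SHARED:                     // S is compatible with IS/S
      return l2 == LM::INTENTION_SHARED || l2 == LM::SHARED;
    case LM::SHARED_INTENTION_EXCLUSIVE: // SIX is compatible with IS only
      return l2 == LM::INTENTION_SHARED;
    case LM::EXCLUSIVE:                  // X is compatible with nothing
      return false;
  }
  return false;
}
```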

Examples

A few examples of these lock modes in action:

[figures: worked examples of multiple-granularity locking]

Concurrency

To understand concurrency control for transactions, first understand what schedules and serializability are.

[figures: schedules and serializability]

There are roughly three families of transaction concurrency control:

  1. Lock-based concurrency control

    Use the two-phase locking protocol (2PL) to serialize transactions contending for the same resources, producing a valid serializable schedule;

    and use deadlock detection to eliminate latent deadlocks.

  2. Timestamp-based concurrency control

  3. Validation-based concurrency control

This project uses lock-based concurrency control.

Two-phase locking

[figure: two-phase locking]

In a DBMS, the lock manager module manages all locks in the system. Whenever a transaction needs to acquire or upgrade a lock, it sends a request to the lock manager. Internally the lock manager maintains a table of current lock assignments, tracking which transactions hold which locks and which transactions are waiting for which locks. Since we want serializable schedules, we rely on the lock manager to enforce the locking policy, guaranteeing both correctness and concurrency.

As its name suggests, two-phase locking consists of two phases:

  • Growing phase: the transaction may acquire locks on data items as needed; the lock manager grants or denies each request.
  • Shrinking phase: the transaction may only release locks it already holds and may not acquire new ones; once it starts releasing locks, releasing is all it can do.

Schedules produced under 2PL are conflict serializable.

However, plain 2PL has a serious weakness: it can cause cascading aborts, as in the "dirty read" scenario shown below. To keep the schedule serializable, once T1 aborts the DBMS must also abort every transaction that depends on T1, i.e., that has read data written by T1, and those aborts can in turn cascade to still more in-flight transactions.

To avoid this, one can use a stronger variant of 2PL (the one bustub adopts): a transaction holds its X locks until it commits or aborts, so that no other transaction can read or overwrite its writes before it finishes.

[figure: a cascading abort caused by a dirty read]

Isolation levels

[figure: the three locking protocols]

  1. What the three isolation levels mean

    READ_UNCOMMITTED, READ_COMMITTED, and REPEATABLE_READ correspond to the level-1/2/3 protocols in the figure above.

    1. READ_UNCOMMITTED

      Reads take no locks, so a transaction can read other transactions' uncommitted modifications: dirty reads are possible.

    2. READ_COMMITTED

      Reads take a shared lock, so only committed modifications are visible; the lock is released as soon as the read finishes, so two reads within the same transaction may disagree: unrepeatable reads are possible.

    3. REPEATABLE_READ

      Reads take a shared lock, so only committed modifications are visible; the read lock is released only when the transaction commits, so two reads within the same transaction always agree.

  2. Supported lock types

    READ_UNCOMMITTED does not support IS/S/SIX locks.

  3. Read/write operations

    • A transaction should hold X locks for all write operations until it commits or aborts, regardless of its isolation level.

    • For REPEATABLE_READ, a transaction should take and hold S locks for all read operations until it commits or aborts.

    • For READ_COMMITTED, a transaction should take S locks for all read operations, but can release them immediately.

    • For READ_UNCOMMITTED, a transaction does not need to take any S locks for read operations.

  4. Interaction with 2PL: which locks may be taken in each phase (sketched in code right after this list)

    • REPEATABLE_READ:

      GROWING phase: any lock may be taken;

      SHRINKING phase: no lock may be taken. (So S and IS locks, like X locks, may only be released when the transaction commits/aborts.)

    • READ_COMMITTED:

      GROWING phase: any lock may be taken;

      SHRINKING phase: only IS/S locks may be taken. (So S and IS locks may be released as soon as they are done with.)

    • READ_UNCOMMITTED:

      GROWING phase: only IX/X locks may be taken (shared locks are unsupported);

      SHRINKING phase: no lock may be taken.
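A minimal sketch of the corresponding admission check at the top of LockTable/LockRow; the enum, state, and exception names come from BusTub's transaction.h and lock_manager.h, but the helper itself is mine:

```cpp
void CheckIsolationLevel(Transaction *txn, LockMode mode) {
  switch (txn->GetIsolationLevel()) {
    case IsolationLevel::READ_UNCOMMITTED:
      // No shared-family locks at all under READ_UNCOMMITTED.
      if (mode == LockMode::SHARED || mode == LockMode::INTENTION_SHARED ||
          mode == LockMode::SHARED_INTENTION_EXCLUSIVE) {
        txn->SetState(TransactionState::ABORTED);
        throw TransactionAbortException(txn->GetTransactionId(),
                                        AbortReason::LOCK_SHARED_ON_READ_UNCOMMITTED);
      }
      if (txn->GetState() == TransactionState::SHRINKING) {  // no locks while shrinking
        txn->SetState(TransactionState::ABORTED);
        throw TransactionAbortException(txn->GetTransactionId(), AbortReason::LOCK_ON_SHRINKING);
      }
      break;
    case IsolationLevel::READ_COMMITTED:
      // Only IS/S may still be taken while shrinking.
      if (txn->GetState() == TransactionState::SHRINKING &&
          mode != LockMode::INTENTION_SHARED && mode != LockMode::SHARED) {
        txn->SetState(TransactionState::ABORTED);
        throw TransactionAbortException(txn->GetTransactionId(), AbortReason::LOCK_ON_SHRINKING);
      }
      break;
    case IsolationLevel::REPEATABLE_READ:
      if (txn->GetState() == TransactionState::SHRINKING) {  // no locks while shrinking
        txn->SetState(TransactionState::ABORTED);
        throw TransactionAbortException(txn->GetTransactionId(), AbortReason::LOCK_ON_SHRINKING);
      }
      break;
  }
}
```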

Deadlock detection

Lock-based concurrency control relies on deadlock detection to eliminate latent deadlocks.

In the OS course, the deadlock-detection algorithm we learned builds a resource-allocation graph:

[figures: resource-allocation-graph deadlock detection, from OS course slides]

The relationship between cycles and deadlock (see 【操作系统】死锁(详细)):

① If the process-resource allocation graph has no cycle

→ the system is not deadlocked.

② If the graph has a cycle and every resource class contains exactly one instance

→ the system is deadlocked; here a cycle is both necessary and sufficient for deadlock, and the processes on the cycle are the deadlocked ones.

③ If the graph has a cycle but some involved resource classes contain multiple instances

→ the cycle is only a necessary condition for deadlock, not a sufficient one.

Project

To ensure correct interleaving of transactions’ operations, the DBMS uses a lock manager (LM) to control when transactions are allowed to access data items. The basic idea of a LM is that it maintains an internal data structure about the locks currently held by active transactions. Transactions issue lock requests to the LM before they access data items, and the LM will either grant the lock, block the transaction until the lock is available, or abort the transaction.


In your implementation, there will be a global LM for the BusTub system. The TableHeap and Executor classes will use your LM to acquire locks on tuple records (by record id RID) when a transaction attempts to access or modify a tuple.

Your LM must implement hierarchical table-level and tuple-level intention locks (described above) and three isolation levels: READ_UNCOMMITTED, READ_COMMITTED, and REPEATABLE_READ. The LM should grant or release locks according to a transaction’s isolation level.

That is, we must support hierarchical locks and the three isolation levels.

We provide a Transaction context handle (include/concurrency/transaction.h) with an isolation level attribute (i.e., READ_UNCOMMITTED, READ_COMMITTED, and REPEATABLE_READ) and information about its acquired locks.

The Transaction context records plenty of useful information, such as the isolation level and the locks the transaction holds.

The LM will need to check the isolation level of the transaction and expose correct behavior on lock/unlock requests. Any invalid lock operation should lead to an ABORTED transaction state (implicit abort) and throw an exception. A failed lock attempt (such as for a deadlock) does not result in an exception, but the LM should return false for the lock request.

An invalid lock operation must set the transaction to ABORTED and throw an exception; a failed lock attempt (e.g., one that would deadlock) does not throw, it simply returns false.

Overview

bustub adopts two-phase locking (2PL), so a transaction moves through the following states:

```text
 * Transaction states for 2PL:
 *
 *     _________________________
 *    |                         v
 * GROWING -> SHRINKING -> COMMITTED   ABORTED
 *    |__________|________________________^
```

bustub locks at the table and row granularities, following the multiple-granularity scheme: table locks come in intention modes plus S/X, while row locks only come in S/X. To access a table, a transaction either takes the table's S/X lock directly, or takes an intention lock on the table first and then S/X locks on the rows it touches.

bustub also supports three isolation levels: READ_UNCOMMITTED, READ_COMMITTED, and REPEATABLE_READ. They correspond to the level-1 protocol (prevents lost updates), level-2 protocol (additionally prevents dirty reads), and level-3 protocol (additionally prevents unrepeatable reads), respectively.

What this project implements is a lock manager (LM), which we then use to make each executor's execution concurrency-safe. The LM manages the locks of every resource in the system; when a transaction calls a lock method to request some lock mode, the LM validates the request before granting it. In bustub the LM is global. It exposes four main interfaces:

```cpp
bool LockTable(Transaction *txn, LockMode lock_mode, const table_oid_t &oid);
bool UnlockTable(Transaction *txn, const table_oid_t &oid);
bool LockRow(Transaction *txn, LockMode lock_mode, const table_oid_t &oid, const RID &rid);
bool UnlockRow(Transaction *txn, const table_oid_t &oid, const RID &rid, bool force = false);
```

Every transaction must go through these interfaces to acquire the appropriate locks before accessing data. Internally the LM checks whether the lock is available: if so it GRANTs the lock and lets the transaction proceed; otherwise it BLOCKs the transaction, or ABORTs it outright. The LM also runs periodic deadlock detection to break the deadlocks 2PL can produce.

In Task 1 we implement the LM's basic lock/unlock functionality; in Task 2, its deadlock detection; in Task 3, we apply the LM to make the executors concurrency-safe.

Task #1 - Lock Manager

The only file you need to modify for this task is the LockManager class (concurrency/lock_manager.cpp and include/concurrency/lock_manager.h). You must implement the following functions:

  • LockTable(Transaction, LockMode, TableOID)
  • UnlockTable(Transaction, TableOID)
  • LockRow(Transaction, LockMode, TableOID, RID)
  • UnlockRow(Transaction, TableOID, RID, force)

The LM’s specific locking mechanism depends on the transaction isolation level, whether it is a table-level or tuple-level lock, and the type of lock involved. Make sure you are familiar with the Transaction class’s API and member variables, defined in transaction.h and lock_manager.h. Then, carefully read through [LOCK_NOTE], [UNLOCK_NOTE], and the LM’s functions’ specifications (in lock_manager.h).


For UnlockRow, we have a force parameter, because in the executor implementation, we might need to peek whether a tuple is accessible by a transaction before deciding whether to scan this tuple to parent executors. If force is set to true, this operation bypasses all 2PL checks as if the tuple is never locked by the transaction.

HINTS

  • Think carefully about when you need to upgrade a lock, and about what operations on the LockRequestQueue are needed when you update a table/tuple lock.
  • You will need some way to notify waiting transactions when a lock is available. We recommend using std::condition_variable provided as part of LockRequestQueue.
  • The lock manager should update the state of transactions as needed. For example, the state of a transaction may be changed from GROWING to SHRINKING by an unlock operation. See the methods in transaction.h
  • You should keep track of the locks acquired by a transaction using *_lock_set_ so that the TransactionManager can release locks appropriately when it commits or aborts a transaction.
  • Setting a transaction’s state to ABORTED implicitly aborts it, but it is not explicitly aborted until TransactionManager::Abort is called. You should read this function and the provided tests to understand what it does and how your lock manager is used in the abort process.
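For the fourth hint, the bookkeeping is mechanical. A sketch (the helper name is mine; the lock-set getters are real Transaction methods):

```cpp
// Record a granted table lock in the transaction's *_lock_set_ so that
// TransactionManager can release it on commit/abort.
void RecordTableLock(Transaction *txn, LockMode mode, const table_oid_t &oid) {
  switch (mode) {
    case LockMode::SHARED:
      txn->GetSharedTableLockSet()->insert(oid);
      break;
    case LockMode::EXCLUSIVE:
      txn->GetExclusiveTableLockSet()->insert(oid);
      break;
    case LockMode::INTENTION_SHARED:
      txn->GetIntentionSharedTableLockSet()->insert(oid);
      break;
    case LockMode::INTENTION_EXCLUSIVE:
      txn->GetIntentionExclusiveTableLockSet()->insert(oid);
      break;
    case LockMode::SHARED_INTENTION_EXCLUSIVE:
      txn->GetSharedIntentionExclusiveTableLockSet()->insert(oid);
      break;
  }
}
```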

Overview

To implement the LM's lock/unlock against bustub's feature set, first pin down the skeleton of the lock/unlock functions.

For the Lock() family:

  1. Validate the request; abort the transaction if it is illegal.
  2. Loop: repeatedly try to acquire the lock, sleeping until woken whenever it is unavailable.

For the Unlock() family:

  1. Validate the request; abort the transaction if it is illegal.
  2. Perform the unlock and wake all sleeping transactions.

Below, I first describe the LM's data structures, then cover Lock() from three angles: the sleep/wake-up machinery, validity checking, and lock upgrading. Finally, part four covers the comparatively simple Unlock().

Data structures

The LM manages the locks of every resource, and in bustub the only granularities are tables and tuples. So two map-like structures suffice, one recording table locks and one recording row locks:

```cpp
/** Structure that holds lock requests for a given table oid */
std::unordered_map<table_oid_t, std::shared_ptr<LockRequestQueue>> table_lock_map_;
/** Coordination */
std::mutex table_lock_map_latch_;

/** Structure that holds lock requests for a given RID */
std::unordered_map<RID, std::shared_ptr<LockRequestQueue>> row_lock_map_;
/** Coordination */
std::mutex row_lock_map_latch_;
```

Each LockRequestQueue records the lock-request queue of one resource:

```cpp
class LockRequestQueue {
 public:
  /** Queue of requests (granted and waiting) blocked on this resource */
  std::list<LockRequest *> request_queue_;
  /** Synchronizes the threads waiting on request_queue_; detailed two sections below */
  std::condition_variable cv_;
  /** Used to implement lock upgrades; detailed later */
  txn_id_t upgrading_ = INVALID_TXN_ID;
  /** Protects this structure */
  std::mutex latch_;
};

class LockRequest {
 public:
  /** Txn_id of the txn requesting the lock */
  txn_id_t txn_id_;
  /** Locking mode of the requested lock */
  LockMode lock_mode_;
  /** Oid of the table for a table lock; oid of the table the row belongs to for a row lock */
  table_oid_t oid_;
  /** Rid of the row for a row lock; unused for table locks */
  RID rid_;
  /** Whether this request has been granted */
  bool granted_{false};
};
```

In effect, every table or tuple has its own queue of waiting transactions and its own lock.

FIFO sleep/wake-up mechanism

This part is the real meat: building a custom FIFO lock on top of std::condition_variable.

C++ background

std::condition_variable is one of the C++ standard library's thread-synchronization tools. It is used together with a std::mutex to implement waiting and waking on a particular condition: one thread waits until another thread establishes the condition and then notifies the waiter that it holds.

A simple example of basic std::condition_variable usage (the predicate form is used so a notify that fires before the wait, or a spurious wakeup, cannot be lost):

```cpp
#include <chrono>
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>

std::condition_variable cv;
std::mutex mutex;
bool ready = false;  // the condition; guarded by mutex

void waitForCondition() {
  std::unique_lock<std::mutex> lock(mutex);
  // Waits until ready == true; handles spurious wakeups automatically.
  cv.wait(lock, [] { return ready; });

  std::cout << "Condition satisfied! Doing some work." << std::endl;
}

int main() {
  std::thread waitingThread(waitForCondition);
  std::this_thread::sleep_for(std::chrono::seconds(2));

  {
    std::lock_guard<std::mutex> lock(mutex);
    ready = true;  // establish the condition under the mutex
  }
  cv.notify_all();

  waitingThread.join();
  return 0;
}
```

Note the unique_lock. It is similar to lock_guard, except that std::unique_lock is more flexible: it allows manually locking and unlocking the mutex, multiple times and across scopes. It is what you pass to a condition variable's wait function.

The basic flow of a condition variable is illustrated below (from the post condition_variable 条件变量):

[figures: condition_variable wait/notify flow]

In short: the caller locks the mutex outside, wait unlocks it internally, and re-acquires it when woken. That is exactly the sleep/wake-up mechanism we need.

Implementation details

From the project documentation:

```text
Both LockTable() and LockRow() are blocking methods; they should wait till the lock is granted and then return. If the transaction was aborted in the meantime, do not grant the lock and return false.
Unlocking a resource should also grant any new lock requests for the resource (if possible).
```

This tells us the general shape of lock and unlock: in lock, if the lock is currently unavailable, sleep until woken; unlock notifies all waiters afterwards.

We implement the sleep/wake-up machinery with the condition variable cv_ in LockRequestQueue. The key design point of any sleep/wake-up scheme is the **sleep condition**.

The LM must grant requests in FIFO order, but std::condition_variable's notify_one wakes waiters in an unspecified order. So we simulate FIFO manually with the LockRequestQueue.request_queue_ list: a thread is granted the lock only when its request reaches the front of the queue.

One more requirement: if several queued requests can all be satisfied, they must be granted together in one go. For example, if the queue is S S X S X, then after granting, the first two S requests are granted and everything from the first X onward still waits.

"Satisfiable" here means compatible; see the lock compatibility matrix in the Background section.


Combining the two points, the final sleep condition is: **a request is granted if and only if it is compatible with every request ahead of it in the queue**. Taking LockTable() as an example, the code looks like this (pseudocode):

```cpp
bool LockManager::LockTable(Transaction *txn, LockMode lock_mode, const table_oid_t &oid) {
  // (Pseudocode: map-latch handling and queue creation are omitted.)
  auto it = table_lock_map_.find(oid);
  std::unique_lock<std::mutex> req_queue_lock(it->second->latch_);
  // Append the new request to the tail of the queue.
  it->second->request_queue_.push_back(new LockRequest(txn->GetTransactionId(), lock_mode, oid));
  auto req_it = std::prev(it->second->request_queue_.end());

  // Each lock mode has its own wait condition; only S/X are shown here.
  switch (lock_mode) {
    case LockMode::EXCLUSIVE:
      // X is compatible with nothing, so everything ahead of us must be gone.
      // Note we cannot simply test empty(): new requests may have been
      // appended behind us while we were waiting.
      while (!(std::all_of(it->second->request_queue_.begin(), req_it,
                           [](LockRequest *req) { return false; }))) {
        it->second->cv_.wait(req_queue_lock);
      }

      // Re-locate our own request after waking up, since the queue may have
      // been modified while we slept.
      for (req_it = it->second->request_queue_.begin(); req_it != it->second->request_queue_.end(); ++req_it) {
        if ((*req_it)->txn_id_ == txn->GetTransactionId()) {
          break;
        }
      }
      break;
    case LockMode::SHARED:
      // S is only compatible with S and IS, so everything ahead of us must be
      // one of those compatible modes (or the range must be empty).
      while (!(std::all_of(it->second->request_queue_.begin(), req_it,
                           [](LockRequest *req) {
                             return req->lock_mode_ == LockMode::SHARED ||
                                    req->lock_mode_ == LockMode::INTENTION_SHARED;
                           }))) {
        it->second->cv_.wait(req_queue_lock);
      }

      for (req_it = it->second->request_queue_.begin(); req_it != it->second->request_queue_.end(); ++req_it) {
        if ((*req_it)->txn_id_ == txn->GetTransactionId()) {
          break;
        }
      }
      break;
    default:
      break;
  }

  (*req_it)->granted_ = true;  // mark as granted
  return true;
}
```

Waking things up is much simpler: unlock removes the request from the queue and calls notify_all:

```cpp
bool LockManager::UnlockTable(Transaction *txn, const table_oid_t &oid) {
  auto it = table_lock_map_.find(oid);
  std::unique_lock<std::mutex> req_queue_lock(it->second->latch_);

  // Remove our granted request from the queue.
  for (auto req_it = it->second->request_queue_.begin(); req_it != it->second->request_queue_.end(); ++req_it) {
    if ((*req_it)->txn_id_ == txn->GetTransactionId()) {
      BUSTUB_ASSERT((*req_it)->granted_, "Must have been granted in unlock function!");
      delete (*req_it);
      it->second->request_queue_.erase(req_it);
      break;
    }
  }

  // Wake up all waiting threads.
  it->second->cv_.notify_all();
  return true;
}
```

Validity checks

I put the validity checks at the very top of each function. The following factors must be considered:

  1. Isolation level

    1. Does the isolation level support this lock mode at all?

      For instance, READ_UNCOMMITTED does not support IS/S/SIX locks; requesting one throws LOCK_SHARED_ON_READ_UNCOMMITTED.

      Different isolation levels also allow different lock modes during the SHRINKING phase.

      See the discussion of isolation levels in the Background section.

  2. Others

    For a row lock, for example, we must check that the corresponding table lock is already held. The tests appear to follow a one-thread-per-transaction model, so it suffices to check for the table lock right before taking the row lock; we need not worry about the table lock disappearing mid-flight. I check via the transaction's lock sets, which avoids contending for the resource locks:

    ```cpp
    bool holding_table_lock = false;
    if (lock_mode == LockMode::SHARED) {
      if ((txn->GetSharedTableLockSet()->find(oid) != txn->GetSharedTableLockSet()->end()) ||
          (txn->GetIntentionSharedTableLockSet()->find(oid) != txn->GetIntentionSharedTableLockSet()->end()) ||
          (txn->GetExclusiveTableLockSet()->find(oid) != txn->GetExclusiveTableLockSet()->end()) ||
          (txn->GetIntentionExclusiveTableLockSet()->find(oid) != txn->GetIntentionExclusiveTableLockSet()->end()) ||
          (txn->GetSharedIntentionExclusiveTableLockSet()->find(oid) !=
           txn->GetSharedIntentionExclusiveTableLockSet()->end())) {
        holding_table_lock = true;
      }
    } else {
      if ((txn->GetExclusiveTableLockSet()->find(oid) != txn->GetExclusiveTableLockSet()->end()) ||
          (txn->GetIntentionExclusiveTableLockSet()->find(oid) != txn->GetIntentionExclusiveTableLockSet()->end()) ||
          (txn->GetSharedIntentionExclusiveTableLockSet()->find(oid) !=
           txn->GetSharedIntentionExclusiveTableLockSet()->end())) {
        holding_table_lock = true;
      }
    }
    ```

Lock upgrade

LOCK UPGRADE:
Calling Lock() on a resource that is already locked should have the following behaviour:

  • If requested lock mode is the same as that of the lock presently held, Lock() should return true since it already has the lock.

    If lock_mode is unchanged, simply return true.

  • If requested lock mode is different, Lock() should upgrade the lock held by the transaction.

    If lock_mode differs, the lock must be upgraded.

A lock request being upgraded should be prioritised over other waiting lock requests on the same resource. This sentence matters! An upgrade request has the highest priority and must be served first!

While upgrading, only the following transitions should be allowed:

```text
IS -> [S, X, IX, SIX]
S -> [X, SIX]
IX -> [X, SIX]
SIX -> [X]
```

Any other upgrade is considered incompatible, and such an attempt should set the TransactionState as ABORTED and throw a TransactionAbortException (INCOMPATIBLE_UPGRADE)

Furthermore, only one transaction should be allowed to upgrade its lock on a given resource. Multiple concurrent lock upgrades on the same resource should set the TransactionState as ABORTED and throw a TransactionAbortException (UPGRADE_CONFLICT). That is, multiple transactions must not upgrade locks on the same resource at the same time.
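A sketch of the legal-transition check implied by the table above (the helper name is mine):

```cpp
auto CanLockUpgrade(LockMode cur, LockMode req) -> bool {
  using LM = LockMode;
  switch (cur) {
    case LM::INTENTION_SHARED:  // IS -> [S, X, IX, SIX]
      return req == LM::SHARED || req == LM::EXCLUSIVE ||
             req == LM::INTENTION_EXCLUSIVE || req == LM::SHARED_INTENTION_EXCLUSIVE;
    case LM::SHARED:             // S  -> [X, SIX]
    case LM::INTENTION_EXCLUSIVE:  // IX -> [X, SIX]
      return req == LM::EXCLUSIVE || req == LM::SHARED_INTENTION_EXCLUSIVE;
    case LM::SHARED_INTENTION_EXCLUSIVE:  // SIX -> [X]
      return req == LM::EXCLUSIVE;
    case LM::EXCLUSIVE:          // X cannot be upgraded
      return false;
  }
  return false;
}
```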

A common scenario: a transaction holds an S lock and then wants to turn it into an X lock. It cannot simply release the S lock and request an X lock, because that leaves a window during which no lock is held at all, a concurrency hazard. This is where lock upgrading comes in.

A lock upgrade is essentially a release-then-acquire implemented atomically inside the lock itself. Atomicity is ensured as follows:

  1. Release the old lock.
  2. Prevent every other transaction that does not yet hold a lock from acquiring one (i.e., the upgrade has top priority; all other ungranted requests must wait until the upgrade completes).
  3. Wait until the upgraded target lock becomes available.

This guarantees no other transaction can sneak into the unlocked window. Two key problems remain: how to implement the upgrade itself, and how to enforce the upgrade's priority.

Implementation details

My solution treats upgrading requests and ordinary requests separately.

  1. The upgrading request

    I release the old lock first and then compete for the new one. First, release the old lock, but without calling notify_all; then, if the queue contains a granted request that conflicts with the new mode, wait for it to be released (i.e., the lock is taken only when every request in the queue is either compatible, or incompatible but not yet granted).

    I originally planned not to release the old lock until the new lock was obtained. That is, for S->X: option ① drops the S grant first, then goes through the normal path to get the X grant (a period with no lock held); option ② gets the X grant first, then drops the S grant (a period holding both S and X).

    I initially chose ②, but it turns out to be unnecessary: this all happens inside the lock's own implementation, so no other transaction can grab the resource during the window (no notify is issued, and the upgrade has top priority). For simplicity (and because the online tests require releasing first...), I went with release-then-acquire.

    Then, once the upgrading request is woken, it inserts itself at the head of the queue (head insertion marks top priority). At that point it is granted.

  2. Ordinary requests

    While an upgrade is competing for the lock, all other requests are barred from being granted (they stay ungranted, reflecting the upgrade's top priority). Only after the conflicting requests are released and the upgrade is granted does everything return to the normal control flow.

Pseudocode below (again only the SHARED case):

```cpp
bool is_upgraded = false;
/* Upgrade case: if we already hold a lock on this resource, this is an upgrade. */
for (auto req_it = it->second->request_queue_.begin(); req_it != it->second->request_queue_.end(); ++req_it) {
  if ((*req_it)->txn_id_ == txn->GetTransactionId()) {
    it->second->upgrading_ = txn->GetTransactionId();
    is_upgraded = true;
    // Release the old lock first, but do NOT notify.
    delete (*req_it);
    it->second->request_queue_.erase(req_it);
    break;
  }
}

if (!is_upgraded) {  // normal path: append to the tail; the upgrading request is
                     // inserted at the head only later (marking its top priority)
  it->second->request_queue_.push_back(new LockRequest(txn->GetTransactionId(), lock_mode, oid));
}
auto req_it = std::prev(it->second->request_queue_.end());
if (is_upgraded) {  // an upgrade must examine the whole queue, not just what is ahead of it
  req_it = it->second->request_queue_.end();
}

switch (lock_mode) {
  case LockMode::SHARED:
    // The condition is a little involved.
    // For every request other than the upgrade, the first clause holds, so it
    // keeps waiting until it->second->upgrading_ becomes INVALID_TXN_ID, i.e.,
    // until the upgrade completes. This enforces the upgrade's top priority.
    while ((it->second->upgrading_ != INVALID_TXN_ID && !is_upgraded) ||
           // For the upgrade itself the first clause is false, so look at the
           // second: it acquires the lock only when the queue contains nothing
           // but compatible requests (the usual lock_mode test) or
           // incompatible-but-ungranted requests (!(req->granted_)).
           (!(std::all_of(it->second->request_queue_.begin(), req_it,
                          [is_upgraded, it](LockRequest *req) {
                            // incompatible but ungranted || the upgrader's own request || compatible
                            return (is_upgraded && (!(req->granted_))) ||
                                   (req->txn_id_ == it->second->upgrading_) ||
                                   req->lock_mode_ == LockMode::SHARED ||
                                   req->lock_mode_ == LockMode::INTENTION_SHARED;
                          })))) {
      it->second->cv_.wait(req_queue_lock);
      // As before, re-locate our iterator after each wake-up.
      if (is_upgraded) {
        req_it = it->second->request_queue_.end();
      } else {
        for (req_it = it->second->request_queue_.begin(); req_it != it->second->request_queue_.end(); ++req_it) {
          if ((*req_it)->txn_id_ == txn->GetTransactionId()) {
            break;
          }
        }
      }
    }
    break;
  default:
    break;
}

if (is_upgraded) {
  // Grant the upgraded lock.
  auto upgraded_req = new LockRequest(txn->GetTransactionId(), lock_mode, oid);
  upgraded_req->granted_ = true;
  // Insert the upgraded request at the head of the queue: highest priority.
  it->second->request_queue_.push_front(upgraded_req);
  // Mark the upgrade as finished.
  it->second->upgrading_ = INVALID_TXN_ID;
} else {
  (*req_it)->granted_ = true;
}
```
Uniqueness of lock upgrades

Additionally, for a given resource only one transaction may be upgrading at a time; otherwise deadlock can arise, because the mechanism upgrades to the new lock while (conceptually) still holding the old one.

Concretely (adapted from https://blog.csdn.net/weixin_46879188/article/details/113882685, plus my own implementation notes):

A typical update pattern is a single transaction that reads a record (taking an S lock on the resource) and then modifies the row, which requires converting the lock to X.

If two transactions both acquire S locks on the resource and then both try to update the data, one of them attempts to convert its lock to X; the queue goes from SS to SX, and it waits.

The second transaction then also tries to acquire an X lock to update; recall that it releases its old lock without notifying. It now blocks forever on the condition it->second->upgrading_ != INVALID_TXN_ID (the guard for the upgrade's top priority), while the other transaction never receives a notify: deadlock.

One might ask: can't we relax the sleep condition, say by widening the upgrading slot from one transaction to an array, letting one of several concurrent upgraders truly acquire while the others wait? Looking at the outcome, that degenerates into the same thing as allowing only one upgrader at a time... so we may as well enforce the restriction bluntly.

Hence in the bustub implementation, when an upgrade conflict occurs we simply abort and roll back the transaction.
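In code, the guard sits right before the upgrade path. A minimal sketch (queue is the LockRequestQueue at hand; the state and exception names come from BusTub):

```cpp
// Only one transaction may be upgrading on this resource at a time.
if (queue->upgrading_ != INVALID_TXN_ID && queue->upgrading_ != txn->GetTransactionId()) {
  txn->SetState(TransactionState::ABORTED);
  throw TransactionAbortException(txn->GetTransactionId(), AbortReason::UPGRADE_CONFLICT);
}
```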

Task #2 - Deadlock Detection

Your lock manager should run deadlock detection in a background thread, periodically building a waits-for graph and aborting transactions as needed to eliminate deadlocks.

REQUIREMENTS

You must implement and use the following graph API for cycle detection:

  • AddEdge(txn_id_t t1, txn_id_t t2): Adds an edge in your graph from t1 to t2, representing that t1 is waiting for t2. If the edge already exists, you don’t have to do anything.
  • RemoveEdge(txn_id_t t1, txn_id_t t2): Removes edge t1 to t2 from your graph. If no such edge exists, you don’t have to do anything.
  • HasCycle(txn_id_t& txn_id): Looks for a cycle by using depth-first search (DFS). If it finds a cycle, HasCycle should store the transaction id of the youngest transaction in the cycle in txn_id and return true. Your function should return the first cycle it finds. If your graph has no cycles, HasCycle should return false.
  • GetEdgeList(): Returns a list of tuples representing the edges in your graph. We will use this to test correctness of your graph. A pair (t1,t2) corresponds to an edge from t1 to t2.
  • RunCycleDetection(): Contains skeleton code for running cycle detection in the background. You should implement your cycle detection algorithm here.

You may implement the graph however you want, as long as you support the above API. We will use that API to test your implementation.

You may need to access the status of a transaction from the member variable txn_manager_. If txn_manager_ is set to nullptr, StartDeadlockDetection will not be called, and you do not need to detect deadlocks.

HINTS

  • Your background cycle detection algorithm may need to get a pointer to a transaction using a txn_id. There is a member variable txn_manager_ in lock manager, and Transaction* GetTransaction(txn_id_t txn_id) enables you to do that.

  • You can also tweak CYCLE_DETECTION_INTERVAL in common/config.h in your test cases.

Overview

This is a simplification of the deadlock-detection algorithm from the Background section. We treat each lock as a resource, so every resource class holds exactly one instance (the mutex); hence a cycle immediately implies deadlock. And since there is only one instance per resource, resource nodes serve no real purpose, so the graph simplifies to one where every vertex is a transaction and every edge is a waits-for relation (a waits-for graph). The problem reduces to finding a cycle in this graph; if a cycle exists, there is a deadlock.

The task adds the following requirements on top of that:

  1. A transaction may have multiple outgoing edges.

    Remember that the waits-for graph is a directed graph, with an edge for each transaction waiting for another transaction. Because multiple transactions may share a lock on the same object, a single transaction may be waiting for multiple transactions.

    For example, with a queue of SSSX, the transaction behind the X request has three edges.

  2. The graph must not be maintained incrementally; it is built and torn down inside the background thread each round.

    Your background thread should build the waits-for graph every time it wakes up. You should not maintain the waits-for graph over time; it should be built and destroyed every time the deadlock detection thread wakes up.

    So the waits-for graph is constructed in one shot by the detection thread: in RunCycleDetection we take the table_lock_map_ and row_lock_map_ latches and build the graph from them.

    If a deadlock has already occurred, could the detection thread deadlock itself by taking those two latches?

    No. Looking back at our implementation: in a deadlock, both stuck transactions are asleep inside wait() on the condition variable, having already released the table/row lock map latches, so it is safe for the detection thread to take them.

  3. Cycle detection must be deterministic.

    Your cycle detection algorithm must be deterministic. To achieve this, you should always explore the lowest transaction id first, by starting the depth-first search from the node with lowest transaction id and exploring neighbors in order (by transaction id) when searching from a node.

    That is, if the graph contains several cycles, they must be reported in a fixed order; we therefore traverse the whole graph in txn_id order.

  4. When a cycle is found, abort the youngest transaction in it to break the deadlock.

    When you find a cycle, abort the youngest transaction to break the cycle by setting that transaction's state to ABORTED.

    At that moment we must also issue a notify, so the aborting transaction does not keep sleeping and fail to actually abort.

    A transaction waiting for a lock may be aborted by the background cycle detection thread. You must have a way to notify waiting transactions that they've been aborted.

Implementation details

The graph's data structure:

```cpp
std::map<txn_id_t, std::set<txn_id_t>> waits_for_;
```

Since determinism requires a fixed DFS traversal order, and any two transactions have at most one edge between them, I use the ordered std::map and std::set.
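With this representation, the required graph API comes down to a few lines each. A sketch (latching around waits_for_ omitted):

```cpp
void LockManager::AddEdge(txn_id_t t1, txn_id_t t2) {
  waits_for_[t1].insert(t2);  // set semantics make a duplicate edge a no-op
}

void LockManager::RemoveEdge(txn_id_t t1, txn_id_t t2) {
  auto it = waits_for_.find(t1);
  if (it != waits_for_.end()) {
    it->second.erase(t2);  // erasing a missing edge is also a no-op
  }
}

auto LockManager::GetEdgeList() -> std::vector<std::pair<txn_id_t, txn_id_t>> {
  std::vector<std::pair<txn_id_t, txn_id_t>> edges;
  for (const auto &[from, tos] : waits_for_) {
    for (const auto &to : tos) {
      edges.emplace_back(from, to);
    }
  }
  return edges;
}
```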

The DFS that finds a cycle in the graph is also straightforward (signatures simplified; the "..." is elided bookkeeping):

```cpp
auto LockManager::FindCycle(txn_id_t source_txn, std::vector<txn_id_t> &path,
                            std::unordered_set<txn_id_t> &visited) -> bool {
  if (visited.find(source_txn) != visited.end()) {  // found a cycle
    // Trim the cycle-free prefix of path: drop everything in
    // [path.begin(), first occurrence of source_txn), since we only
    // want the cycle itself.
    ...
    return true;
  }

  visited.insert(source_txn);
  path.push_back(source_txn);

  for (auto &edge : waits_for_[source_txn]) {
    if (FindCycle(edge, path, visited)) {
      return true;
    }
  }

  visited.erase(visited.find(source_txn));
  path.pop_back();
  return false;
}

auto LockManager::HasCycle(txn_id_t *txn_id) -> bool {
  if (waits_for_.size() < 2) {  // pruning: a waits-for cycle needs at least two nodes
    return false;
  }

  for (auto &wait_pair : waits_for_) {  // the graph may not be connected,
                                        // so try every start node in id order
    std::vector<txn_id_t> path;
    std::unordered_set<txn_id_t> visited;
    if (FindCycle(wait_pair.first, path, visited)) { return true; }
  }
  return false;
}
```

The overall detection loop is also simple:

```cpp
void LockManager::RunCycleDetection() {
  while (enable_cycle_detection_) {
    std::this_thread::sleep_for(cycle_detection_interval);
    {
      // 1. Build the waits-for graph: walk the table lock map and the row
      //    lock map and add edges. I only add edges of the form txn1 -> txn2
      //    where txn1->granted_ == false and txn2->granted_ == true, since
      //    that is the explicit form of waiting.
      ...
      // 2. Detect cycles and abort victims.
      txn_id_t abort_txn;
      while (HasCycle(&abort_txn)) {
        // 2.1. Abort the victim.
        txn_manager_->GetTransaction(abort_txn)->SetState(TransactionState::ABORTED);

        // 2.2. Notify the related transactions. sleep_for (maintained in
        //      step 1) maps each transaction to the queues it is waiting on.
        for (auto &lock_req : sleep_for[abort_txn]) {
          lock_req->cv_.notify_all();
        }
        sleep_for.erase(sleep_for.find(abort_txn));

        // 2.3. Remove all edges involving the victim from the graph.
        ...
      }
      // 3. Destroy the waits-for graph.
      waits_for_.clear();
    }
  }
}
```

All in all, fairly straightforward.

Task #3 - Concurrent Query Execution

To support concurrent query execution, executors must lock and unlock tables and tuples as needed to achieve the isolation level specified in the transaction. To simplify this task, you can ignore concurrent index execution and just focus on data stored in heap files.

Ignore indexes; focus only on the places that touch table data.

You must update methods of some executors implemented in Project 3.

Note that transactions should abort when lock/unlock fails. If a transaction aborts, you will need to undo its previous write operations; to achieve this, you will need to maintain the write set in each transaction, which is used by the Abort() method of the transaction manager.

We need to implement UNDO to roll back transactions.

If the executor fails to acquire a lock, you should throw an ExecutionException so that the execution engine will tell the user that the query failed.

When acquiring a lock fails (i.e., an LM method returns false or throws TransactionAbortException), the executor must throw an ExecutionException.

You should not assume that a transaction only consists of just one query. Specifically, this means a tuple might be accessed by different queries more than once in a transaction. Think about how you should handle this under different isolation levels.

A transaction may contain multiple SQL statements, so one tuple may be accessed several times, in different ways, within a single transaction. Think about how to handle this under each isolation level.

Overview

Having built the LM above, in this task we apply it to actual query execution.

Because we use the volcano model, implementing 2PL is fairly convenient. Taking 2PL under REPEATABLE_READ as an example: take the table lock in each executor's Init, take row locks in Next, and release everything when the transaction commits/aborts. If the whole volcano tree follows this pattern top-down, the GROWING and SHRINKING phases are cleanly separated.

So which executors actually need modifying? First, for simplicity bustub ignores concurrency control for index usage and for update (required only in the leaderboard). Second, executors that operate on intermediate tables need no changes either (intermediate results can never be accessed concurrently), e.g., NLJ, hash join, and aggregation. What remains are the executors that access base tables directly: the seq scan executor and the insert executor (the delete executor is built on top of seq scan, so it needs no extra locking).
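As an example of the pattern, here is a minimal sketch of the Init-time table lock in InsertExecutor, wrapping LM failures into ExecutionException as the spec requires (the error strings are mine):

```cpp
void InsertExecutor::Init() {
  auto *txn = exec_ctx_->GetTransaction();
  auto *lock_mgr = exec_ctx_->GetLockManager();
  try {
    // IX table lock: we intend to take X locks on the rows we insert.
    if (!lock_mgr->LockTable(txn, LockManager::LockMode::INTENTION_EXCLUSIVE,
                             plan_->TableOid())) {
      throw ExecutionException("insert: failed to lock table");
    }
  } catch (TransactionAbortException &e) {
    throw ExecutionException("insert: " + e.GetInfo());
  }
  // ... the rest of Init from Project 3 ...
}
```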

Having identified where to add locking, one more point deserves attention: rollback. When a transaction is aborted, we must roll back the modifications it has made, which means maintaining the write set wherever the base table is modified, i.e., in insert and delete.

In summary, this task touches these files:

To complete this task, you must add support for concurrent query execution in the following executors and the transaction manager:

  • src/execution/seq_scan_executor.cpp
  • src/execution/insert_executor.cpp
  • src/execution/delete_executor.cpp
  • src/concurrency/transaction_manager.cpp

You must pass all tests and produce correct results for the Terrier Benchmark (see below) without segfaults and deadlocks. You do not need to handle concurrency for indexes or the update executor, except for the leaderboard tests.

The two aspects are covered in detail below.

Implementation details

Overall

SeqScan Executor

  • In Init, take a table lock. Get an iterator by using MakeEagerIterator instead of MakeIterator. (MakeIterator is introduced to avoid the Halloween problem (explained in the appendix below) in Project 3’s UpdateExecutor, but you do not need it now.)
  • In Next
    1. Get the current position of the table iterator.
    2. Lock the tuple as needed for the isolation level.
    3. Fetch the tuple. Check tuple meta, and if you have implemented filter pushdown to scan, check the predicate.
    4. If the tuple should not be read by this transaction, force unlock the row. Otherwise, unlock the row as needed for the isolation level.
    5. If the current operation is delete (by checking executor context IsDelete(), which will be set to true for DELETE and UPDATE), you should assume all tuples scanned will be deleted, and you should take X locks on the table and tuple as necessary in step 2.
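A sketch of steps 2–4 in SeqScanExecutor::Next, assuming REPEATABLE_READ and relying on the IsDelete() flag described in step 5 (txn, lock_mgr, oid, table_heap, and rid are assumed to be in scope):

```cpp
// Step 2: lock the tuple; X when this scan feeds a DELETE/UPDATE, else S.
auto mode = exec_ctx_->IsDelete() ? LockManager::LockMode::EXCLUSIVE
                                  : LockManager::LockMode::SHARED;
if (!lock_mgr->LockRow(txn, mode, oid, rid)) {
  throw ExecutionException("seqscan: failed to lock row");
}
// Step 3: fetch the tuple and check its metadata.
auto [meta, tuple] = table_heap->GetTuple(rid);
if (meta.is_deleted_) {
  // Step 4: the tuple should not be read by this transaction; force-unlock
  // so the 2PL checks are bypassed, then advance the iterator and retry.
  lock_mgr->UnlockRow(txn, oid, rid, /*force=*/true);
}
```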

Insert Executor

  • In Init, take a table lock.
  • In Next, pass the lock manager, transaction object, and table id to InsertTuple so as to insert and lock a tuple atomically.

The steps are spelled out in detail there, so I won't repeat them.

A few points to note (point 4's guard is sketched in code after this list):

  1. Table locks should use the intention modes (IX/IS); row locks should use S/X.

  2. Insert only needs an IX table lock. Don't worry about whether a scan may see freshly inserted tuples; that depends on the isolation level.

  3. Mind the requirements of the different isolation levels; see the Background section.

  4. Handling lock upgrades

    Repeated inserts and deletes can trigger lock upgrades, as in the following example, where the whole line is one transaction:

    ```sql
    INSERT INTO result SELECT * FROM tmp; DELETE FROM tmp; SELECT count(*) FROM result;
    ```

    and the final SELECT must observe the earlier statements' effects.

    To avoid deadlocks or aborts here, before taking an S (IS) lock we must first check whether we already hold X (IX); if so, skip the acquisition, otherwise acquire.
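A minimal sketch of that guard, using the lock-state predicates that Transaction provides:

```cpp
// Skip taking IS/S when the transaction already holds an exclusive-family
// table lock, so we never "downgrade" and trip an INCOMPATIBLE_UPGRADE.
bool already_exclusive = txn->IsTableExclusiveLocked(oid) ||
                         txn->IsTableIntentionExclusiveLocked(oid) ||
                         txn->IsTableSharedIntentionExclusiveLocked(oid);
if (!already_exclusive) {
  lock_mgr->LockTable(txn, LockManager::LockMode::INTENTION_SHARED, oid);
}
```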

Rollback

Insert Executor

  • Be sure that you maintain the write set as needed.

Delete Executor

  • If you have implemented SeqScanExecutor correctly based on IsDelete() in executor context, you do not need to take any locks in this executor.
  • Be sure that you maintain the write set in Next.

Transaction Manager

  • In Commit, you generally do not need to do anything except release all the locks.
  • In Abort, you should revert all changes of this transaction based on its write set.

Key points:

  1. In insert and delete, add the statements that update the transaction's write set. This is just calling the API; nothing to it.

  2. Rollback implementation

    In Abort, traverse the write set in reverse order and UNDO each record:

    ```cpp
    for (auto it = txn->GetWriteSet()->rbegin(); it != txn->GetWriteSet()->rend(); ++it) {
      auto meta = it->table_heap_->GetTupleMeta(it->rid_);
      meta.is_deleted_ = !(meta.is_deleted_);  // a handy trick that undoes both deletes and inserts
      it->table_heap_->UpdateTupleMeta(meta, it->rid_);
    }
    ```

Terrier Benchmark

Roughly: the table schema is

```sql
CREATE TABLE nft(id INT, terrier INT);
INSERT INTO nft VALUES (0, 0), (1, 1), ...;
```

Two threads loop forever running these two transactions:

```sql
-- begin txn1
SELECT * FROM nft WHERE id = <nft_id>; -- record the terrier_id
DELETE FROM nft WHERE id = <nft_id>;
-- end txn1

-- begin txn2
INSERT INTO nft VALUES (<nft_id>, <terrier_id>)
-- end txn2
```

And a third thread runs this transaction:

```sql
-- begin txn3
SELECT count(*) FROM nft WHERE terrier = <terrier_id>;
SELECT count(*) FROM nft WHERE terrier = <terrier_id>;
-- end txn3
```

The benchmark runs under REPEATABLE_READ, so the two queries must return the same result. The reported metrics are the throughput of these three transaction types and the number of aborted transactions.

Problems encountered

First, a particularly silly one: in the terrier bench the insert somehow succeeded, yet the select just couldn't see its result... The execution order looked like:

```text
insert txn begin.
scan txn begin.
insert txn commit.
scan txn commit.
```

I stared at this forever, unable to see why the scan txn couldn't observe the already-committed insert txn... and finally found a spot I had forgotten to update:

[screenshot: the forgotten code]

Appendix: Halloween Protection

SeqScan Executor

  • In Init, take a table lock. Get an iterator by using MakeEagerIterator instead of MakeIterator. (MakeIterator is introduced to avoid the Halloween problem in Project 3’s UpdateExecutor, but you do not need it now.)

References: What is the Halloween Problem

How SQL Server handles it

The Halloween Problem in SQL

Roughly: update is implemented as delete-then-insert, and the optimizer often places an index scan beneath it; combined with the underlying iterator, a tuple can then be updated multiple times, causing infinite loops, wrong updates, and so on. To prevent this, SQL Server uses a spool (a temporary area that buffers intermediate results) to snapshot the original index; the scan then reads from the spool while writes go to physical storage, avoiding the problems of a shifting index.

The Project #4 remark "MakeIterator is introduced to avoid the Halloween problem in Project 3's UpdateExecutor" refers to the fact that in this project update is done in place rather than as delete-then-insert, so the Halloween Problem cannot arise.

Next, let's look at how Project #3 actually guards against it, starting with what kind of Halloween Problem bustub can produce.

Since bustub's delete is not a physical delete but mark-and-append (tuples are never physically moved), a tuple should never be visited twice by one scan. What we must actually guard against is multiple threads iterating and updating simultaneously.

For example, thread A inserts while thread B updates. The intent is update-before-insert, so even if the newly inserted tuple matches the predicate, it should not be updated. But without concurrency protection (the Project #3 situation), a standard iterator (one that stops only at the end of the table) may well reach the newly inserted tuples and update them too.

That is, updating extra tuples is the Halloween Problem that can really occur in bustub. The fix is equally intuitive: **restrict the update's scan to the table's extent at the time the update started, excluding tuples inserted afterwards**.

The guard is MakeIterator. Look at TableIterator's constructor:

```cpp
TableIterator(TableHeap *table_heap, RID rid, RID stop_at_rid);
```

It is effectively an interval: the iterator ranges over [rid, stop_at_rid). Upon reaching stop_at_rid, IsEnd() becomes true.

Compare the implementations of MakeIterator and MakeEagerIterator:

```cpp
// MakeIterator
return {this, {first_page_id_, 0}, {last_page_id, page->GetNumTuples()}};
// MakeEagerIterator
return {this, {first_page_id_, 0}, {INVALID_PAGE_ID, 0}};
```

The former stops at last_page_id while the latter stops at INVALID_PAGE_ID. last_page_id is the table's current last physical page, which insert modifies; thus MakeIterator limits the scan to the tuples that existed when the iterator was created.

In Project #4 we implement in-place update, so the Halloween Problem cannot occur and we no longer need MakeIterator; MakeEagerIterator suffices.

Leaderboard

[screenshot: leaderboard result]

Let's just say: performance has been poor all the way through, so the total here can only be worse, haha.

Run with:

```bash
make -j`nproc` terrier-bench
./bin/bustub-terrier-bench --duration 30000 --nft 10000
```

The project suggests several optimizations:

  1. Predicate pushdown to SeqScan

    Predicate pushdown to SeqScan: You can implement a predicate filter in SeqScanExecutor so that you lock fewer tuples when doing SeqScan. You can enable MergeFilterScan optimizer rule merge_filter_scan.cpp and implement this optimization.

    This just means enabling MergeFilterScan, which was already done in P3.

  2. In-place update

    Implement In-Place UpdateExecutor: You can improve your UpdateExecutor implementation so that tuples can be updated in-place and will probably be more efficient. Modify terrier_benchmark_config.h to instruct Terriers to use UPDATE for exchanging NFTs.

    Previously the terrier bench exchanged NFTs via delete-then-insert; with this change it calls the update executor directly.

    The implementation is simple: use UpdateTupleInPlaceUnsafe.

    Note that this must stay compatible with Abort: the write set must be maintained and restored in Abort. But the provided TableWriteRecord has no field for the old tuple, so I had to repurpose IndexWriteRecord to carry it.

    One caveat: my in-place update does not update indexes, i.e., it assumes the updated columns exclude indexed ones. This passes the terrier bench but would fail P3's index scan test.

  3. Use an index

    Use Index: You can create an index over the NFT table, and then push the predicate down to IndexScanExecutor to do index lookup. For example, if we have an index over NFT's id column, the SELECT * FROM nft WHERE id = 1 can actually be done like (1) extract id = 1 predicate and (2) directly call GetValue(1) over the index, instead of doing a full index scan or table scan. You will need to update index scan plan to facilitate this optimization. Modify terrier_benchmark_config.h to instruct Terriers to create an index before exchanging NFTs.

    In short, turn update-seqscan into update-indexscan. This too was done in P3; what remains is adding locking to the index scan executor, mirroring seq scan.