PhD Defense: Sumeet Bajaj, “Regulatory Compliance in Data Management”


Achieving Regulatory Compliance in Data Management
Sumeet Vijay Bajaj


Regulations mandate consistent procedures for information access, processing, and storage.
In the United States alone, over 10,000 data management regulations exist in the
financial, life sciences, health care and government sectors. A recurrent theme in data
management regulations is the need for regulatory compliant storage to ensure data confidentiality,
data integrity, audit trail maintenance, data retention, and guaranteed deletion.
This thesis describes the design and implementation of several regulatory compliant relational
databases and file systems. The systems increase efficiency and lower costs of regulatory
compliance through the use of novel cryptographic and system security constructs.
The first system described in this thesis is TrustedDB. TrustedDB is a relational database
that ensures data confidentiality. TrustedDB enables SQL query execution over an encrypted
database hosted with a remote, untrusted service provider. TrustedDB is the first DBMS
with data confidentiality that does not limit query expressiveness. Moreover, the per-query
execution costs in TrustedDB are orders of magnitude lower than current cryptography-based
mechanisms. To significantly lower query execution costs, TrustedDB leverages server-hosted,
tamper-proof trusted hardware in critical query processing stages.
The second system described in this thesis is CorrectDB. CorrectDB is a relational
database that provides efficient, low-cost Query Authentication (QA). QA requires strict
guarantees for both the correctness and completeness of the query results returned by potentially
compromised providers. Similar to TrustedDB, CorrectDB leverages server-hosted
trusted hardware. CorrectDB achieves economy and efficiency by minimizing server-side
authentication data and by reducing the client-server communication overheads.
The third system described in this thesis is the history independent file system (HIFS).
HIFS guarantees secure data deletion by providing full history independence across both file
system and disk layers of the storage stack. HIFS overcomes the challenge of simultaneously
preserving history independence and data locality. Moreover, HIFS is customizable to suit
several data locality scenarios, such as block-group locality and sequential file storage.
This thesis also builds the theoretical foundations of history independence. The thesis
explores the concepts of abstract data types, data structures, machine models, memory
representations and history independence itself. The thesis then proposes Δ history independence
(ΔHI), a generic game-based framework that is malleable enough to define a broad
spectrum of new history independence notions. To bridge the gap between theory and practice,
the thesis outlines a general process for building history independent systems. HIFS
itself is designed using the suggested process.
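
The core idea of history independence can be illustrated with a toy example. The sketch below is not taken from the thesis and is not how HIFS or ΔHI are defined; it assumes the simplest textbook notion, in which a structure is history independent when its stored representation is a function of its current contents only. The class name CanonicalSet and the helper build are hypothetical, chosen for illustration.

```python
import bisect


class CanonicalSet:
    """Toy set whose logical layout depends only on its current contents.

    Assumption: "layout" here means the logical sequence of stored items.
    Byte-level details of the Python runtime are not modeled.
    """

    def __init__(self):
        self._items = []  # kept sorted, with no tombstones or free slots

    def insert(self, x):
        i = bisect.bisect_left(self._items, x)
        if i == len(self._items) or self._items[i] != x:
            self._items.insert(i, x)

    def delete(self, x):
        i = bisect.bisect_left(self._items, x)
        if i < len(self._items) and self._items[i] == x:
            del self._items[i]  # physically removed, leaving no trace

    def layout(self):
        """Return what an observer inspecting the storage would see."""
        return tuple(self._items)


def build(ops):
    """Apply a list of ("ins" | "del", value) operations to a fresh set."""
    s = CanonicalSet()
    for op, value in ops:
        if op == "ins":
            s.insert(value)
        else:
            s.delete(value)
    return s


# Two different histories that end with the same contents {1, 2}.
a = build([("ins", 3), ("ins", 1), ("ins", 2), ("del", 3)])
b = build([("ins", 2), ("ins", 1)])
print(a.layout() == b.layout())
```

Because both histories leave an identical layout, an observer who inspects the storage cannot tell that the value 3 was ever present. An append-only log, or a structure that marks deleted slots instead of removing them, would fail this check.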
Finally, this thesis describes Ficklebase. Ficklebase is a relational database that provides
irrecoverable data erasure. In Ficklebase, once a tuple is deleted, all side effects of the deleted
tuple are removed. Removing all side effects of a deleted tuple achieves the same effect as
if the tuple had never been inserted into the database. Ficklebase thus eliminates all traces
of deleted data, rendering the data irrecoverable and also guaranteeing that the deletion itself is