
Distributed Systems

Latest Systems

Ceph: A Scalable, High-Performance Distributed File System (link) (pdf)

Data Placement & Replication

Worldwide Fast File Replication on Grid Datafarm (pdf)
Data Replication in Hadoop (link)
CRUSH: Controlled, Scalable, Decentralized Placement of Replicated Data (pdf)

Data Striping

User-Level

FUSE: File System in Userspace (link)
Rapid File System Development Using ptrace (pdf)

Performance Measurement

File System Benchmarks, Then, Now, and Tomorrow (pdf)
Parallel I/O Examples and Benchmark Codes (link)
IOzone Filesystem Benchmark (link)
Benefits of High Speed Interconnects to Cluster File Systems: A Case Study with Lustre (pdf)

Google Section


Data Mining

Efficient Clustering of High-Dimensional Data Sets with Application to Reference Matching (pdf)

Misc

Reinventing the Bazaar (link)
Beautiful Code (link)

Last Updated: December 1st, 2007