H. Egemen Ciritoglu
The massive growth in the volume of data and the demand for big data utilisation have led to the increasing prevalence of Hadoop Distributed File System (HDFS) solutions. However, the performance of Hadoop, and of HDFS in particular, has limitations and remains an open problem in the research community. Replication is a well-known technique for improving HDFS performance, because increasing the replication factor directly increases data availability. Varying the replication factor can yield significant performance gains, and the placement of replicas is also a crucial problem in clusters. The ultimate goal of my research is to develop an adaptive replication management system.
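The link between replication factor and availability can be made concrete with a simple model. The sketch below assumes that each replica of a block sits on a different DataNode and that nodes fail independently with the same probability `p_fail`. A block is then unavailable only when every replica's node is down, which gives availability 1 - p_fail^r. The function name `block_availability` is hypothetical and is not part of HDFS. Real failures are often correlated (for example, a whole rack going offline), so this model is optimistic and only illustrates the trend.

```python
def block_availability(p_fail: float, replication: int) -> float:
    """Probability that at least one replica of a block is readable.

    Hypothetical helper. Assumes each replica lives on a distinct node and
    that nodes fail independently with probability p_fail.
    """
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be in [0, 1]")
    if replication < 1:
        raise ValueError("replication must be >= 1")
    # The block is lost only if all `replication` nodes holding it are down.
    return 1.0 - p_fail ** replication


if __name__ == "__main__":
    # Compare replication factors 1..4 under an assumed 1% node failure rate.
    for r in range(1, 5):
        print(r, block_availability(0.01, r))
```

Under these assumptions each extra replica multiplies the unavailability by `p_fail`, so gains shrink quickly past the HDFS default replication factor of 3. Extra replicas also cost storage and write bandwidth, which is one reason to vary the factor per file rather than raise it globally.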