- ISSN: 2155-7950
- Journal of Business and Economics
B-tree Construction with Huge Volume of Data on Hadoop
Huynh Cong Viet Ngu
Abstract: In Vietnam, applying big data technologies to data processing remains a challenge. To date, only a few corporations have used big data to build data warehouse systems that provide consistent and invaluable support for both immediate decisions and long-term strategic planning. Meanwhile, the volume of traditional data continues to grow significantly, and the B-tree is regarded as a promising data structure for managing and organizing such data. However, constructing a B-tree over a huge volume of data usually takes a long time. For large datasets that cannot be processed with traditional computing techniques, big data is considered an optimal solution for scalable processing. In this paper, we propose a parallel B-tree construction scheme based on the Hadoop framework. The proposed scheme divides the data into partitions, builds local B-trees in parallel, and merges them to construct a B-tree that covers the whole dataset. When generating the partitions, it takes the data distribution into account so that each partition holds a nearly equal amount of data. The proposed scheme therefore yields an efficient index structure while reducing construction time.
Key words: B-tree; Hadoop; MapReduce; big data; Viet Nam
JEL codes: C55, C8
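The partition, local build, and merge steps summarized in the abstract can be illustrated with a minimal single-process sketch in Python. It is not the paper's Hadoop implementation: the function names, the sample-based quantile split points, and the use of a sorted list as a stand-in for each local B-tree are all assumptions made for illustration. The sketch only shows why distribution-aware range partitioning gives nearly equal partitions and why the local results can then be combined in key order.

```python
import bisect
import random


def choose_split_points(sample, num_partitions):
    """Pick num_partitions - 1 split keys at evenly spaced quantiles of a sample.

    Assumption: quantiles of a random sample approximate the data distribution.
    """
    ordered = sorted(sample)
    return [ordered[(i * len(ordered)) // num_partitions]
            for i in range(1, num_partitions)]


def partition_keys(keys, split_points):
    """Route each key to the range partition whose interval contains it."""
    partitions = [[] for _ in range(len(split_points) + 1)]
    for key in keys:
        partitions[bisect.bisect_right(split_points, key)].append(key)
    return partitions


def build_local_index(partition):
    """Hypothetical stand-in for a local B-tree build: a sorted run of keys."""
    return sorted(partition)


def merge_local_indexes(local_indexes):
    """Key ranges are disjoint and ordered, so merging is concatenation."""
    merged = []
    for run in local_indexes:
        merged.extend(run)
    return merged


def build_index(keys, num_partitions=4, sample_size=1000, seed=0):
    """Sample, partition by range, build each partition, then merge."""
    rng = random.Random(seed)
    sample = rng.sample(keys, min(sample_size, len(keys)))
    split_points = choose_split_points(sample, num_partitions)
    partitions = partition_keys(keys, split_points)
    # On a cluster these builds would run in parallel, one task per partition.
    local_indexes = [build_local_index(p) for p in partitions]
    return partitions, merge_local_indexes(local_indexes)
```

Because the split points follow the sampled distribution rather than fixed key ranges, skewed input still produces partitions of similar size, so no single build task dominates the running time.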