Saturday, October 3, 2015

[9.4] Sort the 2GB file

1. Example

0. 2GB file -  don't want you to bring all the data into memory
1. One string per line
2. Sorting algorithm

2. Implementation

When an interviewer gives a size of limit of 2GB, it should tell you something
- in this case, it suggests that they don't want you to bring all the data into memory.

So what do you do? We only bring part of the data into memory..

Algorithm:

How much memory so we have available? Let's assume we have X MB of memory available.



1. Divide the file into K chunks, where K*X MB = 2 GB. Bring each chunk into memory and sort the lines as usual using any O(n log n) algorithm. Save the lines back to the file.
2. Now bring the next chunk into memory and sort.
3. Once we're done, merge them one by one.

The above algorithm is also known as external sort.
Step3 is knows as N-way merge

The rationale behind using external sort is the size of data.
Since the data is too huge and we can't bring it all into memory, we need to go for a disk based sorting algorithm.

3. Similar Ones

No comments:

Post a Comment