DupeTrasher - Benchmark - Comparing with other products
We tested two very popular commercial applications that include a duplicate file search feature, and we compared their performance with DupeTrasher. We also wanted to include a few other similar products, but limitations in their trial versions (for example, only a limited number of files can be compared) made that impossible.
The test was run on the following system:
||AMD Athlon XP 1700+
||256MB DDR 333MHz
||Windows XP with Service Pack 1
||Number of files searched
||Overall files size
Files are compared and considered duplicates if they have the same name and the same size. Each application was tested under equal conditions: before every test the system was restarted in order to flush the file cache left over from the previous search.
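The matching rule above can be sketched in a few lines of Python. This only illustrates the "same name and same size" criterion; it is not the code of DupeTrasher or of either tested product. The function name `group_by_name_and_size` and the `(path, size)` record layout are assumptions made for the example.

```python
import os
from collections import defaultdict


def group_by_name_and_size(records):
    """Group (path, size) records whose file name and size both match.

    `records` is an iterable of (path, size_in_bytes) tuples. This layout
    is an assumption for the example, not a format used in the benchmark.
    """
    groups = defaultdict(list)
    for path, size in records:
        # The key is the bare file name plus the size; the directory is ignored.
        key = (os.path.basename(path), size)
        groups[key].append(path)
    # Only keys that were seen more than once describe duplicates.
    return {key: paths for key, paths in groups.items() if len(paths) > 1}


sample = [
    ("docs/report.txt", 1200),
    ("backup/report.txt", 1200),
    ("backup/old/report.txt", 900),  # same name, different size
    ("music/track.mp3", 1200),       # same size, different name
]
duplicates = group_by_name_and_size(sample)
```

With this sample, only the two `report.txt` entries of 1200 bytes end up in the same group. Note that the sketch compares names case-sensitively, which a Windows tool would probably not do.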
Parameters we tested:
- Memory usage - how much physical memory the program needs in order to complete the operation.
- Search time - the time needed to search the hard disk and collect the file information that is later used for comparing.
- Compare time - the time needed to compare, sort and show the results of the duplicate file search.
Of these three parameters, the third one, 'Compare time', is the one that actually shows the performance of the application and its ability to do fast calculations, comparing and sorting. Search time is limited almost entirely by hard disk speed, and the hard disk is, as we know, the slowest component in the computer system. Because of that, all applications should have similar search times, with small deviations. The general rule for every tested parameter is: lower is better.
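To make the split between the two timed phases concrete, here is a minimal Python sketch that measures them separately. It is an illustration under stated assumptions, not the harness used for this benchmark: the function names `scan`, `compare` and `timed_run` are hypothetical, and the sketch does not restart the system or flush the file cache between runs.

```python
import os
import time
from collections import defaultdict


def scan(root):
    """Search phase: walk the tree and collect (name, size, path) per file."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            records.append((name, size, path))
    return records


def compare(records):
    """Compare phase: sort the records, then group equal (name, size) keys."""
    groups = defaultdict(list)
    for name, size, path in sorted(records):
        groups[(name, size)].append(path)
    return {key: paths for key, paths in groups.items() if len(paths) > 1}


def timed_run(root):
    """Run both phases and report how long each one took, in seconds."""
    start = time.perf_counter()
    records = scan(root)
    after_search = time.perf_counter()
    duplicates = compare(records)
    after_compare = time.perf_counter()
    return {
        "files": len(records),
        "duplicate_groups": len(duplicates),
        "search_s": after_search - start,
        "compare_s": after_compare - after_search,
    }
```

Calling `timed_run` on a directory returns both timings side by side. The search phase touches the disk for every file, while the compare phase works only on the records already held in memory, which is why the two are reported separately.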
Step 1: Memory usage testing
Comment: The memory usage chart shows that 'Product A' has the biggest impact on system resources in this test. The memory needed for the 'Product A' search was well above the memory available on the system, so Windows was forced to swap, which in turn dramatically slowed the overall operation. We recorded its memory impact as 300MB, because the swapping made it impossible to measure correctly. We assume that the real memory usage was more than twice the available RAM, but even the recorded figure is a pretty bad result for such a popular application.
On the other hand, 'Product B' has a very well optimized search routine which took only about 44MB. This is a really good result considering that the application was written in some high-level language such as C++ or Delphi. Considering the number of files and the amount of information retrieved from them, this is probably very close to the best that high-level languages can achieve. DupeTrasher has the best result here, 38MB.
Conclusion: 'Product A' is not even worth mentioning in this segment, while the other two have a very low impact on system memory, with an advantage of about 15% on DupeTrasher's side.
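The size of the advantage depends on which of the two figures is taken as the base. The short calculation below uses the 44MB and 38MB values quoted above (the first is given only as "about 44MB") and works out both readings; the variable names are chosen for the example.

```python
product_b_mb = 44    # "about 44MB" in the comment above
dupetrasher_mb = 38

saved_mb = product_b_mb - dupetrasher_mb

# DupeTrasher's saving, relative to Product B's usage.
saving_vs_product_b = saved_mb / product_b_mb * 100

# Product B's extra usage, relative to DupeTrasher's usage.
extra_vs_dupetrasher = saved_mb / dupetrasher_mb * 100

print(f"DupeTrasher uses {saving_vs_product_b:.1f}% less than Product B")
print(f"Product B uses {extra_vs_dupetrasher:.1f}% more than DupeTrasher")
```

The two readings come out at 13.6% and 15.8%, so the figure of about 15% matches the second one, with DupeTrasher's 38MB as the base.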
Step 2: Search time testing
Comment: As already said, search time should be similar for all applications because it mostly depends on hard drive performance. Even the fastest algorithm is limited by HDD speed, so in theory the best we can do is a search time equal to the time the hard disk itself needs to do the search. Again, 'Product A' is way out of the competition here. It needs more than twice the time of 'Product B' and almost three times the time of DupeTrasher to do the same job. 'Product B' again shows good performance, but DupeTrasher beats it once more, with about 40% better performance.
Step 3: Compare time testing
Comment: This is the part of the test that best shows program performance. 'Product A' gives the poorest performance, as we have come to expect. Its time is actually much more than 30 minutes, but we capped it at 30 for chart clarity. (Actually, we did not have the patience to wait for 'Product A' to finish the job, so we stopped it after about 40 minutes.) So far 'Product B' has stayed close to DupeTrasher's performance, but in this step that simply does not apply.
But don't underestimate the 'Product B' result of 1 minute and 18 seconds; this is a very good result when you have to compare and sort 280,000 files in memory. Of course, it could probably be better, but could it be better than DupeTrasher's 2 seconds? That is a difficult question. We think we have reached maximum performance here. This part of the program depends solely on program design, the algorithm developed, optimization techniques, time invested and, of course, the programmer's skills.
Conclusion: When there is no limitation from slow hardware components, like the hard disk in the 'Search time' test, the only limits are processor and memory speed, which we tried to utilize to the maximum here. DupeTrasher and its assembly code optimizations simply show their superiority in this section of the test.
Overall operation time
Comment: This is the overall time each application needed to do the same job, that is, to search an 80GB hard disk, load information about 280,000 files into memory, do the calculations, sorting and comparing, and finally show the result. No detailed comment is needed here.
Q: Why didn't we include the "compare by contents" option?
A: It is true that this option is the most reliable one when searching for duplicate files. However, it also depends heavily on hard disk performance (like step 2). Reading big files from the hard disk is much slower than comparing them in memory. So besides dramatically prolonging the test, this option would probably show results similar to those in step 2.
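To show why a content comparison is tied to disk speed, here is a minimal Python sketch of one common way to do it: hashing each file and comparing the digests. It is an assumption for illustration only; the text does not say how DupeTrasher or the other products implement this option, and the names `file_digest` and `same_contents` are hypothetical.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Every byte of the file has to be read from the disk.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_contents(path_a, path_b):
    """Treat two files as equal when their digests match."""
    return file_digest(path_a) == file_digest(path_b)
```

Unlike the name and size check, which needs only directory information, this approach reads every candidate file in full, so its running time grows with the total size of the files rather than with their number. Python's standard `filecmp.cmp(a, b, shallow=False)` is a byte-by-byte alternative with the same disk cost.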
Q: Why don't you name the products used in this test?
A: We do not want to run a negative campaign against other products, companies or individuals. The products in this test were chosen solely because of their popularity and their duplicate file search feature. That does not mean they are bad products, since we tested only one of their features. We also think that customers are able to distinguish quality products offered at fair prices from those created on the run, mostly for the sake of profit. So there is really no need to name them.
If you have any criticism or feedback regarding this test, or if you have found a similar utility that is faster than ours, please do let us know. We will be glad to hear what you have to say.