How To Compare Files And Verify They Are Identical

If you need to ensure that the files on your backup device (such as a USB drive or an Amazon S3 account) are identical to the ones on your Windows PC (such as Windows 7 or Windows 10) or Linux server (such as Red Hat or CentOS), there are several ways to accomplish this.

1. Using a hash generator

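Different files almost always produce different hashes, so if two files have the same hash value, you can treat them as identical. Collisions are possible in theory, and MD5 and SHA1 are not safe against deliberately crafted ones, but an accidental collision during a copy or backup is vanishingly unlikely.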
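As a quick illustration of the idea on Linux, the sketch below compares two files by their SHA1 hashes. The function name same_hash and the sample file names are made up for this example, and it assumes the sha1sum tool from GNU coreutils is installed.

```shell
#!/usr/bin/env bash
# same_hash is a hypothetical helper for this example, not a standard command.
# Exit status: 0 if the hashes match, 1 if they differ, 2 if a file is unreadable.
same_hash() {
    local first second
    # Reading from stdin makes sha1sum print "<hash>  -"; keep only the hash.
    first=$(sha1sum < "$1") || return 2
    second=$(sha1sum < "$2") || return 2
    first=${first%% *}
    second=${second%% *}
    [ "$first" = "$second" ]
}

# Small demonstration with throwaway files.
demo_dir=$(mktemp -d)
printf 'hello\n'  > "$demo_dir/original.txt"
cp "$demo_dir/original.txt" "$demo_dir/copy.txt"
printf 'hello!\n' > "$demo_dir/changed.txt"

if same_hash "$demo_dir/original.txt" "$demo_dir/copy.txt"; then
    echo "copy.txt matches original.txt"
fi
if ! same_hash "$demo_dir/original.txt" "$demo_dir/changed.txt"; then
    echo "changed.txt differs from original.txt"
fi
rm -rf "$demo_dir"
```

The same check works with md5sum in place of sha1sum if you prefer MD5.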

There are many MD5 and SHA1 hash generators on the Web, and most of them are free. However, not all of them are good. The one I have used for several years is MD5Summer. It works on Windows XP and Windows 7 and supports both 32-bit and 64-bit systems.

MD5Summer can create MD5 and SHA1 checksums for all the files and subfolders inside the directory you specify and save the result in a plain text file. The result contains the hash value and file name of every file.

After you’ve generated the hash values for a folder, you can use the output text file to check another folder and see whether the two are identical.

MD5Summer is easy to use and totally free. I’ve added it to the system path of my computer, so when I run the “MD5” command from the Run dialog (“Win” + “R”), Windows brings up MD5Summer.

2. Using a file comparison program

Programs of this type usually compare your files in one of two modes: text mode or binary mode. In text mode they list the differences between two text files (such as .txt files and source code files written in PHP, Python, C++, Ruby, Swift, Java, or JavaScript). In binary mode they only show whether two binary files are different or not.
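On Linux, the two modes correspond roughly to the standard diff and cmp commands. The sketch below is only an illustration of the difference between them; the file names and contents are invented for the example.

```shell
#!/usr/bin/env bash
work_dir=$(mktemp -d)

# Text mode: diff lists the lines that differ between two text files.
printf 'line one\nline two\n' > "$work_dir/old.txt"
printf 'line one\nline 2\n'   > "$work_dir/new.txt"
# diff exits with status 1 when the files differ, so "|| true" keeps a
# script running under "set -e" from stopping here.
text_diff=$(diff -u "$work_dir/old.txt" "$work_dir/new.txt" || true)
printf '%s\n' "$text_diff"

# Binary mode: cmp -s prints nothing and reports only through its exit
# status whether the two files are byte-for-byte identical.
printf '\000\001\002' > "$work_dir/a.bin"
printf '\000\001\003' > "$work_dir/b.bin"
if cmp -s "$work_dir/a.bin" "$work_dir/b.bin"; then
    binary_result="identical"
else
    binary_result="different"
fi
echo "a.bin and b.bin are $binary_result"

rm -rf "$work_dir"
```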

One of them is KDiff3. The shortcoming of this type of software is that it only compares files on the fly and doesn’t save hash values to your disk, so you can’t use it a few weeks later to find out whether the files inside a folder have changed.

Another shortcoming is that you can’t compare files on two computers without first copying them to a single computer. With a hash generator, you can take the output file that contains the hash values of the files on one computer and use it to verify the files on another computer, so you only need to copy a single file. If there are tens of millions of files, the first method should be better.

3. Using a Linux bash script

If you’ve copied your files from one server to another with a command such as rsync, you may want to know whether those files were copied correctly.

In that case, you can write a simple script to generate the MD5 or SHA1 hashes of the files.

Suppose that the current directory is /home/sites/public_html/myappmag/. You can write the hashes of all the files under this folder to a text file with this command:

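find . -type f ! -name sha1_output.txt -exec sha1sum {} + | sort -k 2 > sha1_output.txt

This command finds all the files under the current folder, generates their SHA1 checksums, sorts the result by file path, and writes it to a text file named sha1_output.txt. The ! -name sha1_output.txt test keeps the output file itself out of the list. Without it, find would usually pick up the still-empty sha1_output.txt as well, and that entry would never verify later.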
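To check the copy on the destination server, one option is to copy sha1_output.txt into the matching directory there and pass it to sha1sum -c, which recomputes each hash and reports the files that no longer match. The sketch below simulates the source and destination with two temporary directories; the directory names, file names, and variable names are all invented for the example, and it assumes GNU coreutils. It also skips the listing file itself when hashing, so the listing never contains an entry for itself.

```shell
#!/usr/bin/env bash
# Stand-ins for the source and destination servers (hypothetical names).
src_dir=$(mktemp -d)
dest_dir=$(mktemp -d)
mkdir -p "$src_dir/css"
printf 'home page\n' > "$src_dir/index.html"
printf 'body {}\n'   > "$src_dir/css/site.css"
cp -R "$src_dir/." "$dest_dir/"

# On the source: hash every file except the listing itself.
(
    cd "$src_dir" &&
    find . -type f ! -name sha1_output.txt -exec sha1sum {} + |
        sort -k 2 > sha1_output.txt
)

# Copy only the listing across, then damage one file on the destination
# so the check has something to report.
cp "$src_dir/sha1_output.txt" "$dest_dir/"
printf 'body { color: red }\n' > "$dest_dir/css/site.css"

# On the destination: --quiet hides the "OK" lines, so only problems are
# printed. sha1sum -c exits non-zero if any file fails or is missing.
if verify_output=$(cd "$dest_dir" && sha1sum -c --quiet sha1_output.txt); then
    verify_result="all files match"
else
    verify_result="some files differ"
fi
printf '%s\n' "$verify_output"
echo "$verify_result"

rm -rf "$src_dir" "$dest_dir"
```

Because the paths in the listing are relative (they start with ./), the check has to be run from the corresponding directory on the destination.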

In this way, you can verify that the backup files are identical to the original ones. If you need to move to a new server provider, such as Linode, DigitalOcean, Liquid Web, SoftLayer, LeaseWeb, HostGator, GoDaddy, etc., then I think this script may help you.
