fdupes Tutorial
February 18, 2007
fdupes seems to have done the job of getting rid of the horrible duplicate mess I’ve managed to get myself into, but glancing through the duplicates list I may have rendered some configuration settings (like the ones in ~/.mozilla) unusable. That’s not a real big problem though since I planned to start those over mostly from scratch anyway.
The only real problems I have with fdupes are that it doesn’t detect whole duplicate directory trees (which might have made the job of removing the files a little quicker), so there are a couple of empty directories floating around. I also wish it had an option to be more verbose and print the MD5 sum along with the results (just to relieve a little bit of paranoia).
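For the leftover empty directories, something along these lines might do the cleanup. This is only a sketch: it builds a throwaway tree (the directory names are made up) instead of touching real data, and it assumes a find that supports -empty and -mindepth, which GNU and BSD find do but POSIX does not require.

```shell
# Throwaway tree standing in for the directory fdupes was run on (hypothetical names).
somedir=$(mktemp -d)
mkdir -p "$somedir/empty/nested" "$somedir/full"
touch "$somedir/full/file.txt"

# List the directories that are empty right now (only the innermost ones qualify).
find "$somedir" -mindepth 1 -type d -empty

# Remove empty directories bottom-up: -depth visits children before their parent,
# so a parent that becomes empty once its children are gone is removed as well.
# -mindepth 1 keeps $somedir itself out of the cleanup.
find "$somedir" -mindepth 1 -depth -type d -empty -exec rmdir -- {} \;
```

rmdir refuses to remove a directory that still has anything in it, so a mistake here should fail loudly instead of deleting files.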
To start the removal process, run fdupes with the -r option (fdupes -r $somedir). This will write the list of duplicate files to the console, which is probably not what you want. I like using a pipe to tee so I can see the progress and have a list of files made (fdupes -r $somedir | tee $filelist).
The results will have all duplicates separated in groups of the same file, with groups separated by a blank line.
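To get a feel for that layout, here is a small sketch that counts groups and files with awk. It doesn't call fdupes; the sample list and its paths are made up to stand in for real output, and filelist is just a temp file.

```shell
# Sample text standing in for `fdupes -r` output (hypothetical paths).
filelist=$(mktemp)
printf '%s\n' \
  'music/a.mp3' \
  'backup/a.mp3' \
  '' \
  'docs/my notes.txt' \
  'docs/old/my notes.txt' \
  'tmp/my notes.txt' \
  '' > "$filelist"

# RS="" puts awk in paragraph mode: each blank-line-separated group is one record.
# FS="\n" makes each line (one file name, spaces included) a single field.
summary=$(awk 'BEGIN { RS = ""; FS = "\n" }
               { groups++; files += NF }
               END { print groups " groups, " files " files" }' "$filelist")
echo "$summary"

rm -f "$filelist"
```

On a real run you would point awk at the file written by tee instead of the sample.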
Now to make a list of files to remove run fdupes -rf $somedir | tee $filelist-omit-first. The added -f flag will tell fdupes to omit the first match from the output. (this should leave one copy of a file in a group of duplicates so at least 1 copy will be preserved)
Use your favorite diff program (I recommend vimdiff) to compare the first file list you made to the second as a precaution against deleting something you don’t want. Once that’s all done it should be safe to remove all the files in the second file list (the one made from fdupes -rf)
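If eyeballing a diff feels too loose, a check along these lines may help. It's only a sketch under assumptions: both lists are hand-made samples in the blank-line-separated layout, and the names full, omit, and unsafe are mine. For each group in the full list it counts the paths that are not in the omit-first list, and it reports how many groups would have no surviving copy.

```shell
# Hand-made samples standing in for the two fdupes lists (hypothetical paths).
full=$(mktemp)
omit=$(mktemp)
printf '%s\n' 'a/1.txt' 'b/1.txt' '' 'a/2.txt' 'b/2.txt' 'c/2.txt' '' > "$full"
printf '%s\n' 'b/1.txt' '' 'b/2.txt' 'c/2.txt' '' > "$omit"

unsafe=$(awk '
  # First file (the omit-first list): remember every path slated for removal.
  FILENAME == ARGV[1] { if ($0 != "") doomed[$0] = 1; next }
  # Second file (the full list): a blank line closes the current group.
  $0 == "" { if (seen && kept == 0) bad++; seen = 0; kept = 0; next }
  # Any path not slated for removal counts as a surviving copy.
  { seen = 1; if (!($0 in doomed)) kept++ }
  # Handle a final group that is not followed by a blank line.
  END { if (seen && kept == 0) bad++; print bad + 0 }
' "$omit" "$full")

echo "groups with no surviving copy: $unsafe"
rm -f "$full" "$omit"
```

Anything other than 0 would mean some group is about to lose every copy, which is a good reason to stop and look at the lists again.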
Do a last bit of cleanup on the second file list (the one made from fdupes -rf) by sorting it and eliminating duplicate lines and blank lines. (sort $filelist-omit-first | uniq | grep -v '^$' > $removelist)
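The cleanup can be tried on a throwaway list before touching the real one. This sketch uses sort -u in place of sort | uniq, which should give the same result here; the sample paths and the temp file names omitlist and removelist are made up for the example.

```shell
# Sample omit-first list with blank separators and a repeated line (hypothetical paths).
omitlist=$(mktemp)
removelist=$(mktemp)
printf '%s\n' 'b/2.txt' 'c/2.txt' '' 'b/1.txt' '' 'b/2.txt' '' > "$omitlist"

# Drop the blank group separators, then sort and keep one copy of each line.
grep -v '^$' "$omitlist" | sort -u > "$removelist"

cat "$removelist"
```

The result is one path per line with no blanks, which is the shape a read loop expects.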
Finally, you are ready to remove the duplicates. (while read file; do rm -v "$file"; done < $removelist)
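A slightly more defensive version of the removal loop might look like the sketch below. It works on a throwaway directory with made-up file names (some containing spaces), does a dry run first, and uses IFS= read -r so that leading spaces and backslashes in names are left alone. It assumes one file name per line, so names containing newlines are not handled.

```shell
# Throwaway directory and list standing in for real duplicates (hypothetical names).
workdir=$(mktemp -d)
touch "$workdir/Dust In The Wind.mp3" "$workdir/copy of song.mp3" "$workdir/keep.mp3"
removelist="$workdir/remove.txt"
printf '%s\n' "$workdir/Dust In The Wind.mp3" "$workdir/copy of song.mp3" > "$removelist"

# Dry run: only print what would be removed.
while IFS= read -r file; do
  printf 'would remove: %s\n' "$file"
done < "$removelist"

# Real run: the redirect after "done" feeds the list to read, one line at a time.
# "--" stops rm from treating a name that starts with "-" as an option.
while IFS= read -r file; do
  rm -v -- "$file"
done < "$removelist"
```

Running the dry run against the real list first costs a few seconds and gives one more chance to spot a path that should stay.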
Good-bye duplicates!
Entry Filed under: Linux. Tags: Command Line, Learning, Linux, Shell, Shell Scripting, software.
14 Comments
1.
Ryan Sinn | March 2, 2007 at 1:22 pm
I think your final line to remove the duplicates isn’t any good because you don’t specify the file to read from…
This works –
for file in `cat $removelist`; do echo $file; done
Where $removelist is the list of files to be deleted and $file is the loop variable named by “for file” at the beginning of the line.
Great post / help though
2.
Ryan Sinn | March 2, 2007 at 1:26 pm
whoops
Instead of do echo $file … you need to change that to rm -v when you have the confidence required to actually delete your files
for file in `cat $removelist`; do rm -v "$file"; done
I had been using the new alpha release of Mozilla Thunderbird and when I merged a few dozen mail folders containing over 8000 emails Thunderbird decided to make 20+ duplicates of over 1000 of those… YUCK! 39000 emails in a single folder… luckily I’m on linux and I can edit my Maildir by hand
3.
dosnlinux | March 2, 2007 at 7:41 pm
Thanks for pointing that out.
I forgot the file redirect after done. The last line should be
while read file; do rm -v "$file"; done < $removelist
Glad you found the post helpful
4.
Paul | October 2, 2007 at 10:27 am
The last step doesn’t seem to work for files with spaces in the names. I’m getting output like:
rm: cannot remove `iTunes/Kansas/Dust': No such file or directory
rm: cannot remove `In': No such file or directory
rm: cannot remove `The': No such file or directory
rm: cannot remove `Wind.mp3': No such file or directory
5.
Paul | October 2, 2007 at 10:29 am
Aha, I switched to bash and now all is OK.
6.
Flightlessbird | October 13, 2007 at 7:10 am
Fantastic help – even a total newbie like me could follow it. Very useful.
7. fdupes Tutorial @ โหน่ง : N^o^NG | December 9, 2007 at 8:35 am
[...] http://dosnlinux.wordpress.com/2007/02/18/fdupes-tutorial/ [...]
8.
Ahsan Ali | May 30, 2008 at 7:34 am
Thanks, this was very useful even all the way over here in Pakistan.
9. Recursive copy to a RANDOM name - Page 2 - UNIX for Dummies Questions & Answers - The UNIX and Linux Forums | October 23, 2008 at 5:19 am
[...] mind that the file paths/names had difficult characters. The following fdupes tutorial was useful: fdupes Tutorial Life at the CLI I’ve double checked the results with variations on the following command: find . -type f -exec [...]
10.
James | December 23, 2008 at 7:47 pm
A little unclear. The sort command doesn’t specify which filelists should be passed as arguments (previous steps mention two filelists: the original -rf one, the one with first entries omitted). The while loop doesn’t mention which of the three filelists to pass, and where. Ryan’s Comment 1 defines variables as a tutorial should. Last, the while loop command mentioned in the tutorial is still incorrect lacking “< $removelist”. There are also various typos, which are annoying (“use start”).
11.
Sergio | February 4, 2009 at 7:02 am
This is a nice approach:
b=""
fdupes tmp/ --recurse | \
while read f
do
  if [ "$f" = "" ]
  then
    b=""
  else
    if [ "$b" = "" ]
    then
      b="$f"
    else
      rm "$f" && echo "Removed \"$f\""
    fi
  fi
done
(Source: http://www.miriamruiz.es/weblog/?p=79 )
12.
James McGill | April 18, 2009 at 12:36 pm
I’m on the ambitious task of finding and removing duplicates against an 8 terabyte network attached storage.
Even breaking it down into reasonably sized chunks, fdupes takes days (maybe weeks, we don’t know yet) to run, but it doesn’t overgrow in memory when presented with huge numbers of files.
(That’s what I was afraid it would do, and fslint choked on our filesystem.)
13.
none | September 25, 2009 at 8:28 am
you can pipe the list of files into the iterator:
cat dupes.txt | while read line;do rm -v “$line”;done
14.
none | October 16, 2009 at 12:58 pm
you’ll need to adjust the quotes in my example above since the blog soft replaced them