Here is the rough algorithm:

1. Use Ghostscript to convert every page of the PDF into its own TIFF (or JPEG) file.
2. Use the libtiff utilities to normalize the DateTime field of every TIFF.
3. Run md5sum.exe on all the TIFFs.
4. Sort the list of files by hash to find duplicates.
5. Remember all duplicate page numbers to be deleted.
6. Run a pdftk.exe command line on the original PDF to remove the duplicates.

You could code this algorithm in any language you like (even batch on Windows or bash on Linux/Unix/Mac OS X).

First: Some notes on using Ghostscript. Create your 1200 TIFF (or JPEG) pages (on Linux you'd use gs instead of gswin32c):

    gswin32c.exe ^
      ... ^
      -sOutputFile=C:\temp\tiffs\page-%06d.tif ^
      ...

    # use -sDEVICE=jpeg to create *.jpeg files + adapt -sOutputFile= accordingly
    # page-%06d.tif creates TIFFs named page-000001.tif through page-012000.tif

Second: Some notes on the requirement of using (the freely available) libtiff utilities. When Ghostscript creates a TIFF page, it records its current version, the date and time, and some other metadata inside the TIFF. This could botch your MD5 checking, because otherwise identical TIFFs may carry different date/time stamps. Use tiffinfo page-000001.tif or tiffdump page-000001.tif to see what I mean. The output could look like this:

    c:\downloads> tiffdump.exe page-000001.tif
    Directory 0: offset 2814 (0xafe) next 0 (0)

Here is the command to "normalize" the date+time field (which is tagged "306" in my case) in an example TIFF:

    c:\downloads> tiffset -s 306 "0000:00:00 00:00:00" ex001.tif

As a result, the DateTime field has now changed:

    c:\pa>tiffdump ex001.tif | findstr DateTime

Now loop through all your TIFFs to normalize all their DateTime fields. Use a plain for loop here: for /l iterates over a numeric range, not over files. Inside a batch file, write %%i instead of %i.

    c:\downloads> for %i in (C:\temp\tiffs\*.tif) do tiffset -s 306 "0000:00:00 00:00:00" %i

Third and Fourth: Run md5sum.exe and sort the list of files to find duplicates. Here is a command line to use:

    c:\downloads> md5sum.exe C:\temp\tiffs\*.tif | sort

As a result you should easily see which files/pages have the same MD5 hash.
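The last two steps, remembering the duplicate page numbers and building the pdftk command line, could be sketched in Python roughly as follows. This is only a sketch under some assumptions: each md5sum output line is taken to look like `<hash> *<path>` (optionally with a leading backslash, as GNU md5sum prints for paths containing backslashes), the files are named page-NNNNNN.tif as above, and the function names as well as input.pdf and output.pdf are made up for illustration. The first page carrying a given hash is kept and later pages with the same hash are dropped.

```python
import re
from collections import defaultdict

# Assumed md5sum line shape: optional "\", 32 hex digits, whitespace, optional "*", path.
HASH_RE = re.compile(r"\\?([0-9a-fA-F]{32})\s+\*?(.*)")
PAGE_RE = re.compile(r"page-(\d+)\.tif", re.IGNORECASE)


def duplicate_pages(md5_lines):
    """Return sorted page numbers whose hash already occurs on a lower page."""
    pages_by_hash = defaultdict(list)
    for line in md5_lines:
        m = HASH_RE.match(line.strip())
        if not m:
            continue  # skip blank or unrecognized lines
        page = PAGE_RE.search(m.group(2))
        if page:
            pages_by_hash[m.group(1).lower()].append(int(page.group(1)))
    dupes = []
    for pages in pages_by_hash.values():
        dupes.extend(sorted(pages)[1:])  # keep the lowest page, drop the rest
    return sorted(dupes)


def keep_ranges(total_pages, dupes):
    """Collapse the pages to keep into pdftk-style ranges such as '1-3' or '5'."""
    drop = set(dupes)
    ranges, start = [], None
    for p in range(1, total_pages + 2):  # one past the end flushes the last run
        keep = p <= total_pages and p not in drop
        if keep and start is None:
            start = p
        elif not keep and start is not None:
            end = p - 1
            ranges.append(str(start) if start == end else f"{start}-{end}")
            start = None
    return ranges


def pdftk_command(input_pdf, output_pdf, total_pages, dupes):
    """Build the argument list for pdftk's 'cat' operation (hypothetical file names)."""
    return ["pdftk", input_pdf, "cat", *keep_ranges(total_pages, dupes),
            "output", output_pdf]
```

For example, if pages 3 and 5 of a 5-page PDF turn out to be duplicates, `pdftk_command("input.pdf", "output.pdf", 5, [3, 5])` yields the arguments for `pdftk input.pdf cat 1-2 4 output output.pdf`, which could then be handed to `subprocess.run`. The total page count has to be supplied by the caller, for instance by counting the generated TIFFs.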