Thursday, January 21, 2010

Renaming parts of file names in bulk.

So I had an issue where I needed to change a dash '-' to an underscore '_' in the middle of 20,000 file names, while keeping the dash in another part of each name.  A typical file name looked something like this: 24-A13-smm1.png, where I wanted to change the second dash to an underscore to better separate the plate-well info from the library type.  (Skip to the bottom for the quick answer.)

I came across several solutions, but most of them were unintuitive and hard to follow.  However, without too much digging I finally came across a Q&A forum with some more elegant solutions.  Here is my favorite for changing a single character (note that this form of rename, the one from util-linux, replaces only the first occurrence in each file name):

Review the offending files.
find /path/to/files -name '*-*' -print

Rename the offending files with an underscore.
find /path/to/files -name '*-*' -exec rename '-' '_' {} +

In my particular case I could leave out the /path/to/files because I was executing the script from the directory the files were stored in.  The script therefore is shortened to:

find -name '*-*' -exec rename '-' '_' {} + 

Because I want to change only the second dash, I need to use a bit of trickery.  Fortunately, all of my file names end in "smm*.png", so I can just search for -smm to change that specific dash to an underscore.  The following worked perfectly for me:


find -name '*-smm*' -exec rename '-smm' '_smm' {} + 
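
Before letting a command like this loose on 20,000 files, a dry run is cheap insurance. The following is only a sketch: preview_smm_rename is a hypothetical helper name, and it uses plain shell parameter expansion rather than rename itself, so it shows what the new names should look like without touching anything.

```shell
# preview_smm_rename DIR (hypothetical helper): print "old -> new" for every
# regular file under DIR whose name contains "-smm". Nothing is renamed.
preview_smm_rename() {
    find "${1:-.}" -type f -name '*-smm*' -exec sh -c '
        for p; do
            # ${p%-smm*}  keeps everything before the last "-smm"
            # ${p##*-smm} keeps everything after the last "-smm"
            printf "%s -> %s\n" "$p" "${p%-smm*}_smm${p##*-smm}"
        done
    ' sh {} +
}
```

Running preview_smm_rename . in the image directory lists each pending rename on its own line. If a name contained "-smm" more than once, this sketch would change the last one, which may differ from what rename does.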


Now that I have my solution, let's break down the script:
find -name '*-*' -exec rename '-' '_' {} + 



The find -name '*-*' part finds any file whose name contains a dash.

From what I've read, -exec is quite powerful: it can run any Unix command on the files that find matches.  Therefore you should always test your query first.

The rename '-' '_' part is more straightforward: the first argument following rename is the text you want to change, and the second argument is what you want to change it to.
The {} argument is a placeholder that find replaces with the names of the matched files in your executed command.

The + at the end tells find to collect as many file names as will fit and hand them to a single rename call, instead of running rename once per file.  The alternative ending is a ; which runs the command once for every file and has to be escaped in a shell ( \; or ';' ).
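
To see the difference between the two -exec endings, echo makes a harmless stand-in for rename. This is a small sketch that builds three made-up files in a temporary directory (the names are just examples in the same style as mine) and counts how many times find runs the command in each mode.

```shell
# Scratch directory with three example files.
demo_dir=$(mktemp -d)
touch "$demo_dir/1-A01-smm1.png" "$demo_dir/2-A02-smm1.png" "$demo_dir/3-A03-smm1.png"

# With \; find runs echo once per file, so there is one output line per file.
per_file=$(find "$demo_dir" -type f -name '*-*' -exec echo {} \; | wc -l | tr -d ' ')

# With + find passes all the names to a single echo, so there is one line in total.
batched=$(find "$demo_dir" -type f -name '*-*' -exec echo {} + | wc -l | tr -d ' ')

echo "runs with \\;: $per_file, runs with +: $batched"
rm -r "$demo_dir"
```

With 20,000 files the + form means a handful of rename processes instead of 20,000, which is why it is the nicer choice here.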

Now that it's been broken down, and I know that rename is just an ordinary command-line tool, I see that my script could be further simplified.  Since I want to rename every file with a -smm in my directory (and don't need find's ability to descend into subdirectories), I don't actually need to use the find command, just rename!  And sure enough

rename '-smm' '_smm' *
 

works just fine!
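
One caveat: not every system ships this flavor of rename. On Debian and Ubuntu, for example, rename is often a Perl script that expects an expression such as 's/-smm/_smm/' instead of two plain strings. As a fallback, here is a rough sketch in plain shell that mimics the from/to interface. The name rename_first is made up for this post, and it only looks at the file name part of each path, replacing the first occurrence it finds.

```shell
# rename_first FROM TO FILE... (hypothetical helper): replace the first
# occurrence of FROM in each file's name with TO, using only mv.
rename_first() {
    from=$1 to=$2
    shift 2
    for path in "$@"; do
        # Split the path into directory and file name.
        case $path in
            */*) dir=${path%/*} name=${path##*/} ;;
            *)   dir=. name=$path ;;
        esac
        # Skip names that do not contain FROM at all.
        case $name in
            *"$from"*) ;;
            *) continue ;;
        esac
        prefix=${name%%"$from"*}    # text before the first FROM
        suffix=${name#*"$from"}     # text after the first FROM
        new=$prefix$to$suffix
        # Refuse to overwrite an existing file.
        if [ -e "$dir/$new" ]; then
            echo "skipping $path: $dir/$new already exists" >&2
            continue
        fi
        mv -- "$path" "$dir/$new"
    done
}
```

Used the same way as the command above, that would be rename_first '-smm' '_smm' * from inside the directory.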
