The Scenario:

Imagine that there's been a tumultuous cutover from one website to another that's left you with 6,000+ broken links ruining your search engine foo. As a sysadmin you can't fix the missing content, but you can bring some peace to those 404s.

You programmatically converted a report of those 6,000+ URLs into an Apache redirect file, reloaded Apache and manually tested a couple of redirects. They were fine, but you're not going to manually test all 6,000+ URLs, are you? (The answer's "no", in case you're unsure.)
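
The conversion step can be little more than a loop over the report. Here's a minimal sketch, assuming the report is a tab-separated file of old paths and their new destinations (that format and the file names are my assumptions, not anything from the original report):

#!/bin/bash
# Hypothetical report format: "old-path<TAB>new-url", one pair per line.
# Emits one Apache Redirect directive per pair into redirects.conf.
while IFS=$'\t' read -r old_path new_url; do
    [ -z "$old_path" ] && continue
    printf 'Redirect 301 %s %s\n' "$old_path" "$new_url"
done < report.tsv > redirects.conf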

My Nasty Hack Solution

I wrote broken_links.sh to address the testing workload (available from my git repo or via GitHub). When run, broken_links.sh asks where the file of raw URLs is and where you would like any links that are still broken to be saved.
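
The script itself lives in the repo, but the core idea is simple enough to sketch. The following is a minimal sketch of that idea, using curl to fetch each URL and treating anything that returns a 4xx/5xx status (or no response at all) as still broken; the use of curl, the variable names and the status test are my assumptions, not necessarily how broken_links.sh does it:

#!/bin/bash
# Minimal sketch of the approach, not the actual broken_links.sh from the repo.

echo "Enter the path and file name that contains the broken URLs:"
read -r urls
echo "Enter the full or relative path & file name for the list of still broken URLs:"
read -r broken

> "$broken"        # start with an empty output file
total=0
still_broken=0

while read -r url; do
    [ -z "$url" ] && continue
    echo "Now checking: $url"
    total=$((total + 1))
    # -L follows any redirects; -w '%{http_code}' prints only the final status code.
    status=$(curl -s -o /dev/null -L -w '%{http_code}' "$url")
    if [ "$status" -ge 400 ] || [ "$status" -eq 0 ]; then
        echo "$url" >> "$broken"
        still_broken=$((still_broken + 1))
    fi
done < "$urls"

echo
echo "Total URLs in the file: $total"
echo "Total URLs still broken: $still_broken $broken"

The -L flag matters here: a URL that 301s to a page answering 200 is exactly what we want to count as fixed, so only the final status code is tested.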

Sample Input File of URLs to be Tested

$ cat /tmp/URLs.txt
https://mcwhirter.com.au/
http://www.debian.org/
http://mcwhirter.com.au/broken.html

When you run broken_links.sh, the output will look similar to this:

$ ./broken_links.sh
Enter the path and file name that contains the broken URLs:
/tmp/URLs.txt
Enter the full or relative path & file name for the list of still broken URLs:
/tmp/broken.txt
Now checking: https://mcwhirter.com.au/
Now checking: http://www.debian.org/
Now checking: http://mcwhirter.com.au/broken.html

Total URLs in the file: 3
Total URLs still broken: 1 /tmp/broken.txt

The contents of the requested broken.txt look like this:

http://mcwhirter.com.au/broken.html

Happy hunting those still-broken URLs :-)