Creating an HTML Index Using Python

 

Introduction

“Does anyone know how I can generate an index of files in a directory to include in my case report to a client?” -Literally all of us at some point

I’ve seen this question popup dozens of times on DFIR listservs (and even on DFIR Discord) over the past 16 years I’ve been in this field. Many have offered up some open-source options like dirhtml & SnapHTML, both of which will do the job. But, these options are Windows-exclusive, non-customizable (no public source code), and brands your html file with their logo and/or toolmark. For these reasons alone I have tended to gravitate away from these tools in my own cases and case reports. I’ve also seen others suggest various python and bash scripts out there, but they either don’t link to the script or just half-heartedly suggest “well you can build your own bash or python script.” As any busy DFIR analyst would say “aint nobody got time for that!”

Years ago, while still an analyst in the digital forensics laboratory, I had picked up this project with every intention to knock this out in a tidy little Python script once and for all. We had built some simple little html templates that were used in our reports, but they focused more on harmonizing our tool reports with our case reports–nothing more. We used the life out of them and, to my knowledge, the lab still uses them to this day. Life and work happens and wouldn’t you know it got put on the back burner (along with my other DFIR projects). This past summer I contributed to and did a little bit more tinkering with it, but didn’t feel it to be ready to fully there just yet.

A few weeks ago, someone posted to the Forensic Video Analysis (FVA) listserve (which I highly recommend joining if you are eligible) asking the very question of locating a resource to generate such an index.html for their case report. This prompted me to sit down into the developer chair and polish something up to get out to the community… After some coding and tweaking, I came up with something….

indexer.py

https://github.com/joshbrunty/Indexer

Image

Indexer.py is quite simple in theory. Indexer is a Python script that generates an .html index of files within a selected directory. You can start from the current directory or from folder passed as first positional argument. Optionally filter by file types with –filter “*.py”. This script is an hodgepodge of older scripts collected & built over the years. This script requires you to have Python > 3.x.x installed on your system. Ideally, though, you’d want to use the current build of Python 3.10 (significantly faster processing on larger directories!🏃). You can create a custom output file if you wish (by default *index.html is generated). You can also match/filter specified parameters using glob (i.e. by setting *’*/.jpg’ & ‘*/.UFD’ parameters). File Size & Modified Time is also displayed for each file (although you can but that our of the provided .py script if you don’t want that in there).

Usage

Usage is pretty easy. As long as you have Python3 installed on Windows, Linux, or Mac you just simply download the script and run the following:

indexer.py /top_dir

Positional Arguments

As outlined above, top_dir was specified, although you can specify any working directory you wish. However, if you don’t include this or choose not to the current working directory is used:

  • top_dir: top folder from which to start generating indexes (uses current working directory/folder if not specified).

Optional Arguments

Here are other optional arguments that you can utlize within Indexer.py:

  • -h, --help: Shows help message (along with quick index of available optional arguments).
  • -f, --filter: only include files matching glob (i.e.indexer.py --filter '\**/*.jpg'*).
  • -o filename, --output-file filename: Custom output file (by default generates “index.html”)
  • -r, --recursive: recursively process nested folders/directories (Off/False by default).
  • -v, --verbose: verbosely list every processed file. (NOTE: will take longer time with complex file tree structures on slow terminals.*)

Conclusion

Short, sweet, open-source, and easy to use and customize. Feel free to pull it down and give it a shot. Also, feel free to contribute to it as well. This script doesn’t belong to me nor do I want it to. This belongs to the DFIR community as a whole. Enjoy!

Video Tutorial