Watching a Folder
At some point in every programmer’s life, they discover that the operating system will tell them when a file changes. This knowledge produces a particular kind of enthusiasm — the automation that triggers automatically, no cron job, no polling, no manual step. Just drop the file and the thing happens.
The enthusiasm is not wrong. File system watching is genuinely useful. But it has a character that rewards caution.
How It Goes Wrong
The first version usually looks like this in Python, using watchdog:
from watchdog.observers import Observerfrom watchdog.events import FileSystemEventHandlerimport time
class Handler(FileSystemEventHandler): def on_created(self, event): if not event.is_directory: process(event.src_path)
observer = Observer()observer.schedule(Handler(), path="./incoming", recursive=False)observer.start()
try: while True: time.sleep(1)except KeyboardInterrupt: observer.stop()observer.join()This works. It also fires on_created twice on some operating systems for some file types, fires immediately before a file has finished writing, fires during intermediate save states from editors that write to a temp file and rename, and does not tell you what happened if process() fails.
None of these are edge cases. They are the normal behavior of the file system event API on macOS and Linux.
Making It Survivable
The adjustments that matter most are not clever. They are defensive.
Wait before acting
A file that just appeared may not be finished writing. A short sleep before processing catches most cases:
import time
def on_created(self, event): if event.is_directory: return # Give the writing process a moment to finish. time.sleep(0.5) process(event.src_path)This is not a complete solution. A 500ms wait fails for large files or slow writers. The correct solution is to check that the file size is stable:
import osimport time
def wait_for_stable(path: str, interval: float = 0.2, retries: int = 10) -> bool: """Return True when the file size has not changed for two consecutive checks.""" prev = -1 for _ in range(retries): try: current = os.path.getsize(path) except OSError: return False if current == prev: return True prev = current time.sleep(interval) return FalseThis is more code than the alternative, but it handles large files and slow writers.
Deduplicate events
If you are getting duplicate events, keep a set of recently-seen paths and ignore repeats within a short window:
import timefrom collections import defaultdict
class Handler(FileSystemEventHandler): def __init__(self): self._seen: dict[str, float] = {} self._window = 1.0 # seconds
def on_created(self, event): if event.is_directory: return path = event.src_path now = time.monotonic() if now - self._seen.get(path, 0) < self._window: return self._seen[path] = now process(path)Log everything
When the watcher runs unattended, you will eventually need to know what it did and when. Even a simple log file with timestamps is worth the three lines it takes:
import logging
logging.basicConfig( filename="watcher.log", level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s",)
def process(path: str) -> None: logging.info("Processing %s", path) # ... logging.info("Done: %s", path)When to Use a Cron Job Instead
File system watching is the right tool when:
- The trigger is genuinely file-arrival (a camera upload, an export from another tool)
- The processing window matters (you want it to happen within seconds of arrival)
- The volume is low enough that duplicate-event handling is not a serious burden
A cron job is often simpler when:
- You just want something to run every N minutes regardless of file changes
- The “watch” is really polling a remote source anyway
- You need the job to run whether or not there are new files
The overhead of a watcher setup — the daemon, the deduplication logic, the stability check — is only worth paying if the event-driven behavior is genuinely what you need.
A useful summary from experience: reach for watchdog (or equivalent) when the file system event is the right abstraction. Reach for cron when you are retrofitting event-driven behavior onto something that is really just periodic.
More on the properties that make scripts like these survive long-term in A Script That Stays Used.