Keep Your Domain Controllers Healthy With This PowerShell Script

Your domain controller is like the heart of your network. When it’s working, nobody thinks about it. When it starts having problems, everyone knows immediately because nothing works right. I’ve been down that road more times than I care to admit, and I learned the hard way that waiting for users to report authentication issues is not a monitoring strategy.

That’s why I put together this PowerShell health check script. It covers the essentials: CPU and memory usage, disk space, AD replication status, recent error events, critical services, DNS health, and basic network connectivity. Run it daily, weekly, or whenever you get that nagging feeling something might be off.

What This Script Actually Checks

The script breaks down into seven main areas, each targeting the stuff that actually matters when your DC starts acting up.

System Resources

It pulls CPU load and memory usage using the newer CIM cmdlets instead of the old WMI calls. The thresholds are straightforward: over 90% gets flagged as an error, over 75% gets a warning. These aren’t arbitrary numbers; they’re based on watching servers struggle when they hit those levels consistently.
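A single CPU reading is only a snapshot, so one way to get closer to "consistently" is to average a handful of samples before applying the thresholds. Here's a rough sketch of that idea. `Get-LoadLevel` is a hypothetical helper that isn't part of the script, and the sample count and interval in the comment are assumptions you'd tune for your own environment.

```powershell
# Hypothetical helper (not part of the script): classify an averaged
# percentage against the same 75% / 90% thresholds described above.
function Get-LoadLevel {
    param (
        [Parameter(Mandatory)][double[]]$Samples,
        [double]$WarnAt  = 75,
        [double]$ErrorAt = 90
    )
    $average = ($Samples | Measure-Object -Average).Average
    if     ($average -gt $ErrorAt) { "ERROR" }
    elseif ($average -gt $WarnAt)  { "WARN" }
    else                           { "OK" }
}

# On a DC, the samples could be collected like this (assumed: 5 samples, 2 s apart):
# $samples = 1..5 | ForEach-Object {
#     (Get-CimInstance Win32_Processor | Measure-Object LoadPercentage -Average).Average
#     Start-Sleep -Seconds 2
# }
# Get-LoadLevel -Samples $samples
```

One brief spike among otherwise quiet samples averages out to OK, which is the point: you get flagged for sustained load, not for a momentary blip.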

Disk Space

Nothing kills a domain controller faster than running out of disk space, especially on the drive hosting your SYSVOL or database files. The script checks all local drives and flags anything under 20% free space as a warning, under 10% as critical. Trust me, you don’t want to be scrambling to free up space when your DC stops accepting updates.
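The free-space rule is easy to pull out into its own function if you want to reuse it elsewhere. This is a minimal sketch; `Get-FreeSpaceLevel` is a hypothetical name, and the zero-size guard is my own defensive assumption in case a volume ever reports no size.

```powershell
# Hypothetical helper: classify a drive by its percentage of free space.
function Get-FreeSpaceLevel {
    param (
        [Parameter(Mandatory)][double]$FreeBytes,
        [Parameter(Mandatory)][double]$SizeBytes
    )
    # Guard against dividing by zero if a volume reports no size.
    if ($SizeBytes -le 0) { return "WARN" }

    $percentFree = ($FreeBytes / $SizeBytes) * 100
    if     ($percentFree -lt 10) { "ERROR" }   # under 10% free is critical
    elseif ($percentFree -lt 20) { "WARN" }    # under 20% free is a warning
    else                         { "OK" }
}

# Example with made-up numbers: 15 GB free on a 100 GB volume
Get-FreeSpaceLevel -FreeBytes 15GB -SizeBytes 100GB
```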

AD Replication Health

This is where things get interesting. The script uses Get-ADReplicationFailure to check for any replication issues across the forest, then runs repadmin /replsummary for additional detail. Replication problems have a nasty habit of cascading, so catching them early saves you from those fun Monday morning calls about users who can’t log in at certain sites.
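If you want an extra early-warning signal beyond outright failures, one option is to look at how long it has been since each replication link last succeeded. The sketch below is not something the script does. `Find-StaleReplicationLink` is a hypothetical helper, the 24-hour cutoff is an arbitrary assumption, and the sample data is synthetic. It expects objects with `Partner` and `LastReplicationSuccess` properties, which is what `Get-ADReplicationPartnerMetadata` returns.

```powershell
# Hypothetical helper: pass through replication links whose last success
# is older than the allowed age.
function Find-StaleReplicationLink {
    param (
        [Parameter(Mandatory, ValueFromPipeline)]$Link,
        [int]$MaxAgeHours = 24,          # assumed cutoff; tune to your replication schedule
        [datetime]$Now    = (Get-Date)
    )
    process {
        # A link with no success timestamp is treated as stale.
        $last = $Link.LastReplicationSuccess
        if (-not $last -or ($Now - $last).TotalHours -gt $MaxAgeHours) {
            $Link
        }
    }
}

# Synthetic example; on a DC the input could instead come from
# Get-ADReplicationPartnerMetadata -Target $env:COMPUTERNAME
$now   = Get-Date
$links = @(
    [pscustomobject]@{ Partner = "DC02"; LastReplicationSuccess = $now.AddMinutes(-20) }
    [pscustomobject]@{ Partner = "DC03"; LastReplicationSuccess = $now.AddHours(-30) }
)
$stale = $links | Find-StaleReplicationLink -MaxAgeHours 24 -Now $now
$stale.Partner   # DC03
```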

Event Log Analysis

Instead of drowning you in every warning from the last month, it focuses on critical and error events from the past 24 hours in the System log. That’s usually where you’ll find the smoking gun when something’s going sideways with hardware, services, or core Windows functionality.
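When one service crashes in a loop, you can still end up with dozens of identical lines. A possible refinement is to collapse repeats into one row per source and event ID. This is only a sketch: `Get-EventSummary` is a hypothetical helper, and the sample events are made up. It relies on the `ProviderName`, `Id`, and `TimeCreated` properties that `Get-WinEvent` records carry.

```powershell
# Hypothetical helper: one summary row per provider and event ID,
# most frequent first.
function Get-EventSummary {
    param ([Parameter(Mandatory)][object[]]$Records)
    $Records |
        Group-Object -Property ProviderName, Id |
        Sort-Object -Property Count -Descending |
        ForEach-Object {
            # Newest occurrence within this group
            $latest = $_.Group | Sort-Object -Property TimeCreated -Descending | Select-Object -First 1
            [pscustomobject]@{
                Count    = $_.Count
                Provider = $latest.ProviderName
                Id       = $latest.Id
                Latest   = $latest.TimeCreated
            }
        }
}

# Synthetic events for illustration; real input would be Get-WinEvent output.
$t = Get-Date
$sample = @(
    [pscustomobject]@{ ProviderName = "Service Control Manager"; Id = 7031; TimeCreated = $t.AddHours(-3) }
    [pscustomobject]@{ ProviderName = "Service Control Manager"; Id = 7031; TimeCreated = $t.AddHours(-1) }
    [pscustomobject]@{ ProviderName = "disk";                    Id = 7;    TimeCreated = $t.AddHours(-2) }
)
$summary = Get-EventSummary -Records $sample
$summary | Format-Table -AutoSize
```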

Essential Services

The script monitors the services that absolutely have to be running: NTDS, DNS, W32Time, Netlogon, KDC, ADWS, and LanmanServer. If any of these are stopped, your DC is probably not doing its job properly. I’ve seen too many mysterious authentication issues that boiled down to the KDC service being stuck in a stopped state.

DNS Health

Since your DC is almost certainly also your DNS server, the script tests resolution for your domain and runs a quick dcdiag DNS test. DNS problems masquerade as all sorts of other issues, so this catches a lot of weird behavior before it becomes user-facing.
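Resolving the domain name only proves that an A record exists. If you want to go a step further, you could also confirm that the SRV records clients use to locate domain controllers resolve. The script doesn't do this; the sketch below is an optional extra, `Get-DcLocatorSrvName` is a hypothetical helper, and `corp.example.com` is a placeholder domain.

```powershell
# Hypothetical helper: build the standard DC locator SRV record names
# for a given DNS domain.
function Get-DcLocatorSrvName {
    param ([Parameter(Mandatory)][string]$Domain)
    "_ldap._tcp.dc._msdcs.$Domain"
    "_kerberos._tcp.dc._msdcs.$Domain"
    "_ldap._tcp.$Domain"
    "_kerberos._tcp.$Domain"
}

$names = Get-DcLocatorSrvName -Domain "corp.example.com"   # placeholder domain
$names

# On a Windows machine, each name could then be checked like this:
# foreach ($name in $names) {
#     Resolve-DnsName -Name $name -Type SRV -ErrorAction Stop
# }
```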

Network Connectivity

Basic ping tests to your domain and default gateway. Simple, but effective at catching network issues that might affect replication or client communication.

Running and Customizing the Script

The script defaults to saving reports in C:\Stuff\DCHealthCheckReport.txt, but you can specify a different path with the -LogFile parameter. The output goes both to the console with color coding and to the log file for later review.

.\DCHealthCheck.ps1 -LogFile "D:\Logs\DCHealth_$(Get-Date -Format 'yyyyMMdd').txt"

The color coding makes it easy to spot problems at a glance: green for OK, yellow for warnings, red for errors, and white for informational messages. The log file captures everything in a format that’s easy to review later or send to someone else if you need to troubleshoot.
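Because every log line starts with a level tag, the report is also easy to summarize after the fact. Here's a small sketch that counts lines per level. `Get-LogLevelSummary` is a hypothetical helper, and the sample lines are invented to mirror the script's log format.

```powershell
# Hypothetical helper: count report lines per level tag ([OK], [INFO], ...).
function Get-LogLevelSummary {
    param ([string[]]$Lines)
    $counts = @{ OK = 0; INFO = 0; WARN = 0; ERROR = 0 }
    foreach ($line in $Lines) {
        # Lines look like "  [WARN]   message"; anything else is ignored.
        if ($line -match '^\s*\[(OK|INFO|WARN|ERROR)\]') {
            $counts[$Matches[1].ToUpper()] += 1
        }
    }
    [pscustomobject]@{
        OK    = $counts.OK
        INFO  = $counts.INFO
        WARN  = $counts.WARN
        ERROR = $counts.ERROR
    }
}

# Invented sample lines; a real run would use: Get-Content $LogFile
$sample = @(
    "  [OK]     CPU Load       : 12%"
    "  [WARN]   Drive D:  Total: 100 GB  Free: 15 GB (15% free)"
    "  [ERROR]  Net Logon : Stopped"
    ""
    "  [OK]     DNS resolution for 'corp.example.com' succeeded."
)
$summary = Get-LogLevelSummary -Lines $sample
$summary
```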

What Makes This Different

I’ve seen plenty of DC health check scripts over the years, but most of them either check too little or dump so much information that you can’t find the actual problems. This one hits the sweet spot of being comprehensive without being overwhelming.

The error handling is solid; it won’t crash if a particular check fails, and it tells you what went wrong instead of just silently skipping things. The logging is structured so you can actually read it later, and the thresholds are based on real-world experience rather than theoretical ideals.

The Practical Reality

Set this up to run automatically via Task Scheduler, but don’t just fire and forget. Review the reports regularly, especially after any changes to your environment. Look for trends over time, not just individual bad days. A DC that’s consistently hitting 80% CPU usage probably needs attention even if it’s not technically in the “error” range yet.
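For the Task Scheduler piece, something along these lines should work as a starting point. Treat it as a sketch under assumptions: the script path, log directory, task name, and 06:00 start time are all placeholders, and both function names are hypothetical. The argument uses -Command rather than -File so the dated log file name is computed each time the task runs.

```powershell
# Hypothetical helper: build the powershell.exe argument string so the log
# file name is evaluated at run time (one dated report per day).
function New-DcHealthTaskArgument {
    param (
        [Parameter(Mandatory)][string]$ScriptPath,
        [Parameter(Mandatory)][string]$LogDirectory
    )
    $logExpr = "(Join-Path '$LogDirectory' ('DCHealth_{0:yyyyMMdd}.txt' -f (Get-Date)))"
    "-NoProfile -Command `"& '$ScriptPath' -LogFile $logExpr`""
}

# Registers a daily task running as SYSTEM. All defaults are placeholders.
# Run this on the DC from an elevated session.
function Register-DcHealthTask {
    param (
        [string]$ScriptPath   = "C:\Scripts\DCHealthCheck.ps1",
        [string]$LogDirectory = "D:\Logs",
        [string]$TaskName     = "DC Health Check"
    )
    $argument  = New-DcHealthTaskArgument -ScriptPath $ScriptPath -LogDirectory $LogDirectory
    $action    = New-ScheduledTaskAction -Execute "powershell.exe" -Argument $argument
    $trigger   = New-ScheduledTaskTrigger -Daily -At "06:00"
    $principal = New-ScheduledTaskPrincipal -UserId "SYSTEM" -LogonType ServiceAccount -RunLevel Highest
    Register-ScheduledTask -TaskName $TaskName -Action $action -Trigger $trigger -Principal $principal
}

# Preview the command line the task would run:
New-DcHealthTaskArgument -ScriptPath "C:\Scripts\DCHealthCheck.ps1" -LogDirectory "D:\Logs"
```

Calling Register-DcHealthTask with no arguments would create the task with those placeholder values, so adjust them first.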

Keep the reports around for at least a few weeks. When something does go wrong, being able to look back and see when a trend started is invaluable for root cause analysis.
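If you're writing one dated report per run, a little cleanup keeps the folder from growing forever. This is a minimal sketch: `Remove-OldReport` is a hypothetical helper, the `DCHealth_*.txt` pattern assumes the dated file names from the earlier example, and the 30-day window is just one reading of "a few weeks".

```powershell
# Hypothetical helper: delete report files older than the retention window.
function Remove-OldReport {
    [CmdletBinding(SupportsShouldProcess)]
    param (
        [Parameter(Mandatory)][string]$Directory,
        [string]$Filter = "DCHealth_*.txt",   # assumed naming pattern
        [int]$KeepDays  = 30                  # assumed retention window
    )
    $cutoff = (Get-Date).AddDays(-$KeepDays)
    Get-ChildItem -Path $Directory -Filter $Filter -File |
        Where-Object { $_.LastWriteTime -lt $cutoff } |
        Remove-Item
}

# Dry run first to see what would be deleted (path is a placeholder):
# Remove-OldReport -Directory "D:\Logs" -KeepDays 30 -WhatIf
```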

This script won’t catch every possible problem with a domain controller, but it’ll catch the common ones that cause the most pain. Sometimes the best monitoring is the kind that just works reliably and tells you what you need to know without requiring a PhD in Active Directory to interpret.

# DC Health Check Script
# Checks CPU/Memory, Disk, AD Replication, Event Logs, Services, DNS, and Network
# Updated: WMI -> CIM, fixed AD replication, proper event filtering, error handling

param (
    [string]$LogFile = "C:\Stuff\DCHealthCheckReport.txt"
)

# ── HELPERS ──────────────────────────────────────────────────────────────────

function Write-Section {
    param ([string]$Title)
    $timestamp = Get-Date -Format "yyyy-MM-dd HH:mm:ss"
    $line = "=" * 60
    $header = @"

$line
  $Title
  Checked: $timestamp
$line
"@
    Write-Host $header -ForegroundColor Cyan
    $header | Out-File $LogFile -Append
}

function Write-Log {
    param (
        [string]$Message,
        [ValidateSet("INFO", "WARN", "ERROR", "OK")]   # Reject levels that have no color mapping
        [string]$Level = "INFO"
    )
    $colors = @{ INFO = "White"; WARN = "Yellow"; ERROR = "Red"; OK = "Green" }
    $prefix = "[$Level]".PadRight(8)
    Write-Host "  $prefix $Message" -ForegroundColor $colors[$Level]
    "  $prefix $Message" | Out-File $LogFile -Append
}

# ── ENSURE LOG DIRECTORY EXISTS ───────────────────────────────────────────────

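$logDir = Split-Path $LogFile
# Split-Path returns an empty string for a bare file name, so skip the check in that case
if ($logDir -and -not (Test-Path $logDir)) {
    New-Item -ItemType Directory -Path $logDir -Force | Out-Null
}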

# Write report header
$now = Get-Date -Format "yyyy-MM-dd HH:mm:ss"
$reportHeader = @"
╔════════════════════════════════════════════════════════════╗
║         DOMAIN CONTROLLER HEALTH CHECK REPORT              ║
║         Generated: $now                     ║
╚════════════════════════════════════════════════════════════╝
"@
Write-Host $reportHeader -ForegroundColor Cyan
$reportHeader | Out-File $LogFile  # Note: no -Append here, starts fresh each run

# ── CPU & MEMORY ─────────────────────────────────────────────────────────────

function Check-CPUandMemoryUsage {
    Write-Section "CPU & MEMORY USAGE"
    try {
        $cpu = Get-CimInstance Win32_Processor |
            Measure-Object -Property LoadPercentage -Average |
            Select-Object -ExpandProperty Average

        $os          = Get-CimInstance Win32_OperatingSystem
        $totalMemGB  = [math]::Round($os.TotalVisibleMemorySize / 1MB, 2)
        $freeMemGB   = [math]::Round($os.FreePhysicalMemory / 1MB, 2)
        $usedMemGB   = [math]::Round($totalMemGB - $freeMemGB, 2)
        $percentUsed = [math]::Round(($usedMemGB / $totalMemGB) * 100, 2)

        $cpuLevel  = if ($cpu -gt 90) { "ERROR" } elseif ($cpu -gt 75) { "WARN" } else { "OK" }
        $memLevel  = if ($percentUsed -gt 90) { "ERROR" } elseif ($percentUsed -gt 75) { "WARN" } else { "OK" }

        Write-Log "CPU Load       : $cpu%" $cpuLevel
        Write-Log "Memory Used    : $usedMemGB GB / $totalMemGB GB ($percentUsed%)" $memLevel
    }
    catch {
        Write-Log "Failed to retrieve CPU/Memory data: $($_.Exception.Message)" "ERROR"
    }
}

# ── DISK SPACE ───────────────────────────────────────────────────────────────

function Check-DiskSpace {
    Write-Section "DISK SPACE"
    try {
        $drives = Get-CimInstance Win32_LogicalDisk -Filter "DriveType=3"
        foreach ($drive in $drives) {
            $freeGB     = [math]::Round($drive.FreeSpace / 1GB, 2)
            $totalGB    = [math]::Round($drive.Size / 1GB, 2)
            $percentFree = [math]::Round(($freeGB / $totalGB) * 100, 2)

            $level = if ($percentFree -lt 10) { "ERROR" } elseif ($percentFree -lt 20) { "WARN" } else { "OK" }
            Write-Log "Drive $($drive.DeviceID)  Total: $totalGB GB  Free: $freeGB GB ($percentFree% free)" $level
        }
    }
    catch {
        Write-Log "Failed to retrieve disk space data: $($_.Exception.Message)" "ERROR"
    }
}

# ── AD REPLICATION ───────────────────────────────────────────────────────────

function Check-ADReplication {
    Write-Section "ACTIVE DIRECTORY REPLICATION"
    try {
        # Check for any replication failures across the forest
        $failures = Get-ADReplicationFailure -Scope Forest -ErrorAction Stop

        if ($failures) {
            Write-Log "Replication failures detected:" "WARN"
            foreach ($failure in $failures) {
                Write-Log "  Partner    : $($failure.Partner)" "WARN"
                Write-Log "  Last Error : $($failure.LastError)" "WARN"
                Write-Log "  Fail Count : $($failure.FailureCount)" "WARN"
                Write-Log "  First Fail : $($failure.FirstFailureTime)" "WARN"
            }
        } else {
            Write-Log "No replication failures found." "OK"
        }

        # Pull repadmin summary for additional detail
        Write-Log "Running repadmin /replsummary..." "INFO"
        $replSummary = repadmin /replsummary 2>&1
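        foreach ($line in $replSummary) {
            # Cast to string first: stderr lines arrive as ErrorRecord objects, which have no Trim() method
            $text = "$line"
            if ($text.Trim() -ne "") {
                Write-Host "  $text"
                "  $text" | Out-File $LogFile -Append
            }
        }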
    }
    catch {
        Write-Log "Failed to retrieve AD replication data: $($_.Exception.Message)" "ERROR"
        Write-Log "Ensure the ActiveDirectory module is installed and this is run on a DC." "WARN"
    }
}

# ── EVENT LOGS ───────────────────────────────────────────────────────────────

function Check-EventLogs {
    Write-Section "SYSTEM EVENT LOG (Last 24 Hours - Errors & Critical)"
    try {
        $events = Get-WinEvent -FilterHashtable @{
            LogName   = 'System'
            Level     = 1, 2        # 1 = Critical, 2 = Error
            StartTime = (Get-Date).AddHours(-24)
        } -ErrorAction Stop

        if ($events) {
            Write-Log "$(@($events).Count) error/critical event(s) found in the last 24 hours." "WARN"
            # $event is a PowerShell automatic variable, so use a different loop variable
            foreach ($evt in $events) {
                $shortMsg = ("$($evt.Message)" -split "`n")[0].Trim()  # First line only; full messages are noisy
                Write-Log "$($evt.TimeCreated)  ID:$($evt.Id)  $shortMsg" "ERROR"
            }
        } else {
            Write-Log "No critical or error events in the last 24 hours." "OK"
        }
    }
    catch {
        # Get-WinEvent throws when nothing matches; check the error ID rather than the localized message text
        if ($_.FullyQualifiedErrorId -like "NoMatchingEventsFound*") {
            Write-Log "No critical or error events in the last 24 hours." "OK"
        } else {
            Write-Log "Failed to retrieve event log data: $($_.Exception.Message)" "ERROR"
        }
    }
}

# ── ESSENTIAL SERVICES ───────────────────────────────────────────────────────

function Check-EssentialServices {
    Write-Section "ESSENTIAL SERVICES"
    try {
        # Core DC services — added KDC, ADWS, LanmanServer vs original
        $services = @(
            @{ Name = "NTDS";        Display = "AD Domain Services" },
            @{ Name = "DNS";         Display = "DNS Server" },
            @{ Name = "W32Time";     Display = "Windows Time" },
            @{ Name = "Netlogon";    Display = "Net Logon" },
            @{ Name = "KDC";         Display = "Kerberos Key Distribution" },
            @{ Name = "ADWS";        Display = "AD Web Services" },
            @{ Name = "LanmanServer";Display = "Server (File/Print Sharing)" }
        )

        foreach ($svc in $services) {
            $serviceObj = Get-Service -Name $svc.Name -ErrorAction SilentlyContinue
            if ($serviceObj) {
                $level = if ($serviceObj.Status -eq "Running") { "OK" } else { "ERROR" }
                Write-Log "$($svc.Display.PadRight(35)) : $($serviceObj.Status)" $level
            } else {
                Write-Log "$($svc.Display.PadRight(35)) : Not Found" "WARN"
            }
        }
    }
    catch {
        Write-Log "Failed to retrieve service status: $($_.Exception.Message)" "ERROR"
    }
}

# ── DNS HEALTH ───────────────────────────────────────────────────────────────

function Check-DNSHealth {
    Write-Section "DNS HEALTH"
    try {
        $domain = (Get-CimInstance Win32_ComputerSystem).Domain

        $dnsResult = Resolve-DnsName -Name $domain -Type A -ErrorAction Stop
        if ($dnsResult) {
            Write-Log "DNS resolution for '$domain' succeeded." "OK"
            foreach ($record in $dnsResult) {
                if ($record.IPAddress) {
                    Write-Log "  Resolved IP : $($record.IPAddress)" "INFO"
                }
            }
        }
    }
    catch {
        Write-Log "DNS resolution failed: $($_.Exception.Message)" "ERROR"
    }

    # Also run dcdiag DNS test if available
    try {
        Write-Log "Running dcdiag /test:dns (summary only)..." "INFO"
        $dcdiag = dcdiag /test:dns 2>&1
        $failLines = $dcdiag | Where-Object { $_ -match "FAIL|error|warning" }
        if ($failLines) {
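            foreach ($line in $failLines) {
                # Cast to string first: stderr lines arrive as ErrorRecord objects, which have no Trim() method
                $text = "$line".Trim()
                Write-Log $text "WARN"
            }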
        } else {
            Write-Log "dcdiag DNS test passed." "OK"
        }
    }
    catch {
        Write-Log "dcdiag not available or failed: $($_.Exception.Message)" "WARN"
    }
}

# ── NETWORK STATUS ───────────────────────────────────────────────────────────

function Check-NetworkStatus {
    Write-Section "NETWORK STATUS"
    try {
        $domain = (Get-CimInstance Win32_ComputerSystem).Domain

        $pingResult = Test-Connection -ComputerName $domain -Count 2 -Quiet -ErrorAction Stop
        if ($pingResult) {
            Write-Log "Ping to '$domain' succeeded." "OK"
        } else {
            Write-Log "Ping to '$domain' failed." "ERROR"
        }

        # Also check default gateway reachability
        $adapter = Get-CimInstance Win32_NetworkAdapterConfiguration |
            Where-Object { $_.IPEnabled -and $_.DefaultIPGateway } |
            Select-Object -First 1

        if ($adapter) {
            # Only index into DefaultIPGateway once we know an adapter was found
            $gateway = $adapter.DefaultIPGateway[0]
            $gwPing  = Test-Connection -ComputerName $gateway -Count 2 -Quiet -ErrorAction SilentlyContinue
            $level   = if ($gwPing) { "OK" } else { "ERROR" }
            Write-Log "Ping to gateway ($gateway): $(if ($gwPing) {'Reachable'} else {'Unreachable'})" $level
        } else {
            Write-Log "No adapter with a default gateway was found." "WARN"
        }
    }
    catch {
        Write-Log "Network check failed: $($_.Exception.Message)" "ERROR"
    }
}

# ── RUN ALL CHECKS ────────────────────────────────────────────────────────────

Check-CPUandMemoryUsage
Check-DiskSpace
Check-ADReplication
Check-EventLogs
Check-EssentialServices
Check-DNSHealth
Check-NetworkStatus

# ── FOOTER ────────────────────────────────────────────────────────────────────

$footer = "`n" + ("=" * 60) + "`nHealth check completed at $(Get-Date -Format 'yyyy-MM-dd HH:mm:ss')`n"
Write-Host $footer -ForegroundColor DarkGray
$footer | Out-File $LogFile -Append

Write-Host "📄 Report saved to: " -NoNewline
Write-Host $LogFile -ForegroundColor Green
