System Health & Monitoring
Proactive Monitoring Philosophy
The best PI System administrators catch problems before they impact operations. A proactive monitoring strategy has three layers:
Layer 1: Real-time Alerts (< 5 min response)
- Interface disconnections
- Buffer overflow
- Archive fill > 90%
- Service failures
Layer 2: Trend Monitoring (hourly review)
- Event rates declining
- Latency increasing
- Disk usage trending up
- CPU/memory creep
Layer 3: Scheduled Audits (daily/weekly)
- Data quality review
- Security audit
- Archive management
- Backup validation
PI System Health Checks
Interface Health Metrics
Event Rate (events/sec):
Normal: matches expected scan rate
Warning: < 50% of expected rate
Critical: 0 events/sec (interface disconnected)
Buffer Queue Depth:
Normal: < 1,000 events
Warning: > 10,000 events (backlog building)
Critical: > 100,000 events (data at risk)
Archive Health
Archive Fill Monitoring:
Warning: 80% fill
Critical: 95% fill (create new archive immediately)
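The interface, buffer, and archive thresholds above can be turned into a simple status classifier. The following is a minimal Python sketch, not an official PI tool: the function names are hypothetical, and how the readings are obtained (for example from the monitoring tags) is left out. Two gaps in the thresholds are filled by assumption and marked in the comments: event rates between 50% and 100% of expected, and queue depths between 1,000 and 10,000 events, are both treated as normal.

```python
# Thresholds copied from the health-check lists above.
BUFFER_WARNING = 10_000        # events
BUFFER_CRITICAL = 100_000      # events
ARCHIVE_WARNING_PCT = 80.0     # percent full
ARCHIVE_CRITICAL_PCT = 95.0    # percent full

SEVERITY = {"normal": 0, "warning": 1, "critical": 2}


def classify_event_rate(actual: float, expected: float) -> str:
    """Classify an interface event rate (events/sec) against the expected rate."""
    if actual <= 0:
        return "critical"  # 0 events/sec: interface disconnected
    if actual < 0.5 * expected:
        return "warning"   # below 50% of the expected rate
    return "normal"        # assumption: 50-100% of expected counts as normal


def classify_buffer_queue(depth: int) -> str:
    """Classify buffer queue depth (number of queued events)."""
    if depth > BUFFER_CRITICAL:
        return "critical"  # data at risk
    if depth > BUFFER_WARNING:
        return "warning"   # backlog building
    return "normal"        # assumption: 1,000-10,000 is not defined above


def classify_archive_fill(percent_full: float) -> str:
    """Classify primary archive fill level in percent."""
    if percent_full >= ARCHIVE_CRITICAL_PCT:
        return "critical"  # create a new archive immediately
    if percent_full >= ARCHIVE_WARNING_PCT:
        return "warning"
    return "normal"


def overall_status(*statuses: str) -> str:
    """Return the most severe of the given statuses."""
    return max(statuses, key=SEVERITY.__getitem__)
```

For example, `overall_status(classify_event_rate(4.0, 10.0), classify_buffer_queue(500), classify_archive_fill(96.0))` evaluates to `"critical"`, because the archive reading dominates the interface warning.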
PI System Tags to Monitor:
System.Archive.CurrentArchive.PercentFull
System.Archive.CurrentArchive.Name
System.Snapshot.QueueCount
System.Snapshot.EventsPerSec
Performance Monitoring with PerfMon
Key Windows Performance Counters
Disk Performance:
PhysicalDisk - Avg. Disk Queue Length < 2 (alert if > 5)
PhysicalDisk - % Disk Time < 80%
Memory:
Memory - Available MBytes > 2 GB (alert if < 1 GB)
Memory - Pages/sec < 100 (alert if > 1000)
CPU:
Processor - % Processor Time < 80% (alert if > 90%)
System - Processor Queue Length < 4
PI SMT Diagnostic Tools
PI Log Files
C:\PI\log\pipc.log - Main PI Server log
C:\PI\log\piarchss.log - Archive subsystem
C:\PI\log\pisnapss.log - Snapshot subsystem
C:\PI\log\pinetmgr.log - Network manager
Common Error Patterns:
"Connection refused" → Interface node network issue
"Archive full" → Create new archive immediately
"License exceeded" → Point count > licensed limit
"Authentication failed" → Kerberos/mapping issue
"Timeout" → Network latency or server overload
piartool Commands
piartool -al List all archives
piartool -collstatus Check collective status
piartool -verify Verify archive integrity
piartool -stats Show server statistics
piartool -connections List active connections
Root Cause Analysis: Data Gap Investigation
Step 1: Identify scope
- Which tags? Single tag or multiple?
- Which time range?
- Is gap ongoing or historical?
Step 2: Check interface status
- PI SMT → Interfaces → Check event rate
- Review pipc.log for errors during gap period
Step 3: Check buffer
- PI SMT → Buffering → Queue depth
- If the buffer is full, incoming data may be lost
Step 4: Check PI point configuration
- Is the tag active (Scan = On)?
- Is the interface scanning this tag?
- Check ExDesc for correct source address
Step 5: Verify data quality
- Check system digital states (Shutdown, No Data)
- Review compression settings
Common Failure Scenarios
| Scenario | Symptoms | Root Cause | Resolution |
|---|---|---|---|
| Interface disconnection | Data gaps on all interface tags | Network failure, DCS restart | Restart interface, check network |
| Archive full | System digital state Shutdown | Archive not pre-created | Create new archive immediately |
| Buffer overflow | Large data gaps, high queue | PI Server overloaded | Increase buffer size, tune server |
| Kerberos failure | Authentication errors | SPN missing or expired | Re-register SPN, check AD |
| AF analysis failure | Calculated attributes not updating | Expression error, circular ref | Check analysis log in PSE |
| PI Vision slow | Dashboard load > 10s | Too many symbols | Reduce symbols, optimize queries |
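For a first pass over log output, the patterns from the Common Error Patterns list can be encoded as a lookup and counted. The sketch below is illustrative only. It assumes the log messages are already available as plain-text lines, matches by case-insensitive substring, and uses hypothetical function names and made-up sample lines; real PI log formats may differ.

```python
from collections import Counter

# Pattern -> likely cause, copied from the Common Error Patterns list.
ERROR_PATTERNS = {
    "Connection refused": "Interface node network issue",
    "Archive full": "Create new archive immediately",
    "License exceeded": "Point count > licensed limit",
    "Authentication failed": "Kerberos/mapping issue",
    "Timeout": "Network latency or server overload",
}


def triage_log_lines(lines):
    """Count the lines that contain each known pattern (case-insensitive)."""
    counts = Counter()
    for line in lines:
        lowered = line.lower()
        for pattern in ERROR_PATTERNS:
            if pattern.lower() in lowered:
                counts[pattern] += 1
    return counts


def summarize(counts):
    """Return one summary string per matched pattern, most frequent first."""
    return [
        f"{pattern} x{n}: {ERROR_PATTERNS[pattern]}"
        for pattern, n in counts.most_common()
    ]


if __name__ == "__main__":
    # Hypothetical sample lines, for illustration only.
    sample = [
        "01-Jan 00:00:01 Connection refused by host",
        "01-Jan 00:00:05 timeout waiting for reply",
        "01-Jan 00:00:09 Timeout on read",
        "01-Jan 00:00:12 interface started",
    ]
    for entry in summarize(triage_log_lines(sample)):
        print(entry)
```

A substring match keeps the sketch simple but can over-count (for instance, a message that merely mentions a timeout setting), so the counts are a starting point for the investigation steps above rather than a diagnosis.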