Introduction
We recently faced a series of unexplained crashes across multiple indexers. This blog post details the systematic analysis conducted to identify the underlying issue, including the discovery process, crash patterns, and recommended remediation steps.
Crash Discovery
The issue first came to attention through scheduled job failures observed in Splunk. The investigation began with a review of the crash logs, using this initial search:
index="_internal" sourcetype=splunkd_crash_log

The search returned multiple crashes with similar characteristics across different searches and datasets. A closer look at three of the crash events showed that they shared the same assertion failure, signal, thread, and call stack.
Crash Patterns
All observed crashes shared the following characteristics:
- Assertion Failure: Occurred in ChunkedCSVLineReader::rewind() at line 894 in /builds/splcore/main/src/searchthingmgr/IndexedCSV.cpp
- Signal: SIGABRT (signal 6), triggered by an assertion failure
- Affected Thread: BucketSummaryActorThread
Common Call Stack
Every crash had the same call stack through CSV lookup processing, listed here from the crashing frame out to the thread entry point:
- ChunkedCSVLineReader::rewind()
- IndexedCsvDataProvider::lookupBatch()
- LookupDataProvider::lookup()
- CachedProvider::lookup()
- LookupDriver::flush()
- AutoLookupDriver::execute()
- LookupProcessor::execute()
- SearchProcessor::execute_dispatch()
- SearchPipeline::execute()
- BucketColumnStore::execute_pipeline()
- BucketSummaryActorThread::main()
Crash Event Details
Events 1 & 2: DLP Datamodel
- Datamodel: DM_Splunk_SA_CIM_DLP
- Index: casb-netskope
- Search IDs:
  - RMD5227ace381dbe30b6_at_1751970120_4290
  - RMD5227ace381dbe30b6_at_1751969820_4107
- Tags: cloud,pci
- Events processed: 4,663 and 4,750
Event 3: Change Datamodel
- Datamodel: DM_Splunk_SA_CIM_Change
- Index: cloud-aws-cloudtrail
- Search ID: RMD5ea35b39b15ad40d_at_1751969701_4031
- Tags: account,audit,cloud,delete,endpoint,network,pci
- Events processed: 231
Understanding Search IDs
The crashes were associated with system-generated search IDs, structured as follows (scrubbed for privacy):
remote_sh-i-[instance-id].[environment].com_scheduler__nobody_[base64-encoded-string]_RMD[unique-id]_at_[UnixTimestamp]_[sequenceNumber]
Example breakdown:
- remote_sh-: Remote search head indicator
- i-09XXXXXXXXXXXXXX: AWS instance identifier
- example.splunkcloud.com: Splunk Cloud environment
- scheduler__nobody: Scheduled execution by system user
- U3BsdW5rX1NBX0NJTQ__: Base64 encoding of “Splunk_SA_CIM”
- RMD5227ace381dbe30b8: Unique identifier for the search
- 1751970120: Unix timestamp (July 8, 2025, 10:22:00 UTC)
- 4296: Sequence number
Note: Actual search names are not directly embedded within these IDs.
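The breakdown above can be sketched as a small parser. This is an illustration, not Splunk's own logic: the regular expression, the function name parse_search_id, and the example ID (assembled from the scrubbed pieces in the list) are assumptions, and the number of underscores between fields is treated as variable because the scrubbed format does not pin it down.

```python
import base64
import re
from datetime import datetime, timezone

# Pattern for the scheduler portion of a search ID, following the breakdown
# above. The "_at_" separator is taken from the crash events; the exact
# number of underscores between fields is an assumption, hence "_+".
SID_PATTERN = re.compile(
    r"scheduler__(?P<user>[^_]+)_+"
    r"(?P<app_b64>[A-Za-z0-9+/]+)_+"
    r"(?P<search_hash>RMD[0-9a-f]+)_at_"
    r"(?P<epoch>\d+)_(?P<sequence>\d+)$"
)


def parse_search_id(sid):
    """Split a scheduler search ID into its labeled parts."""
    match = SID_PATTERN.search(sid)
    if match is None:
        raise ValueError(f"unrecognized search ID: {sid!r}")
    token = match["app_b64"]
    # Restore any "=" padding that is missing from the Base64 token.
    padded = token + "=" * (-len(token) % 4)
    return {
        "user": match["user"],
        "app": base64.b64decode(padded).decode("utf-8"),
        "search_hash": match["search_hash"],
        "dispatched_utc": datetime.fromtimestamp(
            int(match["epoch"]), tz=timezone.utc
        ),
        "sequence": int(match["sequence"]),
    }


# Hypothetical ID assembled from the scrubbed pieces listed above.
EXAMPLE_SID = (
    "remote_sh-i-09XXXXXXXXXXXXXX.example.splunkcloud.com_"
    "scheduler__nobody_U3BsdW5rX1NBX0NJTQ__"
    "RMD5227ace381dbe30b8_at_1751970120_4296"
)
```

Calling parse_search_id(EXAMPLE_SID) returns the app name Splunk_SA_CIM and a dispatch time of 2025-07-08 10:22:00 UTC, which makes it easier to line crashes up against scheduler activity.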
Root Cause Analysis
The crash occurs during CSV lookup processing, in the ChunkedCSVLineReader::rewind() function. Potential contributing factors include:
- CSV lookup file corruption or formatting issues
- An unexpected internal state causing the assertion failure during rewind operations
The identical call stack across different data models, indexes, and searches confirmed that this was a systemic platform issue rather than isolated data corruption.
Identifying Affected Searches
Administrators can correlate search IDs with actual searches by:
- Using the Splunk Web UI: Settings → Job History
- Querying the REST API for search job details
- Reviewing scheduler logs around the crash timestamps
- Inspecting savedsearches.conf files for scheduled searches
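The REST API option can be sketched as follows. The sketch assumes the management port (usually 8089) is reachable, that a bearer authentication token is available, and that the job has not yet expired on the search head. The helper names are hypothetical, and reading the saved search name from the job's "label" field is an assumption to verify against your own job output.

```python
import json
import urllib.parse
import urllib.request


def job_url(base_url, sid):
    """Build the search/jobs endpoint URL for one search ID."""
    quoted = urllib.parse.quote(sid, safe="")
    return f"{base_url.rstrip('/')}/services/search/jobs/{quoted}?output_mode=json"


def fetch_job(base_url, sid, token):
    """Fetch job details as parsed JSON (requires network access)."""
    request = urllib.request.Request(
        job_url(base_url, sid),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def saved_search_name(job):
    """Pull the job label out of a job response, if present.

    Assumption: the label carries the saved search name for scheduled jobs.
    """
    entries = job.get("entry", [])
    if not entries:
        return None
    return entries[0].get("content", {}).get("label")
```

A deployment with a self-signed certificate on the management port would also need an SSL context passed to urlopen, which is left out here to keep the sketch short.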
Implications and Recommendations
Immediate Actions
- Validate integrity and format of CSV lookup files ($SPLUNK_HOME/etc/apps/*/lookups/)
- Audit CSV lookup configurations for scheduled searches
- Monitor scheduled jobs that utilize CSV lookups
- Consider a Splunk version upgrade if a known resolution is documented
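For the first action, a generic structural check is one place to start. The sketch below does not reproduce Splunk's CSV parser and cannot tell whether a given file triggers the assertion; it only flags files that are empty, contain NUL bytes, are not valid UTF-8, or have rows whose field count differs from the header. The function names are hypothetical.

```python
import csv
import glob
import io
import os


def check_lookup(path):
    """Return a list of problems found in one CSV lookup file."""
    with open(path, "rb") as handle:
        raw = handle.read()
    if b"\x00" in raw:
        return ["contains NUL bytes"]
    try:
        text = raw.decode("utf-8-sig")  # tolerate a leading byte order mark
    except UnicodeDecodeError as exc:
        return [f"not valid UTF-8: {exc}"]

    problems = []
    try:
        reader = csv.reader(io.StringIO(text, newline=""))
        header = next(reader, None)
        if not header:
            return ["file is empty or has no header"]
        if len(set(header)) != len(header):
            problems.append("duplicate column names in header")
        for row in reader:
            # Blank lines are skipped; any other row must match the header.
            if row and len(row) != len(header):
                problems.append(
                    f"line {reader.line_num}: expected {len(header)} "
                    f"fields, found {len(row)}"
                )
    except csv.Error as exc:
        problems.append(f"CSV parse error: {exc}")
    return problems


def scan_lookups(splunk_home):
    """Check every app-level lookup CSV; return only files with problems."""
    pattern = os.path.join(splunk_home, "etc", "apps", "*", "lookups", "*.csv")
    report = {}
    for path in sorted(glob.glob(pattern)):
        problems = check_lookup(path)
        if problems:
            report[path] = problems
    return report
```

Running scan_lookups(os.environ["SPLUNK_HOME"]) on a search head returns a dictionary keyed by file path, which can be compared against the lookups used by the crashing data models.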
Long-Term Recommendations
- Implement proactive monitoring and alerting for crash events
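One lightweight way to approach this outside of Splunk itself is to watch the log directory for new crash files. The sketch below assumes crash logs are written to $SPLUNK_HOME/var/log/splunk with names starting with "crash-", so verify the location and naming on your own deployment. The function name and the one-hour default window are hypothetical.

```python
import glob
import os
import time


def recent_crash_logs(splunk_home, max_age_seconds=3600, now=None):
    """List crash log files modified within the last max_age_seconds."""
    now = time.time() if now is None else now
    # Assumed location and naming of splunkd crash logs.
    pattern = os.path.join(splunk_home, "var", "log", "splunk", "crash-*.log")
    return sorted(
        path
        for path in glob.glob(pattern)
        if now - os.path.getmtime(path) <= max_age_seconds
    )
```

A scheduled task could call this function and raise an alert whenever the returned list is non-empty.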
Conclusion
This analysis confirmed a systemic Splunk platform bug affecting CSV lookup processing. Validating lookup files, auditing the scheduled searches that use them, and monitoring for new crash events will limit the impact in the meantime. Administrators who see the same assertion failure should report it to Splunk support for resolution.