Category: SANS FOR572

SOF-ELK®’s Evolution: A Comprehensive Update for Enhanced Digital Forensics

Lewes Technology Consulting recently released a major update to the Security Operations and Forensics ELK (SOF-ELK®) platform, adding significant new features and extensive under-the-hood improvements.

SOF-ELK is a completely free and open source bootable virtual machine that has been pre-configured with a fully functional and customized implementation of the Elastic Stack. SOF-ELK specifically focuses on the workflows and needs of computer forensic and security operations professionals, with dozens of parsers to extract useful and relevant fields from numerous log formats. Dashboards are also provided, with visualizations and investigative views to examine the data that has been loaded. The project aims to eliminate the significant system administration workload otherwise required to put the powerful Elastic Stack, a preeminent big data analytics platform, into production use. It is designed to let both seasoned experts and newcomers to the DFIR field perform mass-scale analysis across disparate data sources. SOF-ELK is built to ingest at-rest data from evidence files as well as live data sources. This dual ingest capability makes it suitable for both forensic investigation and security operations use cases.

Elastic Common Schema

The most recent release includes numerous updates to the Elastic Stack components, a major overhaul of the log parsers, and many additional improvements. The most significant change is the adoption of the Elastic Common Schema (ECS) for nearly 1,100 fields parsed across all data types. The ECS is a consistent approach to field naming, enabling easier analysis while also allowing correlation across multiple tools and visualizations. Without such a schema, each tool in an investigator’s toolbox might use a unique field naming scheme, making searches across different platforms complex, frustrating, and potentially inaccurate.

Consider a simple example: a field containing the source IP address for a log entry or a NetFlow record. This field name could be reflected as source_ip, src_ip, srcip, ip.source, source.ip, or countless other names in various tools.

However, when data is normalized to the ECS, analysts and investigators can create search filters that work equally across multiple tools while also flattening the learning curve they typically encounter when using new or unfamiliar tools.

Using the source IP address as an example again, the ECS specifies this value is always reflected in the source.ip field. With the ECS-based naming structure, a filter of source.ip:192.168.6.75 can be used across ECS-compliant tools, with consistent results.

This consistency also provides a path to future capabilities such as using community-built dashboards, Elastic’s security information and event management (SIEM) tool, machine learning capabilities, and more.

Another key feature of the ECS model, also implemented in SOF-ELK’s parsers, is the aggregation of values from similar fields into a single field for convenient searching. For example, a record with both a source.ip and destination.ip will have those values copied into a list of IP addresses named related.ip, which will also contain any other IP address values from the source data. This aggregated field allows an investigator to search more broadly for records containing an IP address of interest, regardless of the specific type of IP address (for example, related.ip:192.168.6.75). This aggregation is performed for numerous field types in addition to IP addresses, such as MAC addresses, network ports, hostnames, file hash values, and many others.
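The idea behind the aggregation is straightforward. The minimal Python sketch below illustrates the concept only; SOF-ELK performs the actual work in its Logstash parsers, and the field list here is just an example.

# Conceptual sketch only -- not SOF-ELK's actual Logstash implementation.
def aggregate_related_ips(record):
    """Copy every IP-typed field value into a single related.ip list."""
    ip_fields = ["source.ip", "destination.ip", "client.ip", "server.ip"]
    related = {record[f] for f in ip_fields if f in record}
    if related:
        record["related.ip"] = sorted(related)
    return record

record = {"source.ip": "192.168.6.75", "destination.ip": "10.0.0.8"}
print(aggregate_related_ips(record)["related.ip"])
# ['10.0.0.8', '192.168.6.75'] -- one field to search, regardless of direction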

Data Enrichments

SOF-ELK also implements several data source enrichments that provide additional insight not available in the original source data.

Geolocation and Network Provider

One key enrichment is the addition of both location and network provider information to IP addresses. This lookup is performed on the platform itself, so no data is transmitted outside of the VM, preserving the sound operational security practices that are important to ongoing investigations. The location enrichment allows the geographic context of network artifacts to be visualized on a map, while the network provider enrichment enables searching for traffic involving a specified ISP or SaaS platform.
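As an illustration of this style of offline enrichment, the hedged Python sketch below uses the geoip2 library against locally stored MaxMind databases (the file names and the example address are placeholders); it shows the kind of lookup involved, not SOF-ELK's internal implementation.

import geoip2.database  # pip install geoip2; database files must be present locally

# File names are placeholders for locally stored MaxMind databases.
with geoip2.database.Reader("GeoLite2-City.mmdb") as city_db, \
     geoip2.database.Reader("GeoLite2-ASN.mmdb") as asn_db:
    ip = "8.8.8.8"              # a well-known public resolver, used only as an example
    city = city_db.city(ip)     # location lookup, performed entirely offline
    asn = asn_db.asn(ip)        # network provider (ASN) lookup
    print(city.country.iso_code, city.location.latitude, city.location.longitude)
    print(asn.autonomous_system_number, asn.autonomous_system_organization)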

Community ID Network Conversation Hash

The automatic calculation of a Community ID value is another new enrichment. The Community ID is a hashed identifier for a specific network conversation. This public algorithm, created by Corelight, is based on the source and destination IP addresses, the source and destination ports, and the transport protocol. The resulting string (for example, 1:OS79QgipeMxLNHu2rB35Gx+682k=) can be used to search for the same network conversation across multiple investigatory tools.

While originally developed by Corelight for the Zeek Network Security Monitor (NSM) platform, the public nature of the algorithm means that the Community ID has been integrated into countless network analysis tools. It is an invaluable way to identify network conversations, but it is rarely available in the original network evidence itself. Therefore, SOF-ELK will calculate and store the Community ID for records from any data source, as long as the original includes the necessary source fields.
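Because the algorithm is public, the same value can be reproduced outside of SOF-ELK. Below is a minimal sketch using Corelight's communityid Python package; the flow tuple values are invented for illustration.

import communityid  # pip install communityid -- Corelight's reference implementation

cid = communityid.CommunityID()

# Hypothetical flow tuple: the hash is derived from the source and destination
# IP addresses and ports plus the transport protocol.
flow = communityid.FlowTuple.make_tcp("192.168.6.75", "203.0.113.10", 49152, 443)
print(cid.calc(flow))  # prints a "1:..." value that any Community ID-aware tool will reproduce

Because the endpoints are normalized before hashing, both directions of a conversation yield the same Community ID value.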

Dashboards for Visualization

One lesson every investigator or analyst has learned is that reviewing data sets spanning several million records is a challenging task. Most tools are simply not built to accommodate that scale of source data, yet this has become a common requirement even for smaller cases. SOF-ELK aims to alleviate that problem with dashboards and visualizations that make quick work of spotting anomalies or trends, correlating disparate data points, and distilling even the most complex data sources into visually digestible components.

For example, the NetFlow dashboard shown below reflects the sample source data provided in the VM. Spotting the spike of traffic in the left-most time series graph is visually simple. However, finding that pattern in over 300,000 records of text would be quite difficult. Similarly, the two nested donut charts depict ports and protocols observed in the source data. Identifying the most heavily used ports and their ratio of occurrence is much easier to accomplish visually than from those same source records.

It’s also important to note that these dashboards are all interactive and designed to support the iterative nature of an investigation. An analyst can simply click on a particular slice of a donut chart, draw a box on the map covering a focus area, or select a time frame of interest, and immediately narrow an extremely large set of source records to a small subset of interest based on the current search criteria. This makes the dashboards investigative tools in their own right, in addition to serving as visualization and reporting aids.

Extensive Parsing Capabilities

SOF-ELK already includes parsing capabilities for dozens of data types, with more being added all the time. Currently, the data types include:

  • Syslog-formatted log entries from *NIX systems, covering numerous subtypes such as SSH, DHCP, DNS, firewalls, and more
  • HTTP server logs in several formats including Common Log Format, Combined/Extended, IIS CSV format, proxy server logs, and more
  • Zeek NSM logs, in JSON form
  • KAPE (suite of endpoint forensic software) logs, in JSON form
  • Amazon Web Services (AWS) logs
  • Google Cloud Platform (GCP) logs
  • Microsoft Azure and Microsoft 365 logs
  • Kubernetes logs
  • NetFlow network traffic summaries covering NetFlow versions 5, 7, and 9, IPFIX, and equivalent files from Zeek, GCP, AWS, and Azure

SOF-ELK can process each of these data types from static files loaded onto the platform. In most cases, it can also process source data from live sources transmitted over a network connection. This enables both a post-incident investigative workflow for DFIR purposes and security operations workflows that support ongoing collection and observation.

Free and Open Source with Dynamic Updates

All configuration files used on the SOF-ELK platform are maintained in a GitHub repository. This permits public review of all the project’s content and allows users to report and discuss any bugs, optimizations, or feature requests they identify. The GitHub repository also provides a means of updating platforms operating in the field without needing to download a new VM. This update feature requires that the VM have Internet access, but it takes only a single command to download and activate updated parsers, dashboards, or visualizations. Newly added data sources are also accommodated through the same in-field update process. Generally, a new VM download is only required for significant updates to the base operating system, the Elastic Stack, or similar major components.

The SOF-ELK platform is a completely free community resource anyone can use for casework, research, or any other purpose. It is also used in several SANS courses, allowing students of all skill levels to gain experience in realistic hands-on scenarios using sample case data collected from controlled environments designed to model real-world enterprises. FOR572: Advanced Network Forensics and Analysis uses SOF-ELK to correlate log data from various sources and examine large volumes of NetFlow records. FOR509: Enterprise Cloud Forensics and Incident Response uses SOF-ELK to examine cloud data evidence across all cloud service providers, and FOR589: Cybercrime Intelligence incorporates SOF-ELK for large-scale data analysis. Other course authors are in the process of integrating SOF-ELK into more SANS courses, which will provide future students and practitioners a consistent user experience across a growing range of forensic evidence types.

Several online resources, including the project’s GitHub repository, are available to help you get started with SOF-ELK.

If you’re looking for a turnkey tool that immediately adds value to massive volumes of common forensic evidence data types, consider giving SOF-ELK a try.

Discover the power of SOF-ELK! Enhance your forensic skills and master network and cloud evidence analysis with the SANS FOR572: Advanced Network Forensics and Analysis, FOR509: Enterprise Cloud Forensics and Incident Response, and FOR589: Cybercrime Intelligence courses. Ready to take the next step? Register today or request a demo to see these courses in action!

UPDATED: 2015 DFIR Monterey Network Forensic Challenge Results

2015-02-22 UPDATE: I’ve added some thought process/methodology to the answers inline below.

Thanks to everyone who submitted or just played along with the SANS DFIR Network Forensic Challenge! We had over 3,000 evidence downloads and more than 500 submissions! Per the rules, eligible submissions had to answer at least four of the six questions correctly, and the winner was then chosen by random selection from among those entries.

We’re excited to announce that Henry van Jaarsveld is the winner for this challenge!  Congratulations, and we hope you enjoy your SANS OnDemand Course.  Great work, Henry!

Thanks for all the submissions and interest in this challenge. If you enjoyed the questions – no matter how many you answered – you should check out FOR572: Advanced Network Forensics and Analysis. The class is available via OnDemand as well as at live and virtual SANS events.

More live and virtual/remote events are being added all the time, so keep checking the course page for additional offerings.

The challenge answers are listed below:

  1. At what time (UTC, including year) did the portscanning activity from IP address 123.150.207.231 start?

Answer: Aug 29 2013 13:58:55 UTC

Portscanning activity is typically characterized by connection attempts to a range of ports. It is often rapid and originates from the same IP address. Some scanning utilities use the same source port or a small cluster of source ports, while others do not. In this case, the following command gets you started:

$ grep SRC=123.150.207.231 messages

The first result is below:

Aug 29 09:58:55 gw kernel: FW reject_input: IN=eth0 OUT= MAC=08:00:27:53:38:ee:08:00:27:1c:21:2b:08:00 SRC=123.150.207.231 DST=98.252.16.36 LEN=44 TOS=0x00 PREC=0x00 TTL=41 ID=35517 PROTO=TCP SPT=38553 DPT=3306 WINDOW=1024 RES=0x00 SYN URGP=0

However, we’ve asked for the time in UTC, which is the only recommended time zone to use for forensic reporting. To find the offset, examine the same “messages” file further. The offset is not often an explicitly logged value, so context is necessary. The following line shows the syslog time (system local time) alongside a corresponding UTC value, so it is reasonable to state that the system’s time zone was UTC-4 at the time the file was created.

Aug 29 07:07:40 gw kernel: rtc_cmos rtc_cmos: setting system clock to 2013-08-29 11:07:08 UTC (1377774428)
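The offset can be confirmed with simple arithmetic; the short Python sketch below hard-codes the two timestamps from the log line above.

from datetime import datetime

# Timestamps taken from the kernel log line above: syslog (local) time versus
# the UTC value reported while the system clock was being set.
local_time = datetime(2013, 8, 29, 7, 7, 40)
utc_time = datetime(2013, 8, 29, 11, 7, 8)
print(utc_time - local_time)  # 3:59:28 -- effectively a four-hour offset (UTC-4)

Applying that four-hour offset to the local timestamp of the first blocked packet (09:58:55) produces the answer of 13:58:55 UTC.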


  2. What IP addresses were used by the system claiming the MAC Address 00:1f:f3:5a:77:9b?

Answer: 169.254.20.167, 169.254.90.183, 192.168.1.64

This is an exercise in using Wireshark/tshark display filters. The following tshark command will answer the question quickly:

$ tshark -n -r nitroba.pcap -T fields -e 'ip.src' -Y 'eth.src == 00:1f:f3:5a:77:9b and ip' | sort | uniq

-n: suppress DNS lookups
-r nitroba.pcap: file to read
-T fields: use “fields” output format
-e ip.src: output just the “ip.src” field, as defined by the Wireshark/tshark parsers
-Y 'eth.src == 00:1f:f3:5a:77:9b and ip': display filter to limit results to the MAC address of interest and IP traffic, which would be the only traffic to include IP addresses
| sort | uniq: bash shell utilities to narrow results to only unique values


  3. What IP (source and destination) and TCP ports (source and destination) are used to transfer the “scenery-backgrounds-6.0.0-1.el6.noarch.rpm” file?

Answer: 149.20.20.135 and 192.168.75.29, 30472 and 51851

Again, the tshark utility is your friend. This is a multiple-stage process. First, get the frame number containing the desired request. This command returns frame number 5846.

$ tshark -n -r ftp-example.pcap -Y 'ftp.request.arg == "scenery-backgrounds-6.0.0-1.el6.noarch.rpm"' -T fields -e frame.number

-n: suppress DNS lookups
-r ftp-example.pcap: file to read
-Y 'ftp.request.arg == "scenery-backgrounds-6.0.0-1.el6.noarch.rpm"': display filter to limit results to just FTP commands that included the argument of interest
-T fields: use “fields” output format
-e frame.number: Get the frame number containing the desired request

Next, find the immediately preceding “Passive Mode” response.

$ tshark -n -r ftp-example.pcap -Y 'ftp.response.code == 227 && frame.number < 5846' -T fields -e frame.number -e ftp.passive.ip -e ftp.passive.port | tail -n 1

-n: suppress DNS lookups
-r ftp-example.pcap: file to read
-Y 'ftp.response.code == 227 && frame.number < 5846': display filter to limit results to just FTP response codes of “227” (Entering Passive Mode) occurring prior to the frame number containing the request of interest
-T fields: use “fields” output format
-e frame.number -e ftp.passive.ip -e ftp.passive.port: Get the values from the fields of interest
| tail -n 1: Just return the last result from the list

Finally, get IPs and ports from both ends of the data transfer.

$ tshark -n -r ftp-example.pcap -Y 'ip.addr == 149.20.20.135 && tcp.port == 30472' -T fields -e ip.src -e tcp.srcport -e ip.dst -e tcp.dstport | sort | uniq

-n: suppress DNS lookups
-r ftp-example.pcap: file to read
-Y 'ip.addr == 149.20.20.135 && tcp.port == 30472': display filter to isolate the TCP connection according to the IP address and port determined above
-T fields: use “fields” output format
-e ip.src -e tcp.srcport -e ip.dst -e tcp.dstport: Get the values from the fields of interest
| sort | uniq: Only display unique lines


  4. How many IP addresses attempted to connect to destination IP address 63.141.241.10 on the default SSH port?

Answer: 49

A connection attempt may or may not be successful, so we can simply limit our search to the high-level filtering provided by nfdump. You could use grep against the text file as well.
There are 55 total connections:

$ nfdump -q -O tstart -r nfcapd.201405230000 -o 'fmt:%sa' 'dst ip 63.141.241.10 and dst port 22' | wc -l

-q: “quiet” output, which suppresses summary header/footer information
-O tstart: order output by “start time” of each record
-r nfcapd.201405230000: input file to read
-o 'fmt:%sa': only display the source IP address for each record
'dst ip 63.141.241.10 and dst port 22': limit flows to those destined for the IP address of interest on the default SSH port. You might also limit to the TCP protocol by adding “and proto tcp”
| wc -l: count the results

There were 49 unique IPs in this data set:

$ nfdump -q -O tstart -r nfcapd.201405230000 -o 'fmt:%sa' 'dst ip 63.141.241.10 and dst port 22' | sort | uniq | wc -l

This is identical to the command above, but adds the following shell command chain:

| sort | uniq | wc -l: Count only unique lines from the nfdump command’s output


  5. What is the byte size for the file named “Researched Sub-Atomic Particles.xlsx”?

Answer: 13,625 bytes

To find the portion(s) of the input pcap that involve the filename of interest, use the “smb.file” field to find the TCP streams of interest.

$ tshark -n -r stark-20120403-full-smb_smb2.pcap -Y 'smb.file == "Researched Sub-Atomic Particles.xlsx"' -T fields -e tcp.stream
2104
2207

This is a large input pcap, so loading it directly to Wireshark is not advisable. Instead, isolate the TCP streams identified above to a new file:

$ tshark -n -r stark-20120403-full-smb_smb2.pcap -Y 'tcp.stream == 2104 or tcp.stream == 2207' -w tcpstreams_2104_2207.pcap
$ md5sum tcpstreams_2104_2207.pcap
fe9c5a388d0d70f74bb96913f120fc7a tcpstreams_2104_2207.pcap

This file is easy to open in Wireshark, as it’s a mere 18MB.

After opening the file, you must explore the SMB session – which is not at all a simple process. In the input file generated above, the message we’re interested in is the Trans2 Response message containing Standard File Info for the file of interest. This occurs in frame 749 (frame.time = Apr 5, 2012 14:21:50.574112000). By spelunking the available fields, you’ll find the “End of File” value, which is 13,625. This represents the number of bytes in the file. Note that the Wireshark status bar tells us that Wireshark knows this field by the name “smb.end_of_file”, which could be used to scale this process out via the tshark utility.


  6. The traffic in this Snort IDS pcap log is suspected to contain malware beaconing. Identify the substring and offset for a common substring that would support a unique Indicator of Compromise for this activity.

Answer: ULQENP2 at offset 4 (bytes 4-10 of the TCP data segment, zero-based)

There are a number of ways to approach this. The goal is to identify commonalities among the individual sessions, even though we are not (yet) sure what the bytes mean. This evidence file is small enough to load into Wireshark and visually explore the content – despite Wireshark not knowing the content is anything other than generic “Data”. After visually inspecting the payloads of the traffic the IDS logged, you should see that bytes 4-10 (zero-based, of course) seem consistent. This can be confirmed with the following display filter:

data.data[4-10] == 55:4c:51:45:4e:50:32

After applying this filter, you can quickly see that 100% of the packets in the IDS log file match. Expanding the filter one byte before or after this substring range results in a <100% match. Barring any additional knowledge of the custom protocol used for these communications, this substring and offset would be a good indicator of compromise.
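The same check can also be scripted for larger captures. A rough Python sketch using scapy follows; the file name snort.log.pcap is hypothetical, and the script simply re-implements the display filter logic above.

from scapy.all import rdpcap, TCP  # pip install scapy

MARKER = b"ULQENP2"                  # bytes 4-10 (zero-based) of each TCP payload
packets = rdpcap("snort.log.pcap")   # hypothetical file name for the IDS pcap log

payloads = [bytes(p[TCP].payload) for p in packets
            if p.haslayer(TCP) and bytes(p[TCP].payload)]
hits = sum(1 for data in payloads if data[4:11] == MARKER)
print(f"{hits} of {len(payloads)} TCP payloads contain the candidate IOC")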


  7. BONUS! Identify the meaning of the bytes that precede the substring above.

Answer: UNIX Timestamp

There is no magic solution here – just trial and error combined with experience. The UNIX timestamp (the number of seconds since Jan 1, 1970 at 00:00:00 UTC) fits into four bytes. Those with a keen eye for timestamps will see that after converting any given four-byte sequence to a big-endian integer, then converting that integer to a timestamp, the Wireshark/tshark “frame.time” field value corresponds almost perfectly in every case. For example:

0x4fe6c278 == 1340523128
$ date -u -d @1340523128
Sun Jun 24 07:32:08 UTC 2012
Corresponding frame.time: Jun 24, 2012 07:32:08.273277000
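The same conversion can be performed with a few lines of Python, using the example bytes above:

from datetime import datetime, timezone

raw = bytes.fromhex("4fe6c278")        # the four bytes preceding the substring
seconds = int.from_bytes(raw, "big")   # big-endian integer: 1340523128
print(datetime.fromtimestamp(seconds, tz=timezone.utc))
# 2012-06-24 07:32:08+00:00 -- matches the corresponding frame.time almost exactly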