This post is based on a presentation I gave at the last Thotcon, but was really prompted by a case from a couple of days ago. It’s an interesting example of how the same disciplined methodologies used for finding malicious traffic on the network also apply to sophisticated situations on the host. We’ll examine those methodologies and that logic on the host using a little app I wrote called LockPick, pictured here and detailed later in this article. As we’ll see, mutex analysis is a VERY powerful way of analyzing systems during Incident Response. It can guide the direction of your analysis when other automated methods fail to do so.
On the Network
On the network, we saw the following traffic from a client:
There are an enormous number of “tells” in this exchange indicating this is malware-related traffic, especially when combined with each other. A small list of anomalies that immediately directed our attention to this session includes:
- Request containing dynamic variables parsed by a default unnamed file in the root directory of the webserver.
- Request with an obviously crafted User-Agent string.
- The number of tags contained in the user’s HTTP request is incorrect for a real Wget client.
- The server is nginx.
- The server is located in a suspicious hosting environment and country (not shown above).
- The server reports a content type of “text/html,” but is sending high-entropy binary data (seen in hex view, also not shown above).
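The last tell in that list, a “text/html” label on a high-entropy body, is easy to check mechanically. What follows is a minimal sketch and not taken from any tool used in this case: the function names and the 7.0 bits-per-byte threshold are assumptions chosen for illustration.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())


def looks_mislabeled(content_type: str, body: bytes, threshold: float = 7.0) -> bool:
    """Flag a body that claims to be text but has near-random byte statistics.

    The threshold is an illustrative assumption, not a calibrated value.
    """
    claims_text = content_type.strip().lower().startswith("text/")
    return claims_text and shannon_entropy(body) >= threshold


if __name__ == "__main__":
    html = b"<html><body>hello world</body></html>"
    print(looks_mislabeled("text/html", html))
```

Compressed responses are also high-entropy, so a check like this only makes sense after any Content-Encoding has been decoded.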
The request above is followed by others that look similar to the traffic below:
Again, there are an enormous number of “tells” in this exchange indicating this is malware-related traffic. A small list includes:
- Request containing dynamic variables parsed by a default unnamed file in the root directory of the webserver.
- Request reports itself as being HTTP/1.0, a 15-year-old spec.
- The user-agent string is unique out of millions of other network sessions.
- The number of other tags in the client’s HTTP request header does not match what Internet Explorer 7, which this request claims to be, actually sends.
- The request contains tags not valid as part of HTTP 1.0.
- The server is nginx.
- The server is located in a suspicious hosting environment and country (not shown above).
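The “unique out of millions” observation is a simple frequency count once the User-Agent values have been extracted from session metadata. Here is a small sketch, assuming the values are already available as a list of strings; the function name, the cutoff, and the sample values are hypothetical.

```python
from collections import Counter


def rare_user_agents(user_agents, max_sessions=1):
    """Return User-Agent strings seen in at most max_sessions sessions."""
    counts = Counter(user_agents)
    return sorted(ua for ua, n in counts.items() if n <= max_sessions)


if __name__ == "__main__":
    # Placeholder values, not the strings from the capture discussed here.
    seen = ["browser-a"] * 5 + ["browser-b"] * 3 + ["one-off-agent"]
    print(rare_user_agents(seen))
```

Rarity alone is not proof of anything, but combined with the other tells it is a cheap way to decide which sessions to look at first.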
We could keep going by identifying other anomalies and the literally dozens of different ways traffic like this stands out on the network like a sore thumb (without using signatures), but that’s really covered in other articles I’ve written. At this point we can make a leap of faith and say the system is compromised and should be examined. But here’s where things get interesting….
On the Host
Of course, malware hiding itself well on systems is nothing new. Regardless, in many scenarios, starting with the standard Sysinternals tools to get an idea of what’s happening on the system can be helpful. In the best case, the malware will stand out like a sore thumb using these tools, and you’ll be able to quickly determine the best course of action without wasting cycles going down deeper rabbit holes. In this case, there are a relatively small number of processes running. Almost every one of them can be verified by signatures, and all of them seem normal based on searching for information about them:
Going through the results of Autoruns and the other common Sysinternals tools yields similar results: nothing obviously bad is visible. That means the next steps (assuming we’re continuing with a live analysis, as we are for the purpose of this write-up) involve running tools that examine the system at a much lower level. This also means that you as an analyst will now have much more technical information to wade through, and a much higher volume of it the deeper into the system you’re forced to go. That also means your day is going to end later than you expected. 😉
Because of this, we like to use automated tools as much as possible. Shortly before writing this post, Mandiant released a tool called Redline, which makes auditing the data produced by Memoryze a far more efficient process. It does this in two ways: first by scoring the “badness” of processes for you, and second by organizing the large volume of data harvested in ways that are much easier to visually dissect and inspect. Here’s a link to the tool and more information: https://forums.mandiant.com/topic/redline-faq
In the case of this particular system, it found three processes that ranked as suspicious:
Unfortunately, two of the three hits show why whitelisting/blacklisting is such a difficult challenge to manage when it comes to “anomaly” detection on hosts. Two of the hits are related to printer services being loaded:
The third hit was given for the following reason:
This is also a false positive, related to the particular Hot Key manager OEMed by this laptop company. At this point we can start a more time-consuming manual investigation by digging into the system, or… we can first try mutex analysis.
Mutexes – The Canary in the Coal Mine
From MSDN: When two or more threads need to access a shared resource at the same time, the system needs a synchronization mechanism to ensure that only one thread at a time uses the resource. Mutex is a synchronization primitive that grants exclusive access to the shared resource to only one thread. If a thread acquires a mutex, the second thread that wants to acquire that mutex is suspended until the first thread releases the mutex. Mutexes are of two types: local mutexes, which are unnamed, and named system mutexes. A local mutex exists only within your process. Named system mutexes are visible throughout the operating system, and can be used to synchronize the activities of processes.
For a simple example of how mutexes are used, assume you’re using a web browser that maintains a history log of sites visited. Now assume you have multiple browser windows open. Each of those browser processes will try to update the history file, but only one can lock and update the file at a time. By guarding the file with a mutex object, the different processes know to wait to access the file until other processes have finished updating it.
Malware sometimes does this too, but it’s far more common for malware to use mutexes to ensure only a single copy of itself can run at any given time. The great thing about this (for us as analysts) is that it applies to highly sophisticated kernel-level malware as well as completely non-sophisticated user-land malware. Even better, while malware can go out of its way to hide every trace of itself on a system, it generally needs to leave its mutex objects fully exposed in order to function properly.
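The single-instance pattern is short enough to show in full. The following sketch is illustrative only and is not taken from any malware sample or from LockPick. It uses the documented Win32 call CreateMutexW through ctypes; the function name already_running and the example mutex name are hypothetical.

```python
import ctypes
import sys

ERROR_ALREADY_EXISTS = 183  # documented Win32 error code


def already_running(name: str) -> bool:
    """Create (or open) a named mutex and report whether it existed before.

    The handle is deliberately never closed: the named mutex must stay alive
    for the lifetime of the process so that later instances can see it.
    """
    if sys.platform != "win32":
        raise OSError("named mutexes are a Windows feature")
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.CreateMutexW.argtypes = [ctypes.c_void_p, ctypes.c_int, ctypes.c_wchar_p]
    kernel32.CreateMutexW.restype = ctypes.c_void_p
    handle = kernel32.CreateMutexW(None, False, name)
    if not handle:
        raise ctypes.WinError(ctypes.get_last_error())
    # On success, the last error distinguishes "created" from "already existed".
    return ctypes.get_last_error() == ERROR_ALREADY_EXISTS
```

A program using this would call something like already_running("Local\\ExampleSingleInstance") at startup and exit if it returns True. Because the check only works when every copy agrees on the name, the name has to be discoverable, and that is exactly what makes it useful to an analyst.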
Accessing Mutexes
The way we’ll harvest mutexes on a system is by leveraging a kernel object known as the Object Manager Database. The object manager organizes object names into a hierarchy, just like the hierarchical layout of the file system. (In fact, the file system is conceptually just a subtree of the overall object namespace tree.) If you’d like to graphically browse the Object Manager namespace, check out the application WinObj, available from: http://technet.microsoft.com/en-us/sysinternals/bb896657. An example screenshot is here:
Programmatically, the way we access the Object Manager is through ntdll.dll. The four main exports giving the access and testing capabilities needed are:
An example of using these functions programmatically is included here: http://pastebin.com/embed_iframe.php?i=zhmJTffK
The “directory” object we’re interested in in this case is BaseNamedObjects. Under there we can find a listing of mutexes, events, semaphores, waitable timers, and section objects.
Analyzing Mutexes
OK, so now that we can harvest them, what do we do with them? Sure, we can do simple signature matching to find known bad ones, but that is a horribly ineffective method of mutex analysis, since mutex names are highly configurable in most major malware families, as you can see in the example below. There is a HUGE amount of variability between bad ones.
While hackers can make a piece of malware’s mutex anything at all, they have one major constraint: they can’t use a normal, legitimate mutex name, because the names would collide and the malware wouldn’t load. There is a subtle workaround they can take advantage of, but we can test for that condition too, as will be discussed in a moment.
Speaking of normal and legitimate mutexes, what do they normally look like anyway?! Well, here’s a short sample list of normal mutexes:
So if those are some example legitimate ones, how about some known bad mutexes? Here’s a short list of those too:
From those two lists alone we can see a number of generalized differences between “typical normal” and “typical bad” mutexes. Some of the more obvious differences include:
- Length differences
- Differences in the “entropy” of the strings themselves
- Differences in the formatting of the strings
- Differences in the usage of special characters and how those characters are distributed throughout the string
And that is the basis for the application shown at the start of this article: LockPick. (Why the name LockPick? Pretty simple really… Mutexes are used to lock objects. This application tries to pick the bad ones. 🙂 It’s also a shout out to Deviant Ollam. Hey man!)
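To make the four differences listed above concrete, here is a rough sketch of how each one could be turned into a measurable feature of a mutex name. This is not LockPick’s actual scoring logic, which isn’t described here; the function names and the “shape” encoding are assumptions made purely for illustration, and the sketch deliberately stops short of deciding what counts as bad.

```python
import math
from collections import Counter


def shape(name: str) -> str:
    """Collapse a name into its formatting pattern, e.g. 'Foo_123' -> 'Aa_9'."""
    out = []
    for ch in name:
        if ch.isupper():
            c = "A"
        elif ch.islower():
            c = "a"
        elif ch.isdigit():
            c = "9"
        else:
            c = ch  # keep special characters as-is
        if not out or out[-1] != c:  # merge runs of the same class
            out.append(c)
    return "".join(out)


def name_features(name: str) -> dict:
    """Compute length, entropy, formatting, and special-character features."""
    total = len(name)
    if total == 0:
        return {"length": 0, "entropy": 0.0, "special_ratio": 0.0,
                "special_positions": [], "shape": ""}
    entropy = -sum((n / total) * math.log2(n / total)
                   for n in Counter(name).values())
    specials = [i for i, ch in enumerate(name) if not ch.isalnum()]
    return {
        "length": total,
        "entropy": entropy,                      # bits per character
        "special_ratio": len(specials) / total,  # how many special characters
        "special_positions": specials,           # where they fall in the string
        "shape": shape(name),
    }


if __name__ == "__main__":
    # Hypothetical name, not one from the lists in this article.
    print(name_features("Example_Mutex_01"))
```

Features like these only become useful when compared against a baseline of names seen on known-good systems, which is why the differences are described as generalized rather than as fixed rules.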
Back to Our Compromised System
So here are the results of running LockPick on the system we started analyzing at the start of this article:
By default, LockPick filters mutexes it thinks are “good,” so all we see above are a handful of “unknown” mutexes (those that generated no internal score of suspiciousness), and one labeled “suspicious” since it scored enough to make it onto the bottom of the “bad” scale. Toggling the option to allow showing the entries it thinks are good just looks like the following:
Searching Google for references to the suspicious mutex returns the following – that is, nothing:
Now we’re getting somewhere! So we examined the system with a handful of tools and didn’t find anything obvious yet. At this point, we found a mutex that appeared to be suspicious, with no references to it on Google, which is even more suspicious! The next question is, “What process(es) are using this mutex?” There are a number of ways we can answer this question and Process Explorer (http://technet.microsoft.com/en-us/sysinternals/bb896653) is typically a very easy way to go about it. As you can see, searching for the mutex (handle) returns all processes holding it:
In this case, we see only a single process is holding that mutex. This seems like an appropriate point to insert a side note (soapbox) about using “handle count” during analysis. I know some people are big fans of using handle count to find bad things on a system. The idea is that if a mutex (or almost any other object) is legitimate, then there will likely be a number of threads or processes pointing to it. Generally, people say that if there are 3 or more references to a handle or module, then the chance of it being trustworthy is higher.
I’m not a fan of this idea at all. Not only are those counts highly volatile, but more importantly, I just looked at the laptop I’m typing this on and there are over 500 mutexes system-wide, and over 100 of them have only a single pointer. That’s way too many to tag as suspicious. If we lump in those with only two pointers, the count goes up much more. We see similar results on this system when looking at DLLs, but the idea is much worse for mutexes anyway, since that count is thread-based. Some malware will have several threads pointing at a mutex that’s obviously bad, and logic that trusts a mutex based on pointer counts (or even lets them contribute to a “positive” score) will be thrown off completely by this fact. I have other thoughts here, but that’s not the point of this article.
Anyway… Double-clicking the process you’re interested in in the search results window automatically brings up the handle view in the bottom pane, so you can examine all handles held by this process.
At this point we can be quite sure something bad has been injected into services.exe, a critical system process. The next question is, “What is it?!” To find out, we’ll force a full memory dump of services.exe and analyze it. This can also be done using Process Explorer by right-clicking the process you’re interested in, as shown below:
Analyzing the Dump
We’ll take this dump file offline and examine it on another computer at this point. Generally we use WinDbg (http://msdn.microsoft.com/en-us/windows/hardware/gg463009) for this task, but this article is long enough as it is, so in this case we’ll use another great tool, PEBrowse Crash-Dump Analyzer, available at http://www.smidgeonsoft.com/download/PEBrowseDmp.zip, and fall back to WinDbg only for the specific tasks PEBrowse Crash-Dump Analyzer can’t handle. Loading the dump and displaying the summary information shows the following in PEBrowse Crash-Dump Analyzer:
And here’s where things get really interesting in a very unexpected way. I need to admit at this point that I’m doing this analysis for the first time, at the same time I’m writing this article. I was getting ready to explain how and why examining the heaps of a process can provide useful clues about how to proceed with your analysis, which will likely include manually carving injected DLLs from this dump, but…. Here’s what we see in the first heap area I just happened to dump for an example:
I think we can go out on a limb and say we’re looking at a malware configuration file with ini-style sections with names like:
Well, great! Now we can just Google for some of these strings and find out what malware family this belongs to, find references to analysis of this malware and its configuration files, and then we’ll know exactly what to do about carving and analyzing this memory dump to validate our findings and determine the extent of the threat to our environment. Unfortunately, this is what we get:
Searching for other strings returns similar results. Nothing! (Even taking strings I started pulling from other heap structures and pages containing the injected malware.) In other words, without intending to, we may have just found a new family of malware. And by “new,” that means a family of malware that may have been in the wild for quite some time, but hasn’t been publicly analyzed yet. Because this article is already 16 pages long, we’ll pause here and cover a detailed analysis of the crash dump in a “Part Two” to this article. Considering this discovery, this article just got MUCH longer! Validating this really is something new will be a much more involved process, but we’ll see what happens…
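Pulling candidate strings and ini-style section names out of a dump by eye, as was done above, can also be roughed out in a few lines. This sketch is not how PEBrowse or WinDbg work internally. It assumes the configuration text is stored as plain ASCII (UTF-16 strings, common in Windows memory, would need a second pass), and the function names are hypothetical.

```python
import re

PRINTABLE_RUN = rb"[\x20-\x7e]{%d,}"          # runs of printable ASCII bytes
SECTION = re.compile(r"\[([^\[\]]{1,64})\]")  # a bare "[name]" line


def ascii_strings(data: bytes, min_len: int = 4):
    """Return runs of printable ASCII at least min_len bytes long."""
    pattern = re.compile(PRINTABLE_RUN % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def ini_section_names(data: bytes):
    """Return unique ini-style section names, in order of first appearance."""
    names = []
    for s in ascii_strings(data):
        m = SECTION.fullmatch(s.strip())
        if m and m.group(1) not in names:
            names.append(m.group(1))
    return names


if __name__ == "__main__":
    # Synthetic bytes standing in for a heap region; not data from this case.
    blob = b"\x00\x01[settings]\r\nkey=value\x00\xff[other]\x00"
    print(ini_section_names(blob))
```

Against a real dump, the input would simply be the bytes of the file read in binary mode; for very large dumps it may be worth processing the file in chunks rather than loading it whole.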
Summary
The entire point that served as the motivation behind all of this: mutex analysis is a VERY powerful way of analyzing systems during Incident Response, especially if you’re creative and intelligent about it. It can guide the direction of your analysis when other automated methods fail to do so, as this article has shown.
LockPick is very alpha at this point and needs a lot of updating (my todo feature list still has some significant items, and it has only been lightly tested), but it is still useful as-is, as you have seen. Please feel free to email me at gary.golomb@netwitness.com about it in the meantime. You can read part two of this series here.
Posted with permission from the NetWitness Blog.