Botnet identification and remediation
Modern botnets have become increasingly sophisticated, both in the techniques used to avoid detection on compromised endpoints and in their varied communication channels. IRC as the communications medium of choice for Command & Control (C2) activities has been replaced by sophisticated IP and domain fast-fluxing to avoid detection and increase resilience. These techniques largely bypass traditional network security detection and mitigation approaches such as blacklists and intrusion detection systems.
Gone are the days of owning a system, defacing it and getting your name up on Zone-H. Today it’s all about monetising these systems. In South Africa in particular, the increase in connectivity over the past few years has exposed a large number of unpatched, unprotected systems to the internet. A botmaster’s wet dream.
Traditionally IRC was used for C&C
New malware uses fast-flux to avoid detection and increase resilience (DNS tunnelling, DNS-based C&C, HTTP, P2P)
Traditional approaches are resource intensive and difficult to automate. Closing the loop and applying remediation aren’t simple tasks.
Digital zombies… they keep coming back
- Isn’t this somebody else’s problem?
- It’s $vendor’s fault
- My AV is almighty
- It won’t happen to me
if you don’t care about the small things…
The DNS approach
- Hard coding IPs is a sure-fire way to lose your botnet.
- DNS is integrated into the TCP/IP protocol stack
- Commonly being used as a means of pointing to a C&C server
- Provides a fixed-to-volatile mapping: a stable name can point at rapidly changing IPs
- Essential but commonly ignored
- Completely passive collection of DNS query data can be done on a network using taps or DNS servers themselves
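Passive collection boils down to decoding the DNS queries seen on the wire. As an illustrative sketch (not the actual collection pipeline), the queried name can be pulled straight out of a raw DNS message captured from UDP port 53:

```python
def parse_qname(dns_payload: bytes) -> str:
    """Extract the queried domain name (QNAME) from a raw DNS message.

    Assumes a standard query with at least one question; the QNAME
    starts at offset 12, right after the fixed 12-byte DNS header.
    """
    labels = []
    i = 12  # skip the DNS header
    while dns_payload[i] != 0:  # QNAME is length-prefixed labels, 0-terminated
        length = dns_payload[i]
        labels.append(dns_payload[i + 1:i + 1 + length].decode("ascii"))
        i += 1 + length
    return ".".join(labels)

# A hand-crafted standard query for "example.com"
query = (
    b"\x12\x34"                              # transaction ID
    b"\x01\x00"                              # flags: standard query, recursion desired
    b"\x00\x01\x00\x00\x00\x00\x00\x00"      # 1 question, no other records
    b"\x07example\x03com\x00"                # QNAME as length-prefixed labels
    b"\x00\x01\x00\x01"                      # QTYPE=A, QCLASS=IN
)
print(parse_qname(query))  # → example.com
```

Feeding a stream of such payloads from a network tap (or the resolver’s own logs) through this kind of decoder is what makes the collection completely passive: no queries are ever sent.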
Passive Lexical Analysis
- Actual domain names can be analysed and scored
- There is a difference between human created domain names and those created using an algorithm
- Malicious domains can be identified using an analysis of characters used in the domain itself.
- Does this look right? Can never be 100% certain
- Possible to calculate letter frequencies as they occur in DNS (the research focused on English, but isn’t restricted to it)
- Data such as TTLs, A records, ASNs
- Easy to see differences from normal domains
- Content distribution networks look a lot like fast-flux DNS!
- Whois information looking at registrar info, registration date, country of registration
- A 10-year-old domain is probably not a fast-flux (unless they got owned)
Take all of this, do a whole lot of maths, and you can calculate a dodginess factor!
Run this through visualisation and you can see the differences between legitimate domains and dodgy ones.
You can never say with 100% certainty whether a domain is dodgy… some domains just look dodgy 😉
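To make the “whole lot of maths” concrete, here is a toy dodginess score. The feature set, weights and thresholds are invented for illustration, not taken from the research: it combines a lexical letter-frequency deviation with the low-TTL and domain-age signals mentioned above.

```python
from collections import Counter

# Approximate English letter frequencies (the talk notes the research
# focused on English, but the idea isn't restricted to it).
ENGLISH_FREQ = {
    'e': .127, 't': .091, 'a': .082, 'o': .075, 'i': .070, 'n': .067,
    's': .063, 'h': .061, 'r': .060, 'd': .043, 'l': .040, 'c': .028,
    'u': .028, 'm': .024, 'w': .024, 'f': .022, 'g': .020, 'y': .020,
    'p': .019, 'b': .015, 'v': .010, 'k': .008, 'j': .002, 'x': .002,
    'q': .001, 'z': .001,
}

def lexical_score(label: str) -> float:
    """Score how far a domain label's letter distribution sits from
    English; algorithmically generated names tend to score higher."""
    letters = [c for c in label.lower() if c.isalpha()]
    if not letters:
        return 1.0
    counts = Counter(letters)
    n = len(letters)
    # Sum of absolute deviations from the English letter frequencies.
    return sum(abs(counts.get(l, 0) / n - f) for l, f in ENGLISH_FREQ.items())

def dodginess(label: str, ttl: int, domain_age_days: int) -> float:
    """Toy combined score: fast-flux-style low TTLs and freshly
    registered domains push the score up (invented weights)."""
    score = lexical_score(label)
    if ttl < 300:               # fast-flux domains rotate with very low TTLs
        score += 0.5
    if domain_age_days < 30:    # a 10-year-old domain is probably fine
        score += 0.5
    return score

print(dodginess("google", ttl=3600, domain_age_days=5000))
print(dodginess("xkqjzvwp", ttl=60, domain_age_days=2))
```

Even this crude version separates a long-lived, stable domain from a random-looking, young, low-TTL one, while also showing why a hard yes/no is impossible: plenty of legitimate names score badly on lexical features alone.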
URL Based Heuristics
The second component applies a lightweight mathematical classification to URLs observed in network traffic.
The main outcomes of malicious content:
- Fraudulent advertising
- Malware downloads
Obfuscation: there are 101 different ways to obfuscate a URL, and they all need to be handled.
- Spamhaus
- Day-old bread
- Automated classifications
- Host based
- Full Featured
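Before any of these features can be extracted reliably, obfuscated URLs have to be normalised into a canonical form. A minimal sketch (illustrative only, covering just two of the 101 tricks: percent-encoding and a host written as a bare decimal integer) might look like:

```python
import ipaddress
from urllib.parse import unquote, urlsplit, urlunsplit

def normalise_url(url: str) -> str:
    """Undo two common obfuscation tricks before feature extraction:
    percent-encoded characters and a decimal-integer host
    (e.g. http://3627734734/)."""
    parts = urlsplit(unquote(url))
    host = parts.hostname or ""
    if host.isdigit():  # decimal-IP obfuscation
        host = str(ipaddress.ip_address(int(host)))
    # Note: drops any port/userinfo; fine for a toy normaliser.
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))

print(normalise_url("http://%65%78ample.com/login"))  # → http://example.com/login
print(normalise_url("http://3627734734/"))            # → http://216.58.214.206/
```

A real pipeline would layer many more passes on top (IDN/punycode, nested encodings, redirector unwrapping), but the principle is the same: classify the canonical URL, not the disguise.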
Initial Perceptron Results
- 99% accuracy
- After training with good + bad data
- Less than 10 seconds to do 4400 URLs
- Fast enough as a browser plugin
- Not fast enough for a telco… yet
99% is almost too good…. maybe we got very lucky!
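The perceptron itself is nothing exotic. A toy version (the feature set here is invented for illustration, not the one used in the research) shows the mechanics of training and classification:

```python
def features(url: str) -> list[float]:
    """A few hand-picked URL features (illustrative, not the talk's set)."""
    host = url.split("//")[-1].split("/")[0]
    return [
        1.0,                                        # bias term
        len(url) / 100.0,                           # long URLs lean suspicious
        float(url.count("-") + url.count("@")),     # separator abuse
        1.0 if any(c.isdigit() for c in host) else 0.0,  # digits in the host
    ]

def train(samples: list[tuple[str, int]], epochs: int = 20) -> list[float]:
    """Classic perceptron rule: nudge weights on every misclassification."""
    w = [0.0] * 4
    for _ in range(epochs):
        for url, label in samples:  # label: +1 dodgy, -1 benign
            x = features(url)
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1
            if pred != label:
                w = [wi + label * xi for wi, xi in zip(w, x)]
    return w

def classify(w: list[float], url: str) -> int:
    return 1 if sum(wi * xi for wi, xi in zip(w, features(url))) > 0 else -1

training = [
    ("http://bank-0f-america.example.xyz123.net/login.php?acc=1", +1),
    ("http://203.0.113.9/update-flash-@-now", +1),
    ("https://www.wikipedia.org/wiki/Botnet", -1),
    ("https://mail.google.com/mail/", -1),
]
w = train(training)
print([classify(w, url) for url, _ in training])  # → [1, 1, -1, -1]
```

Because each prediction is just a dot product, classifying thousands of URLs per second is cheap, which is exactly why throughput, not the maths, becomes the bottleneck at telco scale.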
- You need to know what’s out there!
- If you find a host sending weird traffic to you and it’s got port 25 open, it’s more often than not dodgy!
- Port 25 is normal on a mail server, not on a consumer DSL line!
Creates a lot of data, but cutting that down to the useful data is hard.
Remediation is hard.
- Identification of C&C nodes
- Cleaning things up
- Cutting heads off the Hydra only to watch them regrow
- Relies on CERTs
How do we take all this and make it work?
- Call to arms
- Multiple diverse players required
- Internet Industry
- Public/Private Partnership
- We have seen the result of inaction
- Reputation and Trust!
Start at home… clean-up what you see every day!