Apache Log Extractor is a quick script I threw together to export URL information from Apache access logs. The idea behind it was to build a list of known URLs on a remote server by analysing its logs. This list can then be used as input for further testing tools (e.g. Burp Suite's Intruder).
The script accepts an Apache access log as input and creates an output file containing one URL per line. The list is unique and should contain only the URL without parameters (incomplete directory names are not extracted). It also takes these URLs and creates a wordlist of all valid directory names for use with brute-forcing, etc. As of version 0.4 I’ve also added some (messy) scripting to output basic auth usernames if they’re present in the log file.
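To illustrate the URL and wordlist steps, here is a minimal sketch that is not the script’s own code. It assumes the request targets have already been pulled out of the log, strips everything from the first `?` onwards, de-duplicates in first-seen order, and treats every path segment except the last as a directory name (a trailing `/` marks the last segment as a directory too). The function names `unique_paths` and `directory_words` are hypothetical, and the real script evidently uses slightly different rules (its example output lists `mail`, which this sketch would not emit for `/Mail`).

```python
# Hypothetical sketch: not the Apache Log Extractor's actual implementation.


def unique_paths(request_targets):
    """Return request paths with query strings removed, de-duplicated
    while preserving the order in which they were first seen."""
    seen = {}
    for target in request_targets:
        # Drop the query string; an empty result falls back to "/"
        path = target.split("?", 1)[0] or "/"
        seen.setdefault(path, None)
    return list(seen)


def directory_words(paths):
    """Collect directory names for a brute-force wordlist.

    Assumption: the final segment of a path is a file/resource name unless
    the path ends with "/", in which case that final segment is empty.
    """
    words = {}
    for path in paths:
        segments = path.split("/")[1:]  # skip the empty string before the leading "/"
        for segment in segments[:-1]:
            if segment:
                words.setdefault(segment, None)
    return list(words)


if __name__ == "__main__":
    targets = [
        "/",
        "/ajax/bottomnavinfo.ashx",
        "/MetaAdServer/MAS.aspx?cp=seite1&ct=contentview_ressort&f=0",
        "/MetaAdServer/MAS.aspx?cp=seite1&ct=contentview_ressort&f=1",
        "/css/layout.css",
    ]
    paths = unique_paths(targets)
    # One URL per line, as the script's output file does
    print("\n".join(paths))
    print("\n".join(directory_words(paths)))
```

Using a dict rather than a set keeps the output in log order, which makes it easier to compare the result against the original access log.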
My test logs were limited, so I’ve only been able to test it so far. If you come across a log it hates, please send me a few example entries and I’ll try to fine-tune the regex.
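For reference when sending sample entries, this is a hedged sketch of a regex for Apache’s Common/Combined Log Format (`%h %l %u %t "%r" %>s %b`, optionally followed by referer and user agent). It is not the script’s actual regex. In that format the `%u` field holds the authenticated remote user, which is where basic auth usernames show up; `-` means none was logged. The names `LOG_LINE`, `parse_line` and `basic_auth_users` are hypothetical.

```python
import re

# Hypothetical pattern for the Common Log Format; Combined Log Format lines
# also match because the trailing referer/user-agent fields are not anchored.
LOG_LINE = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<target>\S+)(?: (?P<protocol>[^"]*))?" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # "-" in the %u field means no authenticated user was logged
    if fields["user"] == "-":
        fields["user"] = None
    return fields


def basic_auth_users(lines):
    """Return the sorted, unique usernames found in the %u field."""
    users = set()
    for line in lines:
        fields = parse_line(line)
        if fields and fields["user"]:
            users.add(fields["user"])
    return sorted(users)


if __name__ == "__main__":
    sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
    print(parse_line(sample)["target"], basic_auth_users([sample]))
```

Lines that don’t match (for example a `"-"` request line from a timed-out connection) simply return `None`, so they can be skipped or collected as examples of logs the pattern “hates”.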
Apache access logs occasionally replace portions of the URL with … if the path is considered too long, so if you see this in the output, it originates from the access.log provided. Garbage in, garbage out!
Usage example .:
Output Example .:
[ ] Extracting URLs from logfile : access.log.1
[ ] Extracted URL : /
[ ] Extracted URL : /Signed_Update.jar
[ ] Extracted URL : /ajax/bottomnavinfo.ashx
[ ] Extracted URL : /MetaAdServer/MAS.aspx?cp=seite1&ct=contentview_ressort&f=0
[ ] Extracted URL : /favicon.ico
[ ] Extracted URL : /EB3YKJjcJ5YvJ
[ ] Extracted URL : /MetaAdServer/MAS.aspx?cp=seite1&ct=contentview_ressort&f=1
[ ] Extracted URL : /AdServer/SponsorButtonC.aspx?ids=16965
[ ] Extracted URL : /Mail
[ ] Extracted URL : /css/layout.css
[ ] Extracting directory names from logfile
[ ] Extracted Word : ajax
[ ] Extracted Word : MetaAdServer
[ ] Extracted Word : AdServer
[ ] Extracted Word : css
[ ] Extracted Word : mail
You can find a download link for the Apache Log Extractor Python script below.
Feedback is always gratefully received…
Next Steps .:
I’m currently debating next steps for the script. Some possibilities include:
- Check each URL and save the response code and body, for offline analysis
- Secondary output
  - Tree structure of known resources
  - Wordlist from URL information (names, parameters, …) (implemented in 0.3)
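As a rough idea of what the “tree structure of known resources” output could look like, here is a hedged sketch (not part of any released version) that nests the extracted paths into a dict of dicts and renders them as indented text. The names `build_tree` and `render_tree` are hypothetical, and the input is assumed to be paths that have already had their parameters stripped.

```python
# Hypothetical sketch of a possible "tree of known resources" output.


def build_tree(paths):
    """Nest path segments into a dict of dicts, one level per segment."""
    tree = {}
    for path in paths:
        node = tree
        for segment in path.strip("/").split("/"):
            if segment:  # ignore empty segments such as the bare "/" path
                node = node.setdefault(segment, {})
    return tree


def render_tree(node, indent=0):
    """Return the tree as indented lines, with children sorted by name."""
    lines = []
    for name in sorted(node):
        lines.append("  " * indent + name)
        lines.extend(render_tree(node[name], indent + 1))
    return lines


if __name__ == "__main__":
    paths = ["/ajax/bottomnavinfo.ashx", "/MetaAdServer/MAS.aspx", "/css/layout.css", "/favicon.ico"]
    print("\n".join(render_tree(build_tree(paths))))
```

Note that Python’s default sort is case-sensitive, so `MetaAdServer` is listed before `ajax`; sorting with `key=str.lower` would give a case-insensitive listing instead.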
If you have any other ideas, please feel free to get in touch.
- Apache Log Extractor -> Project Download