Web Forensics

The victims of Web attacks are clients and Web servers. Both client-side and server-side protection are necessary. Attacks can be performed by using false URLs and redirects to malicious sites. The media of attack on the Internet are Web Browsers, database servers and application servers. On the client side, forensic analysis is done to find out whether a user has been involved in, or has been a victim of, a crime. Potential evidence can be found in the Browser history, registry entries, temporary files, index.dat, cookies, favourites, HTML pages in unallocated space, emails sent and received by the user, the cache and so on. On the server side, forensic analysis can be done by examining access logs, error logs, FTP log files and network traffic. Intermediate site logs such as antivirus server logs, Web filter logs, spam filter logs and firewall logs also help in tracking an incident.
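
As an illustration of the server-side examination mentioned above, the following is a minimal sketch of pulling the client address, timestamp and requested path out of one access log line. It assumes the server writes the Common Log Format; the pattern, the function name parse_access_line and the sample line are illustrative assumptions, not taken from any of the cited works.

```python
import re
from datetime import datetime

# Assumption: the server writes Common Log Format lines such as
# 192.0.2.10 - - [10/Oct/2010:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_access_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # Convert the bracketed timestamp into a timezone-aware datetime.
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    entry["status"] = int(entry["status"])
    return entry


# Hypothetical sample line used only to show the call.
sample = '192.0.2.10 - - [10/Oct/2010:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326'
entry = parse_access_line(sample)
print(entry["host"], entry["time"].isoformat(), entry["path"], entry["status"])
```

Lines that do not follow the assumed layout return None rather than raising, so an investigator can count and review them separately instead of silently losing them.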

There are five basic steps to computer forensics [Ashcroft 2001]:

  • 1. Preparation (of the investigator, not the data): The investigator should be fully aware of the problem. He/she should have a proper (possibly abstract) plan for the investigation and should acquire permissions to access the information that the investigation process may need.
  • 2. Collection (of the data): Collect the data required for the investigation. Proper precautions need to be taken while collecting the information. Safety devices such as write blockers should be used. All the data should be collected according to the plan of investigation.
  • 3. Examination: A careful examination of the data should be done. Sophisticated tools should be used to make sure the tests give accurate results. All possible scenarios should be taken into consideration while an investigation is being carried out.
  • 4. Analysis: The analysis of results to reach a conclusion should be transparent. Analysis alone might not lead to the actual facts. If possible, an interview should be conducted to back up the results.
  • 5. Reporting: Results should be reported to the concerned authority with utmost security and integrity. The reports should be archived and saved for future reference.
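
One common way to support the integrity requirement in the collection and reporting steps is to record a cryptographic hash of each collected file and compare it later. The source does not prescribe this technique; the sketch below is an assumption added for illustration, and the function names sha256_of_file and verify_copy are hypothetical.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large evidence images fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(original_path, copy_path):
    """Return True if the working copy is byte-for-byte identical to the original."""
    return sha256_of_file(original_path) == sha256_of_file(copy_path)
```

Recording the digest at collection time and again before reporting gives a simple check that the examined copy was not altered in between.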

Types of Browsers

Web Browsers can be classified based on the layout engines or the technology used in the development. Some of the kinds of Browsers are [Browsers 2010]:

  • KHTML and WebKit based Browsers
  • Gecko-based Browsers
  • Trident based Browsers
  • Specialty Browsers
  • Text based Browsers
  • Other Browsers

A brief overview of these Browsers follows [Browsers 2010]:

  • KHTML and WebKit based Browsers: KHTML is an HTML layout engine developed by the KDE project (a free software community that produces cross-platform applications for Linux, Windows, Solaris, FreeBSD and Mac OS X). It is written in C++ and supports Web standards well. Browsers such as ABrowse, Safari and Konqueror were developed on this layout engine. Safari is the Apple Web Browser that comes preinstalled with the Macintosh operating system. WebKit is another layout engine, initially developed by Apple Inc.; Google Inc. later also built on it to develop the Google Chrome Browser.
  • Gecko-based Browsers: Mozilla Corporation developed this layout engine, which is used in many applications, for example Mozilla Application Suite, Nvu and Mozilla Thunderbird. It is written in C++, is cross-platform and offers a rich programming API that makes it suitable for a variety of applications to run in the Browser.
  • Trident based Browsers: These Browsers use Microsoft's Trident engine. Some of the Browsers using this layout engine are AOL Explorer, Enigma and Yahoo! Browser. The Add/Remove Programs tool in Windows is an HTML application that uses Trident to display the list of programs installed on the computer. Appropriate adjustments were made to use Trident for the layout of the newer versions of IE.
  • Specialty Browsers: These Browsers are designed for special purposes and deliver specific intended content. Ghostzilla is used to hide Internet use, and HeatSeek is a Browser for seeking online porn. Other specialty Browsers include Flock, Wyzo and Songbird.
  • Text based Browsers: These Browsers use only text. Browsers such as Alynx, ELinks, w3m and Lynx are developed using this approach.
  • Other Browsers: Browsers such as Emacs, IBrowse and PlanetWeb are also available on the Internet.

The focus of this project is on Internet Explorer, a Trident based Browser; Mozilla Firefox, a Gecko based Browser; Google Chrome, a WebKit based Browser; and Apple Safari, a KHTML based Browser.

Previous Work on Web Forensics

    It is a well-known fact, based on previous experience, that it is difficult to find criminals online. The lessons learned from previous criminal cases involving computers can be used to assist with current investigations. The difference in skill sets between computer forensics and Web forensics applies not only to the investigator but also to the criminal [Berghel 2003]. Berghel emphasizes that in most computer forensics cases the criminals are not that computer savvy and have only the basic skill level of a typical end user. Web and Internet related crimes, in contrast, require an advanced skill set, and the author mentions that criminals tend to rely on the same tools with which they are most likely being investigated. Berghel cites the application NetScanTools Pro as a forensic tool that is used for legitimate forensic analysis yet can also be used by hackers for misuse and illegal activity. The difference in the use of such applications within the Web forensics environment is one of ethics and not skill.

    Martellaro discusses the role of the ping command in a forensics investigation, where it is used to find the IP address of a known domain name [Martellaro 2009]. Even though ICMP (Internet Control Message Protocol) may be disabled on some networks, the IP address of a site is returned right away. On a Windows computer, launching the command prompt application and typing ping followed by the domain name gets the IP address. On a Mac, the Network Utility found in the Applications/Utilities folder is invoked; it offers multiple information gathering tools such as ping, lookup, traceroute and whois. The interesting part is that IP addresses are generally assigned by country, and there are databases that map IP addresses to geolocations. That's how an iPhone uses location services [Mar
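
The name-to-address lookup that ping performs before it sends any ICMP packet can also be done directly. The following is a minimal sketch using Python's standard socket module; the function name resolve_ip is hypothetical, and the sketch covers only the lookup, not the geolocation step.

```python
import socket


def resolve_ip(hostname):
    """Return the IPv4 address for a host name, or None if the lookup fails."""
    try:
        # gethostbyname asks the system resolver; no ICMP traffic is involved,
        # which is why the address is available even where ping is blocked.
        return socket.gethostbyname(hostname)
    except socket.gaierror:
        return None
```

Calling resolve_ip with the domain name under investigation returns the address as a dotted string, which can then be checked against an IP geolocation database.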

    IE, a Browser that is pre-installed on Windows computers, stores all Internet browsing activity under the user's Windows profile. According to Jones, in Windows XP the directory closely matching the path "C:\Documents and Settings\\Local Settings\Temporary Internet Files\Content.IE5\" stores the cached pages and images that a user may have accessed through the Web Browser [Jones 2010]. The purpose of this cache is to keep the Browser from re-downloading information that has been previously viewed. Two other directories of interest, as Jones et al point out, are "C:\Documents and Settings\\Local Settings\History\History.IE5\" and "C:\Documents and Settings\\Cookies\" [Jones 2010]. To rebuild a Web page that a user visited, both the locally cached Web page and the corresponding URL have to be correctly located. The authors assert that one difference between the index.dat file used by IE and the history.dat file used by Firefox is that the history.dat file is saved in an ASCII format rather than binary [Jones 2010].

    The other difference is that the history.dat file does not link Website activity with cached Web pages. This is a drawback for the investigator because views of the visited Web pages cannot be readily assembled. To overcome this limitation when investigating Internet activity in the Firefox Browser, they proposed using Cache View, a shareware tool that provides access to the caches of several Browsers. "For each cached page, Cache View provides the URL from which the page was retrieved, the name of the cached file as stored on the local system, the file size, file type, the time it was last modified, the download date and its expiry (if applicable)" [Jones 2010]. The file types include HTML Web pages, from which Cache View can extract interesting information such as stored email addresses. Web-based e-mail content and persistent Browser cookies are important in the analysis and reconstruction of the subject's Internet activity.
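
Some of the per-file attributes quoted above (file name, size, type and last-modified time) can be gathered from the file system alone. The sketch below is not how Cache View works internally; it is an assumption-laden illustration in which the function name list_cache_files and the cache directory argument are hypothetical, and it does not recover the source URL, which lives in the Browser's own index files.

```python
import mimetypes
import os
from datetime import datetime, timezone


def list_cache_files(cache_dir):
    """Return one metadata record per file found under a copied cache directory."""
    records = []
    for root, _dirs, files in os.walk(cache_dir):
        for name in sorted(files):
            path = os.path.join(root, name)
            info = os.stat(path)
            # The type is guessed from the file extension only.
            guessed_type, _encoding = mimetypes.guess_type(name)
            records.append({
                "name": name,
                "size": info.st_size,
                "type": guessed_type or "unknown",
                "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
            })
    return records
```

Such a listing should be run against a working copy of the cache directory rather than the original evidence, so that file access does not disturb the collected data.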

    There are many Web browsing forensic tools from which an investigator can choose. The forensic tools used in this project are FTK, SQLitebrowser, Cache Viewer, Registrar Registry Manager, plist Editor Pro and IECacheView.

    An Interagency Report recently (June 2010) released by NIST (National Institute of Standards and Technology) proposes a design and architecture of a forensic Web Service (FWS) that would securely maintain transactional records between other Web services. The secure records can be re-linked by an independent agency to reproduce the transactional history. The report then shows the necessary components of a forensic framework for Web services and its success through a case study. It also mentions the challenges that need to be overcome in Web forensics.

    As opposed to traditional forensics implementations, applying forensics to Web service infrastructures introduces novel problems such as a need for impartiality and comprehensiveness. The primary purpose of digital forensics is to present digital evidence in legal proceedings. Therefore, the techniques used to extract digital evidence from devices must comply with legal standards. Reliability is another important issue for forensic examinations.

    There are many approaches or measures that help the forensic analysis of an incident to prove or disprove the occurrence of a crime. Seunghee et al [Seunghee 2008] make use of image files of Web pages captured at the same time that is recorded in the log files, so that the evidence of a crime is properly documented. These log entries serve as traces of digital evidence of the crime. Ahmed and Hussain [Ahmed 2009] proposed an automated, user-transparent approach to logging the Web URLs visited by the user for forensic analysis. Storing the log activity in a hidden default location defined by the system variables, and also on a logging server, helps maintain the log of the visited URLs even if the user deletes the local log activity. The approach proposed by Lin et al [Lin 2008] is a forensic system that extracts timestamps and any other clues about the events that can be found in the log file.

    Commercial ware is software that is the property of its producers and can be bought for use; such products are, in most cases, considerably safe and virus free. Freeware and shareware are free to use but may have bugs or unchecked dependencies. Security is the responsibility of the user in this case. The piracy of files has become a threat to the copyrights and confidentiality of information on the Internet.

    Campidoglio et al [Campidoglio 2009] describe the problem and suggest Digital Rights Management (DRM) systems to protect the legal rights to the information. The authors further explain the various rights, the laws against piracy and the effects of piracy. Campidoglio et al are confident that DRM systems help support the legal rights associated with digital content.

    Hackers and phishers try to circumvent the preventive measures taken by client-side defensive software. There are many software products that detect malicious sites and report them to the user. One such approach, proposed by Chou et al, uses the usage and history of the user [Chou 2004]. Chou et al say that hackers and phishers overcome the measures taken to protect a client-side system; hence a detailed study in the field is necessary to find a permanent solution to hacking and phishing.

    In Web forensics, new techniques are developed by structuring and analysing the existing techniques and by using the overlooked content in the Web data [Sen 2006]. A well-structured hierarchical methodology is proposed. The click records are analysed and integrated with the Web data warehouse. These data are used to study visit behaviour and to track and understand the user's decision to buy a product. Hence loyal customers can be identified [Sen 2006].


    Michael et al [Micheal 2009] proposed a tool that reconstructs the browsing activity like a slideshow, unlike tools that only parse the cache into URLs. Web Cache Illuminator and IE History & Cache Viewer are tools that parse the cache into URLs. The proposed tool can effectively show the intention of the user, because the investigator can visually inspect the activity and gain a deeper understanding of it and of the specific intentions of the user. The drawback of this tool is that the client-side scripting interaction is totally lost. Whenever a URL is accessed again, the older version of the Web page is overwritten and lost. There are also some compatibility problems with CSS and AJAX technology files [Micheal 2009].

    Murillo et al explain how the IE Browser deletes the history and describe ways to recover the deleted history. A detailed Firefox forensic analysis is also presented using forensic utility tools. Murillo et al propose an algorithm to recover deleted SQLite entries based on known internal record structures [Murillo 2009].
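
To give a sense of the SQLite data such an analysis works with, the following is a minimal sketch that reads the live (not deleted) history entries from a copy of a Firefox places database. It does not reproduce the recovery algorithm of Murillo et al. It assumes the moz_places table with the url, title, visit_count and last_visit_date columns, where last_visit_date is in microseconds since the Unix epoch; the function name read_history is hypothetical.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def read_history(db_path):
    """Return (url, title, visit_count, last_visit) tuples, oldest visit first."""
    connection = sqlite3.connect(db_path)
    try:
        rows = connection.execute(
            "SELECT url, title, visit_count, last_visit_date "
            "FROM moz_places WHERE last_visit_date IS NOT NULL "
            "ORDER BY last_visit_date"
        ).fetchall()
    finally:
        connection.close()
    # Assumption: last_visit_date holds microseconds since 1970-01-01 UTC.
    return [
        (url, title, count, EPOCH + timedelta(microseconds=last_visit))
        for url, title, count, last_visit in rows
    ]
```

The query should be run against a working copy of the database file, never against the original evidence, since opening a database can modify it.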