How to use Python scripts for forensics? (Full practical explanation)

sn0xsharma
10 min read · May 29, 2021


Who am I? Hey guys, my name is sn0x. I am a 20-year-old cyber security researcher | Bug hunter | Machine learning | RHCSA | AWS | CEH | eWPTXv2 certified.

Content

1. What is forensic investigation?

2. Forensic Algorithms

3. Python for forensics

4. Python basics you need to know

5. Memory Forensics

6. Network Forensics

7. Mobile Forensics

8. Windows and Linux Forensics

9. Recover deleted data from Recycle Bin

10. PDFs and Microsoft Docs metadata forensics

What is forensic investigation?

Digital forensics is the process of uncovering and interpreting electronic data. The goal of the process is to preserve any evidence in its most original form while performing a structured investigation by collecting, identifying, and validating the digital information to reconstruct past events.

The context is most often for the usage of data in a court of law, though digital forensics can be used in other instances.

Forensic Algorithms

In this section, I will describe the main differences between MD5, SHA256, and SSDEEP, the most common algorithms used in forensic investigations.

The following Python code reads the data only once and feeds it into two hash calculations. When reading the data dominates the runtime, this Python script is therefore almost twice as fast as running md5sum followed by sha256sum, and it produces exactly the same hash sums as these tools:
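The original listing did not survive here, so what follows is only a minimal sketch of the idea, not necessarily the author's exact script. The name multi_hash is taken from the text further below; print_sums and the chunk_size parameter are assumptions made for illustration.

```python
import hashlib


def multi_hash(filename, chunk_size=1024 * 1024):
    """Return (md5_hex, sha256_hex) of a file while reading it only once."""
    md5 = hashlib.md5()
    sha256 = hashlib.sha256()
    with open(filename, "rb") as handle:
        # Feed each chunk into both hash objects before reading the next one.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return md5.hexdigest(), sha256.hexdigest()


def print_sums(filenames):
    """Print all MD5 sums first, then all SHA256 sums (hypothetical helper)."""
    results = [(name, multi_hash(name)) for name in filenames]
    print("---------- MD5 sums ----------")
    for name, (md5_hex, _sha_hex) in results:
        print(md5_hex, name)
    print("---------- SHA256 sums ----------")
    for name, (_md5_hex, sha_hex) in results:
        print(sha_hex, name)
```

Saved as multihash.py, the script would only need a last line such as print_sums(sys.argv[1:]) (plus import sys) to be called from the shell.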

In the following call of the script, we calculate the hash sums of some of the common Linux tools:

sn0x@security:~$ python multihash.py /bin/{bash,ls,sh}
---------- MD5 sums ----------
d79a947d06958e7826d15a5c78bfaa05 /bin/bash
fa97c59cc414e42d4e0e853ddf5b4745 /bin/ls
c01bc66da867d3e840814ec96a137aef /bin/sh
---------- SHA256 sums ----------
cdbcb2ef76ae464ed0b22be346977355c650c5ccf61fef638308b8da60780bdd /bin/bash
846ac0d6c40d942300de825dbb5d517130d8a0803d22115561dcd85efee9c26b /bin/ls
e9a7e1fd86f5aadc23c459cb05067f49cd43038f06da0c1d9f67fbcd627d622c /bin/sh

Once the full image is copied, its contents should be indexed and the hash sums should be created for every file. With the support of the previously defined multi_hash function and Python standard libraries, a report template containing a list of all file names, sizes, and hash values can be created, as shown in the following:
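The report script itself is missing from this post, so here is a hedged sketch of what such a report could look like. multi_hash is repeated so that the block runs on its own; dir_report, print_report, and the tab-separated layout are assumptions.

```python
import hashlib
import os


def multi_hash(filename, chunk_size=1024 * 1024):
    """Return (md5_hex, sha256_hex) of a file while reading it only once."""
    md5, sha256 = hashlib.md5(), hashlib.sha256()
    with open(filename, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
            sha256.update(chunk)
    return md5.hexdigest(), sha256.hexdigest()


def dir_report(base_path):
    """Yield (relative name, size, sha256, md5) for each regular file."""
    for root, _dirs, files in os.walk(base_path):
        for name in sorted(files):
            full = os.path.join(root, name)
            # Skip symbolic links and special files such as sockets.
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            md5_hex, sha_hex = multi_hash(full)
            yield (os.path.relpath(full, base_path),
                   os.path.getsize(full), sha_hex, md5_hex)


def print_report(base_path):
    """Print the report as tab-separated lines with a header row."""
    print("File name\tSize\tSHA256\tMD5")
    for row in dir_report(base_path):
        print("\t".join(str(field) for field in row))
```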

The above Python script is all it takes to generate the integrity information of a directory tree that includes file sizes, file names, and hash sums (SHA256, MD5). The following is an example call on our scripting directory:

Python for forensics

Python is a hacker's language thanks to its decreased complexity, increased efficiency, and vast range of third-party libraries. If you are running macOS or Linux, odds are that forensic tools already exist for your platform; learning Python can help you with the difficult cases where those tools fail.

Python basics you need to know

You can refer to my recent blog post {https://sanketsharma9510.medium.com/automation-using-python-in-bug-bountys-full-practical-explanation-e1e694c43f78}, where I already explain the Python basics in detail. If you want to learn more about Python, you can also refer to one of my favourite learning platforms {https://www.w3schools.com/python/default.asp}.

Memory Forensics

In this section, I will show you how to investigate volatile memory with the help of Volatility, a Python-based forensics framework, on platforms like Android or Linux.

Reconstructing data for Android

Now we will see how to reconstruct application data with the help of Volatility and custom-made plugins. As examples, we have chosen the call history and the keyboard cache. If you are investigating a common Linux or Windows system, a large number of plugins is already available. Unfortunately, on Android, you have to write your own plugins.

Call history

One of our goals is to recover the list of recent incoming and outgoing phone calls from an Android memory dump. This list is loaded when the phone app is opened. The responsible process for the phone app and call history is com.android.contacts. This process loads the PhoneClassDetails.java class file that models the data of all telephone calls in a history structure. One instance of this class is in memory per history entry. The data fields for each instance are typical meta information of a call.

So we are going to write a script that recovers the following for each call: type (incoming, outgoing, or missed), assigned photo of the contact, telephone number, contact name, and date and time.

To automatically extract and display this metadata, we provide a Volatility plugin called dalvik_app_calllog, which is shown as follows:

This plugin accepts the following command line parameters:

• -o: For an offset to the gDvm object
• -p: For a process ID (PID)
• -c: For an offset to the PhoneClassDetails class

If some of these parameters are known and passed on to the plugin, the runtime of the plugin reduces significantly. Otherwise, the plugin has to search for these values in RAM itself.

Network Forensics

In this part, we will focus on the parts of the forensic investigation that are specific to the network layer. We will use Scapy, one of the most widely used Python packages for manipulating and analyzing network traffic (and one of my favourites too).

I will also cover the following topic:

How to build your own port scanner?

Why am I using Scapy?

Scapy is a powerful interactive packet manipulation program. It is able to forge or decode packets of a wide number of protocols, send them on the wire, capture them, match requests and replies, and much more.

This description is taken from the developer website, http://www.secdev.org/projects/scapy/.

Scapy differs from the standard tools (and also from Dshell) by providing an investigator with the ability to write small Python scripts that can manipulate or analyze the network traffic — either in a recorded form or in real-time.

How to build your own port scanner:

Bonus script from my side: now we are going to write a script that prints a table of all the IP addresses that are online, along with their corresponding MAC addresses.
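The scanner listing is not part of this post any more, so the block below is a rough sketch of a TCP SYN scan with Scapy, not the original script. The names classify and syn_scan are made up for illustration. It assumes that Scapy is installed, that the script runs with root privileges, and that you are allowed to scan the target.

```python
SYN_ACK = 0x12  # SYN and ACK bits: the port answered the handshake
RST = 0x04      # RST bit: the port refused the connection


def classify(flags):
    """Map the TCP flags of a reply (None = no reply) to a port state."""
    if flags is None:
        return "filtered"  # no answer: filtered, or the host is down
    if flags & SYN_ACK == SYN_ACK:
        return "open"
    if flags & RST:
        return "closed"
    return "unknown"


def syn_scan(host, ports, timeout=1.0):
    """Send one SYN packet per port and return {port: state}."""
    # Imported here so the helper above works without Scapy installed.
    from scapy.all import IP, TCP, sr1

    results = {}
    for port in ports:
        packet = IP(dst=host) / TCP(dport=port, flags="S")
        reply = sr1(packet, timeout=timeout, verbose=0)
        if reply is not None and reply.haslayer(TCP):
            flags = int(reply[TCP].flags)
        else:
            flags = None
        results[port] = classify(flags)
    return results
```

A call such as syn_scan("192.0.2.10", [22, 80, 443]) would return one state per port; the address is a placeholder from the documentation range.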
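The bonus script is also missing here, so this is a small sketch of an ARP sweep under the same assumptions as before (Scapy installed, root privileges, a network you are permitted to probe). arp_scan and format_table are hypothetical names.

```python
def format_table(hosts):
    """Render (ip, mac) pairs as a two-column text table."""
    lines = ["{:<16} {}".format("IP", "MAC")]
    for ip, mac in hosts:
        lines.append("{:<16} {}".format(ip, mac))
    return "\n".join(lines)


def arp_scan(network, timeout=2):
    """Broadcast ARP requests and return (ip, mac) for every host that answers."""
    # Imported here so format_table works without Scapy installed.
    from scapy.all import ARP, Ether, srp

    request = Ether(dst="ff:ff:ff:ff:ff:ff") / ARP(pdst=network)
    answered, _unanswered = srp(request, timeout=timeout, verbose=0)
    # Each answered entry is a (sent, received) pair of packets.
    return [(reply.psrc, reply.hwsrc) for _sent, reply in answered]
```

Printing the table for a local /24 network would then be print(format_table(arp_scan("192.168.1.0/24"))), where the network range is only an example.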

Mobile Forensics

Nowadays, mobile device use is as pervasive as it is helpful, especially in the context of digital forensics, because these small-sized machines amass huge quantities of data on a daily basis, which can be extracted to facilitate the investigation. Being something like a digital extension of ourselves, these machines allow digital forensic investigators to glean a lot of information.

The first step is getting root access to the smartphone.

After getting the root access, the next step is trying to get the screen lock in plain text as this secret is often used for different protections (for example, the screen lock can be used as an application password for an app on the phone).

Breaking the screen lock for a PIN or password can be done with the following script:
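The script is missing from this post, so the following is only a sketch of how the crack.hash file could be built. It assumes an older Android release that keeps the lock hash in password.key and the salt in the secure table of settings.db, and it assumes both files were already copied from the rooted device into backup_dir. All function names are made up for illustration.

```python
import binascii
import os
import sqlite3
import struct


def read_sha1_hash(backup_dir):
    """Return the first 40 hex characters (the SHA-1 part) of password.key."""
    with open(os.path.join(backup_dir, "password.key"), "r") as handle:
        return handle.readline()[:40]


def read_salt(backup_dir):
    """Return the lock screen salt from settings.db as 16 hex characters."""
    con = sqlite3.connect(os.path.join(backup_dir, "settings.db"))
    try:
        row = con.execute(
            "SELECT value FROM secure WHERE name = 'lockscreen.password_salt'"
        ).fetchone()
    finally:
        con.close()
    # The salt is stored as a signed 64-bit decimal; convert it to hex.
    return binascii.hexlify(struct.pack(">q", int(row[0]))).decode("ascii")


def write_crack_file(backup_dir):
    """Write 'hash:salt' to crack.hash and return that line."""
    line = read_sha1_hash(backup_dir) + ":" + read_salt(backup_dir)
    with open(os.path.join(backup_dir, "crack.hash"), "w") as handle:
        handle.write(line + "\n")
    return line
```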

This script generates a file called crack.hash that can be fed to hashcat to brute-force the screen lock. If the smartphone owner has used a 4-digit PIN, hashcat only has to try the 10,000 possible combinations.

If the smartphone user has used a gesture to unlock the smartphone, you can use a pre-generated rainbow table and the following script:
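The rainbow table script is not shown in this post. As a stand-in, the sketch below swaps the pre-generated table for an on-the-fly enumeration of all point sequences, which is small enough to search directly. It assumes the legacy gesture.key format, where the file holds an unsalted SHA-1 over the touched points, each encoded as one byte from 0 to 8. The function names are hypothetical.

```python
import hashlib
from itertools import permutations


def gesture_hash(pattern):
    """SHA-1 over the touched points, one byte (0-8) per point."""
    return hashlib.sha1(bytes(pattern)).hexdigest()


def find_gesture(target_hex, min_len=4, max_len=9):
    """Return the point sequence whose hash equals target_hex, or None."""
    target_hex = target_hex.lower()
    for length in range(min_len, max_len + 1):
        # Every point can be used at most once, so permutations cover
        # all candidate gestures (including some the device would reject).
        for pattern in permutations(range(9), length):
            if gesture_hash(pattern) == target_hex:
                return list(pattern)
    return None
```

Under the same assumption, the target would come from the raw file content, for example open("gesture.key", "rb").read().hex().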

At the end of this section, I will show you how to gather more details about the usage of the Android-based smartphone. In the following example, we will use the contacts database that also stores the phone call history. This example can easily be adapted to get calendar entries or content from any other database of an app that is installed on the device.
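The example itself did not make it into this post, so here is a minimal sketch using the standard sqlite3 module. It assumes the contacts database was already copied from the device and that it contains a calls table with the columns number, name, date (milliseconds since 1970), duration, and type; the table layout can differ between Android versions. read_call_log is a made-up name.

```python
import sqlite3
from datetime import datetime, timezone

# Assumed meaning of the numeric type column.
CALL_TYPES = {1: "incoming", 2: "outgoing", 3: "missed"}


def read_call_log(db_path):
    """Return the call history as a list of dictionaries, oldest call first."""
    con = sqlite3.connect(db_path)
    try:
        rows = con.execute(
            "SELECT number, name, date, duration, type FROM calls ORDER BY date"
        ).fetchall()
    finally:
        con.close()
    calls = []
    for number, name, date_ms, duration, call_type in rows:
        # The date column holds milliseconds, datetime expects seconds.
        when = datetime.fromtimestamp(date_ms / 1000, tz=timezone.utc)
        calls.append({
            "number": number,
            "name": name,
            "when": when.isoformat(),
            "duration_s": duration,
            "type": CALL_TYPES.get(call_type, "unknown"),
        })
    return calls
```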

Windows and Linux Forensics

In this section, we will focus on the parts of the forensic investigation that are specific to the operating systems. For both operating systems, we selected examples of interesting evidence and how to automate its analysis using Python. Consequently, in this section, you will learn the following:

1. Understanding, using, and parsing Linux file metadata with POSIX ACL and file-based capabilities as the most prominent extensions to the standard metadata.

2. Analyzing the foundations of the Windows event log, selecting interesting parts, and automatically parsing them.

Reading basic file metadata with Python

Python provides built-in functionality to read the file status information with the os module. The standard function to retrieve metadata from a file that is specified by its name is os.lstat(). In contrast to the more commonly used os.stat(), this function does not evaluate the targets of symbolic links but retrieves the information about the link itself. Therefore, it is not prone to run into infinite loops that are caused by circular symbolic links. Furthermore, it does not cause any errors on links that lack the link target.
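As a small illustration of the paragraph above, the sketch below reads the basic metadata of a path with os.lstat(). The function name basic_metadata and the choice of returned fields are assumptions.

```python
import os
import stat


def basic_metadata(path):
    """Return basic metadata of path without following symbolic links."""
    info = os.lstat(path)  # describes a link itself, not its target
    if stat.S_ISLNK(info.st_mode):
        kind = "symlink"
    elif stat.S_ISDIR(info.st_mode):
        kind = "directory"
    elif stat.S_ISREG(info.st_mode):
        kind = "file"
    else:
        kind = "other"
    return {
        "type": kind,
        "mode": stat.filemode(info.st_mode),  # for example '-rw-r--r--'
        "uid": info.st_uid,
        "gid": info.st_gid,
        "size": info.st_size,
    }
```

Because the link is never followed, the function also works on a symbolic link whose target no longer exists.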

The st_mtime, st_atime, and st_ctime time stamps are specified in the Unix timestamp format, that is, the number of seconds since January 1st 1970. With the datetime module, this time format can be converted into a human readable form, using the following script:
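The conversion script is missing here, so this is a short sketch of it. The names to_utc and file_times are made up, and UTC is chosen as the output time zone as an assumption, so that reports are comparable between machines.

```python
import os
from datetime import datetime, timezone


def to_utc(timestamp):
    """Convert a Unix timestamp into a readable UTC string."""
    moment = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S UTC")


def file_times(path):
    """Return the three time stamps of path in human readable form."""
    info = os.lstat(path)
    return {
        "modified": to_utc(info.st_mtime),
        "accessed": to_utc(info.st_atime),
        # On Unix, st_ctime is the last metadata change, not the creation time.
        "changed": to_utc(info.st_ctime),
    }
```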

The Python library pylibacl can be used to read and evaluate POSIX ACLs, so that permissions granted beyond the basic mode bits are not overlooked. The library introduces the posix1e module, whose name is a reference to the initial draft that first mentioned POSIX ACLs. The detailed documentation about this library is available at http://pylibacl.k1024.org/.

Using Python’s ctypes, the shared libcap.so.2 library can be utilized to retrieve all the file capabilities from a directory tree.
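The listing for this step is not included in the post, so below is a hedged sketch of the ctypes approach. It relies on the libcap functions cap_get_file, cap_to_text, and cap_free; the Python function names are made up, and on systems without libcap.so.2 the scan simply finds nothing.

```python
import ctypes
import os


def load_libcap(name="libcap.so.2"):
    """Load libcap and declare the three functions used below."""
    try:
        lib = ctypes.CDLL(name)
    except OSError:
        return None  # library not available on this system
    lib.cap_get_file.argtypes = [ctypes.c_char_p]
    lib.cap_get_file.restype = ctypes.c_void_p
    lib.cap_to_text.argtypes = [ctypes.c_void_p,
                                ctypes.POINTER(ctypes.c_ssize_t)]
    lib.cap_to_text.restype = ctypes.c_void_p
    lib.cap_free.argtypes = [ctypes.c_void_p]
    lib.cap_free.restype = ctypes.c_int
    return lib


def file_capabilities(lib, path):
    """Return the capabilities of path as text, or None if it has none."""
    if lib is None:
        return None
    caps = lib.cap_get_file(os.fsencode(path))
    if not caps:
        return None
    text_ptr = lib.cap_to_text(caps, None)
    try:
        return ctypes.string_at(text_ptr).decode() if text_ptr else None
    finally:
        # Both objects were allocated by libcap and must be released by it.
        if text_ptr:
            lib.cap_free(text_ptr)
        lib.cap_free(caps)


def scan_tree(base_path):
    """Return {path: capability text} for files that carry capabilities."""
    lib = load_libcap()
    found = {}
    for root, _dirs, files in os.walk(base_path):
        for name in files:
            full = os.path.join(root, name)
            text = file_capabilities(lib, full)
            if text:
                found[full] = text
    return found
```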

Windows:

We want to start with a basic conversion of the binary XML format of EVTX files into readable XML files. This can be done using evtxdump.py from https://github.com/williballenthin/python-evtx, which will also be the basis of our following scripts.

For a Windows event log file, the output will be very large, as the generated XML file contains all the recorded logs. For an analyst, it is often important to perform a fast triage or search for specific events quickly. Due to this, we modify the script so that it extracts only specific events. So now let's write the script ;)
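The modified script is not shown in this post, so the following is only a sketch of the filtering idea. It assumes the python-evtx package is installed and uses its Evtx class with records() and xml(); event_id and filter_events are made-up names.

```python
import xml.etree.ElementTree as ET

# XML namespace used by Windows event records.
EVENT_NS = "{http://schemas.microsoft.com/win/2004/08/events/event}"


def event_id(record_xml):
    """Return the EventID of one record's XML as an int, or None."""
    root = ET.fromstring(record_xml)
    node = root.find(EVENT_NS + "System/" + EVENT_NS + "EventID")
    if node is None or not node.text:
        return None
    return int(node.text)


def filter_events(evtx_path, wanted_ids):
    """Yield the XML of every record whose EventID is in wanted_ids."""
    # Imported here so event_id works without python-evtx installed.
    import Evtx.Evtx as evtx

    wanted = set(wanted_ids)
    with evtx.Evtx(evtx_path) as log:
        for record in log.records():
            record_xml = record.xml()
            if event_id(record_xml) in wanted:
                yield record_xml
```

For a quick triage of logons, one could loop over filter_events("Security.evtx", [4624]), where the file name is only an example and 4624 is the ID of a successful logon event.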

Recover deleted data from Recycle Bin

On Microsoft operating systems, the Recycle Bin serves as a special folder that contains deleted files. When a user deletes files via Windows Explorer, the operating system places the files in this special folder, marking them for deletion but not actually removing them. On Windows 98 and prior systems with a FAT file system, the C:\Recycled\ directory holds the Recycle Bin directory. Operating systems that support NTFS, including Windows NT, 2000, and XP, store the Recycle Bin in the C:\Recycler\ directory. Windows Vista and 7 store the directory at C:\$Recycle.Bin.
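The three locations above can be checked in a few lines. This is a minimal sketch; find_recycle_dir and the candidates parameter are names chosen for illustration.

```python
import os

# The three Recycle Bin locations named in the text.
CANDIDATES = ["C:\\Recycler\\", "C:\\Recycled\\", "C:\\$Recycle.Bin\\"]


def find_recycle_dir(candidates=CANDIDATES):
    """Return the first candidate directory that exists, or None."""
    for candidate in candidates:
        if os.path.isdir(candidate):
            return candidate
    return None
```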

Since we will want to know who deleted which files in the Recycle Bin, let’s write a small function to translate each SID into a username. This will allow us to print some more useful output when we recover deleted items in the Recycle Bin.

Finally, we will put all of our code together to create a script that will print the deleted files still in the Recycle Bin.
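The function is missing from this post, so here is a sketch of one way to do the translation. It assumes the user name can be taken from the ProfileImagePath value under the ProfileList registry key, and it returns the SID unchanged when the lookup is not possible. sid_to_user is a made-up name.

```python
def sid_to_user(sid):
    """Translate a SID into a user name; fall back to the SID itself."""
    try:
        import winreg  # only available on Windows
    except ImportError:
        return sid
    key_path = (r"SOFTWARE\Microsoft\Windows NT\CurrentVersion\ProfileList"
                + "\\" + sid)
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
            value, _value_type = winreg.QueryValueEx(key, "ProfileImagePath")
    except OSError:
        return sid  # unknown SID or no access to the key
    # The profile path ends with the user name, e.g. C:\Users\alice.
    return value.split("\\")[-1]
```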
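The final script is not included in this post. The sketch below shows only the listing step, assuming the Recycle Bin directory contains one sub-directory per user SID. list_recycled is a hypothetical name, and resolve_user can be any function that turns a SID into a user name.

```python
import os


def list_recycled(recycle_dir, resolve_user=lambda sid: sid):
    """Return (user, file name) pairs for everything still in the bin."""
    entries = []
    for sid in sorted(os.listdir(recycle_dir)):
        sid_dir = os.path.join(recycle_dir, sid)
        if not os.path.isdir(sid_dir):
            continue  # only the per-user sub-directories are of interest
        user = resolve_user(sid)
        for name in sorted(os.listdir(sid_dir)):
            entries.append((user, name))
    return entries


def print_recycled(recycle_dir, resolve_user=lambda sid: sid):
    """Print one line per deleted file, grouped by user."""
    for user, name in list_recycled(recycle_dir, resolve_user):
        print("[*] User: {}  File: {}".format(user, name))
```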

PDFs and Microsoft Docs metadata forensics

We have a tool that can identify the metadata embedded in a PDF document. Similarly, we can modify our script to test for specific metadata, such as a specific user.

Running our pdfReader script against the Anonymous press release, we see the following metadata:
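The tool itself is not shown in this post, so here is a hedged sketch of such a reader. It assumes the third-party pypdf package, which may not be the library the original script used; all function names are made up.

```python
def read_pdf_metadata(file_name):
    """Return the document information of a PDF as a plain dictionary."""
    # Imported here so the helpers below work without pypdf installed.
    from pypdf import PdfReader

    reader = PdfReader(file_name)
    return dict(reader.metadata or {})


def format_metadata(file_name, info):
    """Return the report lines for one PDF."""
    lines = ["[*] PDF MetaData For: " + file_name]
    for key, value in info.items():
        lines.append("[+] {}:{}".format(key, value))
    return lines


def has_author(info, author):
    """Test for specific metadata: was the PDF written by this author?"""
    return str(info.get("/Author", "")).lower() == author.lower()
```

Printing the report would then be a loop over format_metadata(name, read_pdf_metadata(name)).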

forensic:~# python pdfRead.py -F ANONOPS_The_Press_Release.pdf
[*] PDF MetaData For: ANONOPS_The_Press_Release.pdf
[+] /Author:Alex Tapanaris
[+] /Producer:OpenOffice.org 3.2
[+] /Creator:Writer
[+] /CreationDate:D:20101210031827+02'00'

Follow me on:

Twitter = sn0x

Instagram = sn0x

GitHub = sn0x-sharma

If you really like this blog, you can support me here: {https://www.buymeacoffee.com/sn0xsharma}


Written by sn0xsharma

Security Researcher | CNSP | EWPTXv2 | CEHV11-12 | GRC | RHCSA | HOF : Tesla (2021) | HTB Top 200 (2021) | THM Global #2 & #1 in india(2024)
