Recently I found myself in a bit of a pickle. Like many of us, I wanted a log of the reads and writes (I/O operations) on my system drive.
For Linux there are some good tools for overall and in-depth monitoring of your system (like iotop, vnstat, etc.), but I wanted something small and simple.
So I decided to write my own. (The full code is at the end of the post.)

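The Python script works on both Linux and Windows, but you have to tweak the code for your specific needs: the drive names differ between systems, and the busy time counter used below is not reported on Windows.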

Prerequisites:

  1. Python 3 installation
  2. Linux platform (recommended).
  3. Python 3 pip

First we need the psutil library (for more info about psutil visit: https://pypi.python.org/pypi/psutil).

Open terminal and install psutil:

pip3 install psutil

or

pip install psutil

On some systems pip already points to Python 3; on others (Ubuntu, Mint) you need to use pip3.

Let's start coding:

dicts = psutil.disk_io_counters(perdisk=True)

This line stores the return value of the psutil.disk_io_counters function in the variable dicts. Called without arguments, the function returns system-wide disk I/O statistics as a namedtuple. With perdisk=True it returns the same information for every physical disk installed on the system, as a dictionary with partition names as the keys and the namedtuple described above as the values. The full output of disk_io_counters with perdisk=True looks something like this:

{'sda1': sdiskio(read_count=95078, write_count=2319, read_bytes=11342775296, write_bytes=21368832, read_time=301432, write_time=151244, read_merged_count=6, write_merged_count=2898, busy_time=216924),
 'ram11': sdiskio(read_count=0, write_count=0, read_bytes=0, write_bytes=0, read_time=0, write_time=0, read_merged_count=0, write_merged_count=0, busy_time=0),
 'sdc5': sdiskio(read_count=76, write_count=0, read_bytes=3325952, write_bytes=0, read_time=168, write_time=0, read_merged_count=0, write_merged_count=0, busy_time=168),
 'sdb1': sdiskio(read_count=81, write_count=0, read_bytes=3219456, write_bytes=0, read_time=372, write_time=0, read_merged_count=0, write_merged_count=0, busy_time=1200),
 'ram5': sdiskio(read_count=0, write_count=0, read_bytes=0, write_bytes=0, read_time=0, write_time=0, read_merged_count=0, write_merged_count=0, busy_time=0),
 'sdb2': sdiskio(read_count=77, write_count=0, read_bytes=3203072, write_bytes=0, read_time=340, write_time=0, read_merged_count=0, write_merged_count=0, busy_time=308) …

 

dictshdd = dicts['sdc1']

Here dictshdd receives the namedtuple for the sdc1 partition from the dicts dictionary. sdc1 is my system drive; yours will probably have a different name. For an overview of your drives, open a terminal and type:  df -h
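
If you would rather not read the name off df by hand, psutil can list the mounted partitions for you. This is only a minimal sketch, assuming a plain partition such as /dev/sdc1 mounted at /. The helper name device_key is made up for this example, and on LVM or encrypted setups the name it returns may not match a key in disk_io_counters, so check the result before relying on it.

```python
import os


def device_key(partitions, mountpoint="/"):
    """Return the short device name (e.g. 'sdc1') mounted at mountpoint, or None."""
    for part in partitions:
        if part.mountpoint == mountpoint:
            # '/dev/sdc1' -> 'sdc1', the form used as a key by disk_io_counters
            return os.path.basename(part.device)
    return None


if __name__ == "__main__":
    try:
        import psutil  # third-party: pip3 install psutil
    except ImportError:
        print("psutil is not installed")
    else:
        # disk_partitions() returns entries with .device and .mountpoint
        print(device_key(psutil.disk_partitions()))
```

On my machine I would expect this to print sdc1; if it prints None, pass the mount point you care about as the second argument.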

Now dictshdd contains the following info:

sdiskio(read_count=45999, write_count=72782, read_bytes=1977201664, write_bytes=3958788096, read_time=56916, write_time=164924, read_merged_count=1856, write_merged_count=82498, busy_time=42124)

readb = dictshdd[2]
writeb = dictshdd[3]
busyt = dictshdd[8]

Because we're interested in bytes read, bytes written and busy time, let's define some variables. readb is the bytes read, which is the third value in the dictshdd namedtuple (index 2, because counting starts at 0). writeb is the bytes written, the fourth value (index 3), and busyt is the busy time, the ninth value (index 8). The program reads the current values once at the start and from there calculates, in the loop, how much data has been read and written in each time interval.
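
The numeric indexes are easy to mix up, so it may be safer to use the field names instead. The sketch below uses a stand-in namedtuple with the same field names as in the output above (so it runs without psutil), filled with my sdc1 values. As far as I know busy_time is only reported on Linux and FreeBSD, which is why the example falls back to 0 when the field is missing.

```python
from collections import namedtuple

# Stand-in for psutil's sdiskio on Linux, filled with the sdc1 values shown above
sdiskio = namedtuple("sdiskio", [
    "read_count", "write_count", "read_bytes", "write_bytes",
    "read_time", "write_time", "read_merged_count", "write_merged_count",
    "busy_time",
])
dictshdd = sdiskio(45999, 72782, 1977201664, 3958788096,
                   56916, 164924, 1856, 82498, 42124)

readb = dictshdd.read_bytes    # same value as dictshdd[2]
writeb = dictshdd.write_bytes  # same value as dictshdd[3]
# busy_time is not available on every platform, so fall back to 0
busyt = getattr(dictshdd, "busy_time", 0)

print(readb, writeb, busyt)
```

With the real psutil result the three assignments work the same way; only the stand-in definition at the top goes away.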

logging = open("hddstats.csv","a+")
logging.write("Date, Time, Written MB, Read MB, Busy Time(ms), Written Total MB, Read Total MB, Busy Time Total (ms)" + "\n")
logging.close()

We declare the variable logging and open a file named hddstats.csv. open() takes two arguments: the file we want to open and a string describing the mode, that is, what we want to do with the file. Here we use "a", which means append: new data goes to the end of the file, and the file is created if it does not exist yet. The plus sign additionally allows reading from the file; we don't strictly need it here. The other basic modes are "r" for read and "w" for write (which overwrites an existing file).

With the logging.write command we write the string to the file. These will be our column headers.
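
One side effect of appending is that the header line gets written again every time the script is started. If that bothers you, here is one possible way around it, as a sketch only: the helper append_line and the shortened header are made up for this example, and the with statement is used so the file is closed automatically.

```python
import os
import tempfile

HEADER = "Date, Time, Written MB, Read MB\n"  # shortened header, for illustration only


def append_line(path, line):
    """Append one line, writing the header first if the file is new or empty."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    # "a" creates the file when it is missing; "with" closes it for us
    with open(path, "a") as log:
        if is_new:
            log.write(HEADER)
        log.write(line + "\n")


if __name__ == "__main__":
    # Write to a throwaway file so the demo does not touch a real log
    demo = os.path.join(tempfile.mkdtemp(), "hddstats_demo.csv")
    append_line(demo, "23.12.2016,18:08:33,0.0,0.0")
    append_line(demo, "23.12.2016,18:18:33,10.359375,0.0")
    with open(demo) as f:
        print(f.read())
```

The demo writes two rows and the file ends up with a single header line above them.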

ts = datetime.datetime.fromtimestamp(time.time()).strftime('%d.%m.%Y,%H:%M:%S')

The ts variable receives the current date and time, formatted as a string. Notice the comma between %Y and %H: it separates the date and time fields in our CSV file.
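
To see what the format string produces, it helps to try it on a fixed moment first. A small sketch; the name FMT is just my shorthand here, and datetime.now() is an equivalent, shorter way to get the current local time.

```python
import datetime

FMT = "%d.%m.%Y,%H:%M:%S"

# A fixed moment, so the result is predictable
fixed = datetime.datetime(2016, 12, 23, 18, 8, 33)
print(fixed.strftime(FMT))  # 23.12.2016,18:08:33

# The current local time, formatted the same way
ts = datetime.datetime.now().strftime(FMT)

# The comma splits the string into a date field and a time field
date_part, time_part = ts.split(",")
print(date_part, time_part)
```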

dicts = psutil.disk_io_counters(perdisk=True)

dictshdd = dicts['sdc1']

writebnew = dictshdd[3]
readbnew = dictshdd[2]
busytnew = dictshdd[8]

writebcy = writebnew - writeb
readbcy = readbnew - readb
busytcy = busytnew - busyt

writeb = writebnew
readb = readbnew
busyt = busytnew

Now in this part of the loop we take the current values of the drive counters, subtract the old values from the new ones and get the amount of change over the interval. After that we update the old values. These calculations are necessary because disk_io_counters returns only running totals; to measure the amount per interval we need both the starting value and the ending value.
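
The same subtraction can be wrapped in a small function, which keeps the loop body short. This is just a sketch: Snapshot is a hypothetical three-field tuple (the real psutil namedtuple has more fields) and the numbers are made up.

```python
from collections import namedtuple

# Hypothetical three-field snapshot; psutil's namedtuple has more fields
Snapshot = namedtuple("Snapshot", ["read_bytes", "write_bytes", "busy_time"])


def delta(old, new):
    """Return how much each counter grew between two snapshots."""
    return Snapshot(*(n - o for o, n in zip(old, new)))


# Made-up totals taken at the start and at the end of one interval
old = Snapshot(read_bytes=1000, write_bytes=5000, busy_time=40)
new = Snapshot(read_bytes=1500, write_bytes=9000, busy_time=65)

change = delta(old, new)
print(change)  # Snapshot(read_bytes=500, write_bytes=4000, busy_time=25)
```

In the loop, new then becomes old for the next round, exactly like writeb = writebnew above.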

logging = open("hddstats.csv","a+")
logging.write(ts + "," + str(writebcy/1048576) + "," + str(readbcy/1048576) + "," + str(busytcy) + "," + str(writebnew/1048576) + "," + str(readbnew/1048576) + "," + str(busytnew) + "\n")
logging.close()

Once we have all our values we can write them to the file, but not directly. First we divide the byte counts by 1048576 (1024 × 1024) to get megabytes. The result is a float, and write() only accepts strings, so we convert each number with the str() function.
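
str() keeps every decimal of the float, which makes for long numbers in the log. If two decimals are enough for you, a small formatting helper can do the division and the conversion in one step. A sketch; the name to_mb and the choice of two decimals are mine.

```python
BYTES_PER_MB = 1024 * 1024  # 1048576, the divisor used in the script


def to_mb(num_bytes, digits=2):
    """Convert a byte count to a string in MB, rounded to `digits` decimals."""
    return f"{num_bytes / BYTES_PER_MB:.{digits}f}"


print(to_mb(10862592))  # 10.36 (str() would give 10.359375)
print(to_mb(1048576))   # 1.00
```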

We build our string starting with ts, which is the date and time, put "," between the values and end the string with a newline "\n".
All we have to do now is add a delay to our loop (time.sleep) and we're done.

File output should look something like this:

Date, Time, Written MB, Read MB, Busy Time(ms), Written Total MB, Read Total MB, Busy Time Total (ms)
23.12.2016,18:08:33,0.0,0.0,0,3883.140625,1895.0517578125,43276
23.12.2016,18:18:33,10.359375,0.0,128,3893.5,1895.0517578125,43404
23.12.2016,18:28:33,57.046875,0.328125,564,3950.546875,1895.3798828125,43968
23.12.2016,18:38:33,57.83203125,11.37109375,664,4008.37890625,1906.7509765625,44632
23.12.2016,18:48:34,54.03125,27.94921875,820,4062.41015625,1934.7001953125,45452
23.12.2016,18:58:34,142.6953125,28.37109375,1568,4205.10546875,1963.0712890625,47020
23.12.2016,19:08:34,81.23828125,15.58203125,864,4286.34375,1978.6533203125,47884
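
Once the log exists you can read it back with Python's csv module, for example to add up the per-interval columns. A minimal sketch: the sample is the first three rows from above, trimmed to the first five columns and inlined so the example runs on its own. It assumes the header line appears only once in the file; the function name totals is mine.

```python
import csv
import io

# First rows of the sample log, trimmed to five columns
sample = """Date, Time, Written MB, Read MB, Busy Time(ms)
23.12.2016,18:08:33,0.0,0.0,0
23.12.2016,18:18:33,10.359375,0.0,128
23.12.2016,18:28:33,57.046875,0.328125,564
"""


def totals(csv_file):
    """Sum the per-interval written and read MB columns."""
    # skipinitialspace drops the blank after each comma in the header
    reader = csv.DictReader(csv_file, skipinitialspace=True)
    written = read = 0.0
    for row in reader:
        written += float(row["Written MB"])
        read += float(row["Read MB"])
    return written, read


print(totals(io.StringIO(sample)))  # (67.40625, 0.328125)
```

For the real file, replace io.StringIO(sample) with the file object returned by open("hddstats.csv").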

 

And here is the full code:

import psutil
import time
import datetime

# Insert psutil disk io info to variable dicts
dicts = psutil.disk_io_counters(perdisk=True)

# From the dicts dictionary, take the sdc1 (hdd) data and store it in dictshdd
dictshdd = dicts['sdc1']

# Define some variables
readb = dicts['sdc1'][2]
writeb = dicts['sdc1'][3]
busyt = dicts['sdc1'][8]

# Open file for writing/appending and write column headers.
logging = open("hddstats.csv","a+")
logging.write("Date, Time, Written MB, Read MB, Busy Time(ms), Written Total MB, Read Total MB, Busy Time Total (ms)" + "\n")
logging.close()


while True:

    time.sleep(60)
    ts = datetime.datetime.fromtimestamp(time.time()).strftime('%d.%m.%Y,%H:%M:%S')

    dicts = psutil.disk_io_counters(perdisk=True)

    dictshdd = dicts['sdc1']

    # Index 3 is write_bytes, index 2 is read_bytes, index 8 is busy_time
    writebnew = dicts['sdc1'][3]
    readbnew = dicts['sdc1'][2]
    busytnew = dicts['sdc1'][8]

    writebcy = writebnew - writeb
    readbcy = readbnew - readb
    busytcy = busytnew - busyt

    writeb = writebnew
    readb = readbnew
    busyt = busytnew


    logging = open("hddstats.csv","a+")
    logging.write(ts + "," + str(writebcy/1048576) + "," + str(readbcy/1048576) + "," + str(busytcy) + "," + str(writebnew/1048576) + "," + str(readbnew/1048576) + "," + str(busytnew) + "\n")
    logging.close()