December 30, 2009

New Year’s Python Meme

What’s the coolest Python application, framework or library you have discovered in 2009?

Multiprocessing, which was included in the standard library starting with Python 2.6. It is uses the same API as the threading module, but spawns separate OS processes, rather than threads. You can use it as a basis to build some very powerful concurrent programs that fully utilize multi-core systems.

What new programming technique did you learn in 2009?

Better concurrency with threads and processes. Up until a few years ago, I mostly dealt with uniprocessor systems. 1 processor, 1 core, concurrency was simple since you didn't have to think about it beyond using threads for non-blocking IO. Now everything is multi-way and multi-core and you have to think about true concurrency. Python gives you some primitives to deal with that, and I have been poking around and trying to learn what I can so I can build concurrent and scalable systems.

I also started dabbling in GUI programming with the new ttk (Tk Themed Widgets) in Python 2.7/3.1.

What’s the name of the open source project you contributed the most in 2009? What did you do?

www.pylot.org

Pylot is a performance/load testing tool for http and web services. I am the author and maintainer.

www.goldb.org
corey-projects

I have also been releasing snippets of open source code at my new Google code repository and on my web site.

What was the Python blog or website you read the most in 2009?

Planet Python gets you most everything in the Python world. But if I had to pick, I'd say my favorite Python blogs are from Jesse Noller and Michael Foord (Voidspace). Both of them write about topics that I am very interested in. (They are both Python Core maintainers).

I also like the regular geek sites: Reddit Programming, Slashdot, Digg. Then there is a big crew of Python developers on Twitter that are fun to follow (follow me: @cgoldberg).

I also really got into StackOverflow this year. It is a really useful question/answer site. The wealth of Python information on there is staggering. I mean, where else can you ask a Python question and have people like Alex Martelli answering it 5 mins later? ... with others fighting to provide even better answers. It's great!

What are the three top things you want to learn in 2010?

- Selenium/WebDriver 2.0 - the new WebDriver browser driver and API. The Selenium guys have been doing a lot of great work and this is becoming the industry standard tool for UI testing of browser based apps. 2.0 is a big change from 1.x and is the merger of 2 projects: Selenium and WebDriver.

- Python 3.1 - I have already been writing some code in Py3k, and to be honest it's barely any different than 2.x. It feels right at home, with some minor syntactical changes and reorganization of some standard library modules.

- Better copy writing - I produce lots of code in the form of snippets or larger projects. Most of these often get lost because I don't have the words to go along with them. If I was a better writer and could express myself better, I would be able to post more often and get my ideas out there.

December 17, 2009

Python - PyQt4 - Hello World

basic "Hello World" GUI application using Python and Qt (PyQt4)

#!/usr/bin/env python
 
import sys
from PyQt4 import Qt

app = Qt.QApplication(sys.argv)
lbl = Qt.QLabel('Hello World')
lbl.show()
app.exec_()

December 16, 2009

Python - Send Email With smtplib

This is mostly just for my own reference

sending an email using Python's smtplib:

import smtplib
from email.MIMEText import MIMEText

smtp_server = 'exchangeserver.foo.com'
recipients = ['corey@foo.com',]
sender = 'corey@foo.com'
subject = 'subject goes here'
msg_text = 'message body goes here'


msg = MIMEText(msg_text)
msg['Subject'] = subject
s = smtplib.SMTP()
s.connect(smtp_server)
s.sendmail(sender, recipients, msg.as_string())
s.close()

Book Sample: Matplotlib for Python Developers - Plotting Data

Today's post is a sample from the new book: "Matplotlib for Python Developers" by Sandro Tosi. I will review the book in an upcoming post.

-Corey

Disclosure: Packt Publishing sent me a free copy of this book to review.

Matplotlib for Python Developers

http://www.packtpub.com/matplotlib-python-development/book

In this two-part article, by Sandro Tosi you will see several examples of Matplotlib usage in real-world situations, showing some of the common cases where we can use Matplotlib to draw a plot of some values.

There is a common workflow for this kind of job:

Identify the best data source for the information we want to plot
Extract the data of interest and elaborate it for plotting
Plot the data

Usually, the hardest part is to extract and prepare the data for plotting. Due to this, we are going to show several examples of the first two steps.

The examples are:

Plotting data from a database
Plotting data from a web page
Plotting the data extracted by parsing an Apache log file
Plotting the data read from a comma-separated values (CSV) file
Plotting extrapolated data using curve fitting
Third-party tools using Matplotlib (NetworkX and mpmath)

Let's begin

Plotting data from a database

Databases often tend to collect much more information than we can simply extract and watch in a tabular format (let's call it the "Excel sheet" report style).

Databases not only use efficient techniques to store and retrieve data, but they are also very good at aggregating it.

One suggestion we can give is to let the database do the work. For example, if we need to sum up a column, let's make the database sum the data, and not sum it up in the code. In this way, the whole process is much more efficient because:

There is a smaller memory footprint for the Python code, since only the aggregate value is returned, not the whole result set to generate it
The database has to read all the rows in any case. However, if it's smart enough, then it can sum values up as they are read
The database can efficiently perform such an operation on more than one column at a time

The data source we're going to query is from an open source project: the Debian distribution. Debian has an interesting project called UDD , Ultimate Debian Database, which is a relational database where a lot of information (either historical or actual) about the distribution is collected and can be analyzed.

On the project website http://udd.debian.org/, we can fi nd a full dump of the database (quite big, honestly) that can be downloaded and imported into a local PostgreSQL instance (refer to http://wiki.debian.org/UltimateDebianDatabase/CreateLocalReplica for import instructions

Now that we have a local replica of UDD, we can start querying it:

# module to access PostgreSQL databases
import psycopg2
# matplotlib pyplot module
import matplotlib.pyplot as plt

Since UDD is stored in a PostgreSQL database, we need psycopg2 to access it. psycopg2 is a third-party module available at http://initd.org/projects/psycopg

# connect to UDD database
conn = psycopg2.connect(database="udd")
# prepare a cursor
cur = conn.cursor()

We will now connect to the database server to access the udd database instance, and then open a cursor on the connection just created.

# this is the query we'll be making
query = """
select to_char(date AT TIME ZONE 'UTC', 'HH24'), count(*)
from upload_history
where to_char(date, 'YYYY') = '2008'
group by 1
order by 1"""

We have prepared the select statement to be executed on UDD. What we wish to do here is extract the number of packages uploaded to the Debian archive (per hour) in the whole year of 2008.

date AT TIME ZONE 'UTC': As date field is of the type timestamp with time zone, it also contains time zone information, while we want something independent from the local time. This is the way to get a date in UTC time zone.
group by 1: This is what we have encouraged earlier, that is, let the database do the work. We let the query return the already aggregated data, instead of coding it into the program.

# execute the query
cur.execute(query)
# retrieve the whole result set
data = cur.fetchall()

We execute the query and fetch the whole result set from it.

# close cursor and connection
cur.close()
conn.close()

Remember to always close the resources that we've acquired in order to avoid memory or resource leakage and reduce the load on the server (removing connections that aren't needed anymore).

# unpack data in hours (first column) and
# uploads (second column)
hours, uploads = zip(*data)

The query result is a list of tuples, (in this case, hour and number of uploads), but we need two separate lists—one for the hours and another with the corresponding number of uploads. zip() solves this with *data, we unpack the list, returning the sublists as separate arguments to zip(), which in return, aggregates the elements in the same position in the parameters into separated lists. Consider the following example:

In [1]: zip(['a1', 'a2'], ['b1', 'b2'])
Out[1]: [('a1', 'b1'), ('a2', 'b2')]

To complete the code:

# graph code
plt.plot(hours, uploads)
# the the x limits to the 'hours' limit
plt.xlim(0, 23)
# set the X ticks every 2 hours
plt.xticks(range(0, 23, 2))
# draw a grid
plt.grid()
# set title, X/Y labels
plt.title("Debian packages uploads per hour in 2008")
plt.xlabel("Hour (in UTC)")
plt.ylabel("No. of uploads")

The previous code snippet is the standard plotting code, which results in the following screenshot:

From this graph we can see that in 2008, the main part of Debian packages uploads came from European contributors. In fact, uploads were made mainly in the evening hours (European time), after the working days are over (as we can expect from a voluntary project).

Plotting data from the Web

Often, the information we need is not distributed in an easy-to-use format such as XML or a database export but for example only on web sites.

More and more often we find interesting data on a web page, and in that case we have to parse it to extract that information: this is called web scraping .

In this example, we will parse a Wikipedia article to extracts some data to plot. The article is at http://it.wikipedia.org/wiki/Demografia_d'Italia and contains lots of information about Italian demography (it's in Italian because the English version lacks a lot of data); in particular, we are interested in the population evolution over the years.

Probably the best known Python module for web scraping is BeautifulSoup ( http://www.crummy.com/software/BeautifulSoup/). It's a really nice library that gets the job done quickly, but there are situations (in particular with JavaScript embedded in the web page, such as for Wikipedia) that prevent it from working.

As an alternative, we find lxml quite productive (http://codespeak.net/lxml/). It's a library mainly used to work with XML (as the name suggests), but it can also be used with HTML (given their quite similar structures), and it is powerful and easy–to-use.

Let's dig into the code now:

# to get the web pages
import urllib2
# lxml submodule for html parsing
from lxml.html import parse
# regular expression module
import re
# Matplotlib module
import matplotlib.pyplot as plt

Along with the Matplotlib module, we need the following modules:

urllib2: This is the module (from the standard library) that is used to access resources through URL (we will download the webpage with this).
lxml: This is the parsing library.
re: Regular expressions are needed to parse the returned data to extract the information we need. re is a module from the standard library, so we don't need to install a third-party module to use it.

# general urllib2 config
user_agent = 'Mozilla/5.0 (compatible; MSIE 5.5; Windows NT)'
headers = { 'User-Agent' : user_agent }
url = "http://it.wikipedia.org/wiki/Demografia_d'Italia"

Here, we prepare some configuration for urllib2, in particular, the user_agent header is used to access Wikipedia and the URL of the page.

# prepare the request and open the url
req = urllib2.Request(url, headers=headers)
response = urllib2.urlopen(req)

Then we make a request for the URL and get the HTML back.

# we parse the webpage, getroot() return the document root
doc = parse(response).getroot()

We parse the HTML using the parse() function of lxml.html and then we get the root element. XML can be seen as a tree, with a root element (the node at the top of the tree from where every other node descends), and a hierarchical structure of elements.

# find the data table, using css elements
table = doc.cssselect('table.wikitable')[0]

We leverage the structure of HTML accessing the first element of type table of class wikitable because that's the table we're interested in.

# prepare data structures, will contain actual data
years = []
people = []

Preparing the lists that will contain the parsed data.

# iterate over the rows of the table, except first and last ones
for row in table.cssselect('tr')[1:-1]:

We can start parsing the table. Since there is a header and a footer in the table, we skip the first and the last line from the lines (selected by the tr tag) to loop over.

# get the row cell (we will use only the first two)
data = row.cssselect('td')

We get the element with the td tag that stands for table data: those are the cells in an HTML table.

# the first cell is the year
tmp_years = data[0].text_content()
# cleanup for cases like 'YYYY[N]' (date + footnote link)
tmp_years = re.sub('[.]', '', tmp_years)

We take the first cell that contains the year, but we need to remove the additional characters (used by Wikipedia to link to footnotes).

# the second cell is the population count
tmp_people = data[1].text_content()
# cleanup from '.', used as separator
tmp_people = tmp_people.replace('.', '')

We also take the second cell that contains the population for a given year. It's quite common in Italy to separate thousands in number with a '.' character: we have to remove them to have an appropriate value.

# append current data to data lists, converting to integers
years.append(int(tmp_years))
people.append(int(tmp_people))

We append the parsed values to the data lists, explicitly converting them to integer values.

# plot data
plt.plot(years,people)
# ticks every 10 years
plt.xticks(range(min(years), max(years), 10))
plt.grid()
# add a note for 2001 Census
plt.annotate("2001 Census", xy=(2001, people[years.index(2001)]),
xytext=(1986, 54.5*10**6),
arrowprops=dict(arrowstyle='fancy'))

Running the example results in the following screenshot that clearly shows why the annotation is needed:

In 2001, we had a national census in Italy, and that's the reason for the drop in that year: the values released from the National Institute for Statistics (and reported in the Wikipedia article) are just an estimation of the population. However, with a census, we have a precise count of the people living in Italy.

Plotting data by parsing an Apache log file

Plotting data from a log file can be seen as the art of extracting information from it.

Every service has a log format different from the others. There are some exceptions of similar or same format (for example, for services that come from the same development teams) but then they may be customized and we're back at the beginning.

The main differences in log files are:

Fields orders: Some have time information at the beginning, others in the middle of the line, and so on
Fields types: We can find several different data types such as integers, strings, and so on
Fields meanings: For example, log levels can have very different meanings

From all the data contained in the log file, we need to extract the information we are interested in from the surrounding data that we don't need (and hence we skip).

In our example, we're going to analyze the log file of one of the most common services: Apache. In particular, we will parse the access.log file to extract the total number of hits and amount of data transferred per day.

Apache is highly configurable, and so is the log format. Our Apache configuration, contained in the httpd.conf file, has this log format:

"%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i""

In this way, we extract the day of the request along with the response size from every line.

# prepare dictionaries to contain the data
day_hits = {}
day_txn = {}

These dictionaries will store the parsed data.

# we open the file
with open('<location of the Apache>/access.log') as f:
# and for every line in it
for line in f:

we open the file (we have to select the proper location of access.log as it differs between operating systems and/or installation), and for every line in it

# we pass the line to regular expression
m = apa_line.match(line)
# and we get the 2 values matched back
day, call_size = m.groups()

We parse the line and take the resulting values.

# if the current day is already present
if day in day_hits:
# we add the call and the size of the request
day_hits[day] += 1
day_txn[day] += int(call_size)

else if the current day is already present in the dictionaries, then add the hit and the size to the respective dictionaries.

else:
# else we initialize the dictionaries
day_hits[day] = 1
day_txn[day] = int(call_size)

If the current day is not present, then we need to initialize the dictionaries for a new day.

# prepare a list of the keys (days)
keys = sorted(day_hits.keys())

We prepare a sorted list of dictionary keys, since we need to access it several times.

# prepare a figure and an Axes in it
fig = plt.figure()
ax1 = fig.add_subplot(111)

We prepare a Figure and an Axes in it.

# bar width
width = .4

We define the bars width.

# for each key (day) and it's position
for i, k in enumerate(keys):

Then we enumerate the item in keys so that we have a progressive number associated with each key.

# we plot a bar
ax1.bar(i - width/2, day_hits[k], width=width, color='y')

Now we can plot a bar at position i (but shifted by width/2 to center the tick) with its height proportional to the number of hits, and width set to width. We set the bar color to yellow.

# for each label for the X ticks
for label in ax1.get_xticklabels():
# we hide it
label.set_visible(False)

We hide the labels on the X-axis to avoid its superimposition with the other Axes labels (for transfer size).

# add a label to the Y axis (for the first plot)
ax1.set_ylabel('Total hits')

We set a label for the Y-axis.

# create another Axes instance, twin of the previous one
ax2 = ax1.twinx()

We now create a second Axes, sharing the X-axis with the previous one.

# plot the total requests size
ax2.plot([day_txn[k] for k in keys], 'k', linewidth=2)

We plot a line for the transferred size, using the black color and with a bigger width.

# set the Y axis to start from 0
ax2.set_ylim(ymin=0)

We let the Y-axis start from 0 so that it can be congruent with the other Y-axis.

# set the X ticks for each element of keys (days)
ax2.set_xticks(range(len(keys)))
# set the label for them to keys, rotating and align to the right
ax2.set_xticklabels(keys, rotation=25, ha='right')

We set the ticks for the X-axis (that will be shared between this and the bar plot) and the labels, rotating them by 25 degrees and aligning them to the right to better fit the plot.

# set the formatter for Y ticks labels
ax2.yaxis.set_major_formatter(FuncFormatter(megabytes))
# add a label to Y axis (for the second plot)
ax2.set_ylabel('Total transferred data (in Mb)')

Then we set the formatter for the Y-axis so that the labels are shown in megabytes (instead of bytes).

# add a title to the whole plot
plt.title('Apache hits and transferred data by day

Finally, we set the plot title.

On executing the preceding code snippet, the following screenshot is displayed:

The preceding screenshot tells us that it is not a very busy server, but it still shows what's going on.

Continue Reading Plotting data using Matplotlib: Part 2

December 14, 2009

http_rrd_profiler - HTTP Request Profiler with RRD Storage and Graphing

http_rrd_profiler is an HTTP testing/monitoring tool written in Python. It uses Python's httplib to send a GET request to a web resource and adds performance timing instrumentation between each step of the request/response. RRDTool is used for data storage and graph generation. RRDtool is an open source, high performance logging and graphing system for time-series data.

The RRD graphs it produces:

The code is located in my repository. You will need the following files:
http_rrd_profiler.py
rrd.py

Usage:

http_rrd_profiler.py <host> <interval in secs>

You invoke it from the command like like this:

> python http_rrd_profiler.py www.goldb.org 5

This will send an HTTP GET to http://www.goldb.org/ every 5 seconds.

Timing results are printed to the console and stored in a Round Robin Database (RRD). During each interval, a graph (png image) is generated.

Console output:

----------------
216 request sent
337 response received
456 content transferred (8044 bytes)
----------------
208 request sent
332 response received
453 content transferred (8044 bytes)
----------------
212 request sent
332 response received
452 content transferred (8044 bytes)
----------------
210 request sent
329 response received
448 content transferred (8044 bytes)
----------------
214 request sent
333 response received
452 content transferred (8044 bytes)

* tested on Ubuntu Karmic and WinXP with Python 2.6

December 9, 2009

Python 3 - tkinter (Tk Widgets) - BusyBar Busy Indicator

Here is another example GUI application using Python 3.1.

For this example, I used BusyBar.py, a module for creating a "busy indicator" (like the knight-rider light). To do this, I first had to port BusyBar to Python 3.1 from 2.x. The new code for this module can be found here:

BusyBar.py

It renders on Ubuntu (with Gnome) like this:

Code:

#!/usr/bin/env python
# Python 3


import tkinter
from tkinter import ttk
import BusyBar


class Application:
    def __init__(self, root):
        self.root = root
        self.root.title('BusyBar Demo')
        ttk.Frame(self.root, width=300, height=100).pack()
        
        bb = BusyBar.BusyBar(self.root, width=200)
        bb.place(x=40, y=20)
        bb.on()
    

if __name__ == '__main__':
    root = tkinter.Tk()
    Application(root)
    root.mainloop()

December 8, 2009

Python 3 - tkinter ttk (Tk Themed Widgets) - Blocking Command Example

Here is another example GUI application with ttk in Python 3.1.

This example shows usage of the Button, Text, and Scrollbar widgets, along with some basic layout management using grid().

To use it, enter a URL and click the button. It will send an HTTP GET request to the URL and insert the response headers into the Text widget. You will notice that the GUI is blocked (frozen) while the URL is fetched. This is because all of the work is done in the main event thread. I will show how to use threads for non-blocking events in a future post.

It renders on Ubuntu (with Gnome) like this:

Code:

#!/usr/bin/env python
# Python 3


import tkinter
from tkinter import ttk
import urllib.request



class Application:
    def __init__(self, root):
        self.root = root
        self.root.title('Blocking Command Demo')
        
        self.init_widgets()
            
            
    def init_widgets(self):
        self.btn = ttk.Button(self.root, command=self.get_url, text='Get Url', width=8)
        self.btn.grid(column=0, row=0, sticky='w')
        
        self.entry = ttk.Entry(self.root, width=60)
        self.entry.grid(column=0, row=0, sticky='e')
        
        self.txt = tkinter.Text(self.root, width=80, height=20)
        self.txt.grid(column=0, row=1, sticky='nwes')
        sb = ttk.Scrollbar(command=self.txt.yview, orient='vertical')
        sb.grid(column=1, row=1, sticky='ns')
        self.txt['yscrollcommand'] = sb.set
        

    def get_url(self):
        url = self.entry.get()
        headers = urllib.request.urlopen(url).info()
        self.txt.insert(tkinter.INSERT, headers)
    
    

if __name__ == '__main__':
    root = tkinter.Tk()
    Application(root)
    root.mainloop()

December 7, 2009

Python 3 - tkinter ttk (Tk Themed Widgets) - Button/Text Example

Here is some boilerplate code for setting up a basic GUI application with ttk in Python 3.1.

This example shows usage of the Frame, Button, and Text widgets. When you click the button, "Hello World" is inserted into the Text widget.

#!/usr/bin/env python
# Python 3


import tkinter
from tkinter import ttk


class Application:
    def __init__(self, root):
        self.root = root
        self.root.title('Button Demo')
        ttk.Frame(self.root, width=250, height=100).pack()
        
        self.init_widgets()
            
    def init_widgets(self):
        ttk.Button(self.root, command=self.insert_txt, text='Click Me', width='10').place(x=10, y=10)
        self.txt = tkinter.Text(self.root, width='15', height='2')
        self.txt.place(x=10, y=50)
        
    def insert_txt(self):
        self.txt.insert(tkinter.INSERT, 'Hello World\n')


if __name__ == '__main__':
    root = tkinter.Tk()
    Application(root)
    root.mainloop()

It renders on Ubuntu (with Gnome) like this:

December 6, 2009

Python 3 - tkinter ttk (Tk Themed Widgets) - GUI Hello World

Here is some boilerplate code for setting up a basic GUI application with ttk in Python 3.1:

#!/usr/bin/env python
# Python 3


import tkinter
from tkinter import ttk

class Application:
    def __init__(self, root):
        self.root = root
        self.root.title('Hello World')
        ttk.Frame(self.root, width=200, height=100).pack()
        ttk.Label(self.root, text='Hello World').place(x=10, y=10)

if __name__ == '__main__':
    root = tkinter.Tk()
    Application(root)
    root.mainloop()

This example shows usage of the Frame and Label widgets, and some basic layout management using pack() and place(). It renders on Ubuntu (with Gnome) like this:

December 3, 2009

Python - Accurate Cross Platform Timers - time.time() vs. time.clock()

In Python, you can add simple timers around any action using the clock() or time() functions from the time module. time.clock() gives the best timer accuracy on Windows, while the time.time() function gives the best accuracy on Unix/Linux. You should use whichever one is more appropriate for your system.

Inside the timeit module's source code (in Python's standard library), you can see an example of this. The timer type is specified by the platform:

if sys.platform == "win32":
    # On Windows, the best timer is time.clock()
    default_timer = time.clock
else:
    # On most other platforms, the best timer is time.time()
    default_timer = time.time

This seems like a nice way to create a timer that is accurate on either platform.

Here is an example:

import sys
import time

# choose timer to use
if sys.platform.startswith('win'):
    default_timer = time.clock
else:
    default_timer = time.time

start = default_timer()
# do something
finish = default_timer()
elapsed = (finish - start)

October 23, 2009

Linus on Evolution

Here is a great rant from Linus Torvalds back in 2002 on biological and software evolution.
... from a thread on LKML, summarized here: http://kerneltrap.org/node/11

You know what the most complex piece of engineering known to man in the whole solar system is?

Guess what - it's not Linux, it's not Solaris, and it's not your car.

It's you. And me.

And think about how you and me actually came about - not through any complex design.

Right. "sheer luck".

Well, sheer luck, AND:

free availability and _crosspollination_ through sharing of "source code", although biologists call it DNA.

a rather unforgiving user environment, that happily replaces bad versions of us with better working versions and thus culls the herd (biologists often call this "survival of the fittest")

massive undirected parallel development ("trial and error")

I'm deadly serious: we humans have _never_ been able to replicate something more complicated than what we ourselves are, yet natural selection did it without even thinking.

Don't underestimate the power of survival of the fittest.

And don't EVER make the mistake that you can design something better than what you get from ruthless massively parallel trial-and-error with a feedback cycle. That's giving your intelligence _much_ too much credit.

Linus

October 19, 2009

Python - URL Timer - Web Monitor Utility

In a previous post, I showed how to insert timers between steps in an http request to do some basic profiling. Peter Bengtsson contacted me with some enhancements to the original code. I took his enhancements and added some more of my own to create a neat little web monitoring utility.

It uses Python's httplib to send a GET requests at regular intervals to a URL and adds some timing instrumentation between each step of the request/response. Results are printed to the console.

get it here:
http://code.google.com/p/corey-projects/source/browse/trunk/python2/url_timer.py

to use, give it a site and interval:

> python url_timer.py www.goldb.org 5

... or an https/ssl url and an interval:

> python url_timer.py https://ssl.sonic.net/ 5

sample output:

>python url_timer.py www.goldb.org 5
request sent         response received    content transferred  size
------------         -----------------    -------------------  ----
0.0040 (0.0040)      0.2830 (0.2830)      0.2846 (0.2846)      9810 bytes (0.010 MB total)
0.0009 (0.0024)      0.2790 (0.2810)      0.2803 (0.2825)      9810 bytes (0.020 MB total)
0.0009 (0.0019)      0.2841 (0.2821)      0.2854 (0.2834)      9810 bytes (0.029 MB total)
0.0010 (0.0017)      0.2775 (0.2809)      0.2789 (0.2823)      9810 bytes (0.039 MB total)
0.0008 (0.0015)      0.2794 (0.2806)      0.2820 (0.2822)      9810 bytes (0.049 MB total)
0.0009 (0.0014)      0.2845 (0.2813)      0.2859 (0.2828)      9810 bytes (0.059 MB total)
0.0010 (0.0013)      0.2796 (0.2810)      0.2809 (0.2826)      9810 bytes (0.069 MB total)
0.0009 (0.0013)      0.2798 (0.2809)      0.2808 (0.2823)      9810 bytes (0.078 MB total)
0.0009 (0.0012)      0.2791 (0.2807)      0.2803 (0.2821)      9810 bytes (0.088 MB total)
...

October 5, 2009

Automated Web/HTTP Profiler with Selenium-RC and Python

A small new open source project I am working on:

Selenium-profiler is a web/http profiler built with Selenium-RC and Python. It profiles page load time and network traffic for a web page. The profiler uses Selenium-RC to automate site navigation (via browser), proxy traffic, and sniff the proxy for network traffic stats as requests pass through during a page load.

It is useful to answer questions like:

how many http requests does that page make?
how fast are the http responses coming back?
which http status codes are returned?
how many of each object type are requested?
what is the total page load time?

Get it here:
http://selenium-profiler.googlecode.com/

Sample Output:

--------------------------------
results for http://www.google.com/

content size: 38.06 kb

http requests: 9
status 200: 6
status 204: 1
status 403: 2

profiler timing:
2.063 secs (page load)
2.047 secs (network: end last request)
0.111 secs (network: end first request)

file extensions: (count, size)
gif: 1, 8.558 kb
ico: 2, 2.488 kb
js: 2, 18.046 kb
png: 1, 5.401 kb
unknown: 3, 3.567 kb

http timing detail:
403, GET, /favicon.ico, 111 ms
200, GET, /newkey, 738 ms
200, GET, /, 332 ms
403, GET, /favicon.ico, 3 ms
200, GET, /logo.gif, 255 ms
200, GET, /d9n_Nh4I09g.js, 411 ms
200, GET, /2cca7b2e99206b9c.js, 65 ms
204, GET, /generate_204, 88 ms
200, GET, /nav_logo7.png, 49 ms

September 23, 2009

Selenium RC with Python in 30 Seconds

Selenium is a suite of tools to automate web app testing across many platforms. It has various pieces (Core, RC, IDE, etc), and I struggled trying to figure out how everything fits together and works. At the end of the day, all I wanted to do was use Selenium from my Python code to drive a browser session.

Selenium ships with full tests and some good sample code, but the driver examples all contain test frameworks/runners (JUnit, unittest, etc). All I wanted was a simple way to integrate with Python. To do this, the piece I need is Selenium RC (which includes Selenium Core).

So here is the beginners' 30-second guide to getting the Python client driver working with Selenium-RC:

Install Python (and add it to your path)
Install Java (and add it to your path)
Download Selenium RC
Unzip Selenium RC and search for 'selenium-server.jar' and 'selenium.py'
Copy them to a directory and run 'java -jar selenium-server.jar' to start the server
Start writing your Python driver code! (import selenium)

So what's actually going on when you run a driver script?

Selenium-server is the core program, which also contains an integrated web server. You send HTTP requests to the server to instruct it how to drive your browser. The 'selenium.py' module is just a wrapper around 'httplib' that provides a Python API for interacting with the Selenium-server. You need to import the 'selenium.py' module from your script and then you are ready to go.

now let's test it out and see if it works. Try the following Python script. It should open Firefox and navigate to www.google.com.

#!/usr/bin/env python

from selenium import selenium

sel = selenium('localhost', 4444, '*firefox', 'http://www.google.com/')
sel.start()
sel.open('/')
sel.wait_for_page_to_load(10000)
sel.stop()

September 11, 2009

Python - HTTP Request Profiler

Here is a script in Python that profiles an HTTP request to a web server.

to use, give it a url:


> python http_profiler.py www.goldb.org

output:


0.12454 request sent
0.24967 response received
0.49155 data transferred (9589 bytes)

It uses Python's httplib to send a GET request to the URL and adds some timing instrumentation between each step of the request/response.

Python Code:


#!/usr/bin/env python
# Corey Goldberg - September 2009


import httplib
import sys
import time



if len(sys.argv) != 2:
    print 'usage:\nhttp_profiler.py \n(do not include http://)'
    sys.exit(1)

# get host and path names from url
location = sys.argv[1]
if '/' in location:
    parts = location.split('/')
    host = parts[0]
    path = '/' + '/'.join(parts[1:])
else:
    host = location
    path = '/'

# select most accurate timer based on platform
if sys.platform.startswith('win'):
    default_timer = time.clock
else:
    default_timer = time.time

# profiled http request
conn = httplib.HTTPConnection(host)
start = default_timer()  
conn.request('GET', path)
request_time = default_timer()
resp = conn.getresponse()
response_time = default_timer()
size = len(resp.read())
conn.close()     
transfer_time = default_timer()

# output
print '%.5f request sent' % (request_time - start)
print '%.5f response received' % (response_time - start)
print '%.5f content transferred (%i bytes)' % ((transfer_time - start), size)

September 10, 2009

Web Page Profilers for Windows/Internet Explorer

A "Web Page Profiler" is a tool that is used to measure and analyze web page performance. It is usually implemented as a browser plugin or add-on, and lets you see performance of web pages/objects as they are transferred/loaded/executed/rendered.

For Firefox, choosing a profiler is a no-brainer. Firebug is an excellent developer tool that includes profiling capabilities.

Firebug

Unfortunately, Firebug does not work with Microsoft's Internet Explorer. There is Firebug Lite, which "can" work with IE, but it is limited in functionality and requires you to install some server side code (which is not always feasible).

Firebug Lite

There are some web profilers specifically for IE, but none of them live up to the functionality or stability of Firebug. Some to look at are:

... also, Internet Explorer 8 has built-in "Developer Tools" (press F12) that include a basic performance profiler for JavaScript execution.

Does anyone know of any other profilers/performance tools that work with IE? Thoughts?

August 5, 2009

Segue/Borland/MicroFocus SilkPerformer - Who Owns It?

SilkPerformer is a popular performance and load testing tool. It has a confusing history and has bounced around between companies quite a bit. Here is where it came from and the current state of who owns it.

The history AFAIK:

Segue buys a product from some (scandanavian?) company and names it Segue SilkPerformer.
Borland buys Segue and names it Borland SilkPerformer.
Micro Focus buys Borland and name it Micro Focus SilkPerformer.

Right now it is a little confusing because Borland still lists it as a product. The acquisition of Borland just went through a few days ago and now Micro Focus has it listed on its site also. I would expect the Borland site to eventually disappear.

To make it even more confusing; Micro Focus also bought Compuware which had its own performance/load testing application. So now Micro Focus is marketing QALoad as well as SilkPerformer, which overlap in most functionality.

It will be interesting to see where this goes and what happens to the product lines.

July 20, 2009

[Software] Tools Don't Make You a Mechanic

FYI: Learning LoadRunner does not mean that one has learned Performance Testing. LoadRunner is only a tool. Learning to use one of these:

...does not mean that one can disassemble and reassemble one of these:

[via JakeBrake]

July 10, 2009

Python - Zip Directories Recursively

This helped me out today with some backup scripts. Posting here so I can remember it. Idea and snippet adapted from: http://mail.python.org/pipermail/python-list/2007-February/596539.html

        
#!/usr/bin/env python

import os
import zipfile


def main():
    zipper('c:/test', 'c:/temp/test.zip')


def zipper(dir, zip_file):
    zip = zipfile.ZipFile(zip_file, 'w', compression=zipfile.ZIP_DEFLATED)
    root_len = len(os.path.abspath(dir))
    for root, dirs, files in os.walk(dir):
        archive_root = os.path.abspath(root)[root_len:]
        for f in files:
            fullpath = os.path.join(root, f)
            archive_name = os.path.join(archive_root, f)
            print f
            zip.write(fullpath, archive_name, zipfile.ZIP_DEFLATED)
    zip.close()
    return zip_file


if __name__ == '__main__':
    main()

* code updated. there was a bug in the original I posted (cmg - 07/13/09)

July 7, 2009

OpenSTA - SCL Code Boilerplate for HTTP Load Tests

(small code dump...)

OpenSTA (Open Systems Testing Architecture) is a popular open source web performance test tool. It uses a scripting language named SCL (Script Control Language), which seems to be heavily influenced by Fortran. It's a little bit dated and clumsy to program with, but suffices for writing scripts modeling complex web transactions.

Here is the basic structure I start with when modeling tests in OpenSTA:

!Browser:IE5


Environment
    Description "TEST SCRIPT"
    Mode HTTP
    WAIT UNIT MILLISECONDS


Definitions
    Include         "RESPONSE_CODES.INC"
    Include         "GLOBAL_VARIABLES.INC"
    CONSTANT        DEFAULT_HEADERS = "Host: www.goldb.org^J" &
                        "User-Agent: OpenSTAzilla/4.0"
    Integer         USE_PAGE_TIMERS
    Timer           T_Response
    CHARACTER*32768 logStuff, Local
    CHARACTER*512   USER_AGENT
    CHARACTER*256   MESSAGE


Code
    Entry[USER_AGENT,USE_PAGE_TIMERS]

    Start Timer T_Response

    PRIMARY GET URI "http://www.goldb.org/index.html HTTP/1.0" ON 1 &
        HEADER DEFAULT_HEADERS &
        ,WITH {"Accept: */*", "Accept-Language: en-us"}

    SYNCHRONIZE REQUESTS
    
    DISCONNECT FROM 1
    
    End Timer T_Response
  
Exit


ERR_LABEL:
    If (MESSAGE <> "") Then
        Report MESSAGE
    Endif

Exit

Nothing much to see here. If you use the OpenSTA recorder and record a simple HTTP GET request, it would generate a similar script for you.

Web Performance Tool Evaluation - lower end proprietary tools

I am in the middle of a Performance and Load tools selection process and wanted to get some feedback here.

I currently work in a shop that uses a mix of proprietary and open source tools for web performance & load testing. The bulk of our workload and analysis is currently done using SilkPerformer. As you all probably know, there is a class of tools that is *very* expensive (including SilkPerformer). Installations and maintenance can run into 7 figures ($$$) with yearly maintenance contracts upwards of 6 figures. Since SilkPerformer is in place and we are happy with it (besides price/maintenance), there is no point in moving to a similarly priced tool. Therefore I have ruled out the class of "high end" tools from my selection:

High-end tools
------------------------
Borland/Segue - SilkPerformer
HP/Mercury - LoadRunner
IBM/Rational - IBM Rational Performance Tester
Microfocus/Compuware - QALoad
Oracle/Empirix - Oracle Load Testing For Web Applications (e-Load)

The tool I select will be used across several web applications.. pretty straight forward HTML/AJAX/JavaScript Web UI. Here is a basic list of requirements:

Requirements
------------------------
Protocols:
- HTTP
- ODBC (SQL)

Features:
- distributed load generation
- reporting/analytics
- data driven testing
- 5000+ VU

I work on a very skilled team that is *very* proficient with programming, tools, and web technologies. Adapting to a new tool or programming language is not much of an issue.

I've searched the Open Source landscape pretty good. There are some fantastic tools (OpenSTA, JMeter, Pylot) to augment our testing, but no open source load generation tool completely meets our criteria.

Open Source tools
------------------------
OpenSTA
JMeter
Pylot

Now finally to the question/point....

I am looking at a class of tools that I will call "low-end performance tools". This includes all proprietary tools that are not listed above as "high-end tools". They tend be cheaper and more limited in functionality than the big guns, but are substantially cheaper and sometimes sufficient for complex web performance testing. This is where my interest lies. I have scoured the web and came up with a list of tools to evaluate.

Low-end tools
------------------------
Microsoft - VSTS
Radview - WebLOAD
SoftLogica - WAPT
Facilita - Forecast
Zoho - QEngine
Neotys - NeoLoad

Does anyone have any feedback or experience reports using any of the "low-end" tools listed above? Are there other tools I am overlooking that I should definitely look into?

any comments/suggestions are appreciated.

June 25, 2009

XML-RPC Clients In Python and Perl

I was just writing some XML-RPC code and wanted to post some simple examples of how to talk to an XML-RPC server with some simple client-side code. Here are examples in both Python and Perl.

The examples below show how to connect to an XML-RPC server and call the service's start() method.

a simple XML-RPC client in Python:

#!/usr/bin/env python

import xmlrpclib

host = 'http://localhost'
port = '8888'

server = xmlrpclib.Server('%s:%s' % (host, port))
response = server.start()
print response

a simple XML-RPC client in Perl (using the Frontier-RPC module):

#!/usr/bin/perl -w

use strict;
use Frontier::Client;

my $host = 'http://localhost';
my $port = '8888';

my $server = Frontier::Client->new('url' => "$host:$port");
my $response = $server->call('start');
print $response;

June 24, 2009

Google Calls for a Joint Effort to Speed Up the Internet

check out:
http://code.google.com/speed/

writeup here:
http://www.infoq.com/news/2009/06/Google-Speed-Up-the-Internet

"Google has launched a web site in an attempt to find ways and push the speed up process of the entire Internet. Google shares research data, web site speed optimization tutorials, recorded presentations on performance, links to lots of performance optimization tools, and a discussion group inviting everyone to share ideas on how to make the web faster."

Pylot is listed in the downloads section!

May 14, 2009

Mini Web Load Tester with Python and Pylot Core

Pylot is a performance tool for benchmarking web services/applications. I am working on exposing some of Pylot's internals so you can use it as a Python Module/API for generating concurrent HTTP load.

Below is a simple function that runs a mini [multi-threaded] load test against a single URL. It will return a dictionary containing runtime statistics. Results and timing information from each request is also logged to a file.

I use something like this to run performance unit tests rapidly (10-30 secs usually). I can bang on the URL for my application and quickly see how it performs and scales.

Here is a small Python script that uses Pylot as a module:

#!/usr/bin/env python

import pylot.core.engine as pylot_engine
import os
import sys
import time
        

pylot_engine.GENERATE_RESULTS = False

url = 'http://www.pylot.org'
num_agents = 5
duration = 10
runtime_stats = {}

original_stdout = sys.stdout
sys.stdout = open(os.devnull, 'w')

req = pylot_engine.Request(url)
lm = pylot_engine.LoadManager(num_agents, 0, 0, False, runtime_stats, [])
lm.add_req(req)

lm.start()
time.sleep(duration)
lm.stop()

sys.stdout = original_stdout

for agent_num, stats in runtime_stats.iteritems():
    print 'agent %i : %i reqs : avg %.3f secs' % \
        (agent_num + 1, stats.count, stats.avg_latency)

Output:

agent 1 : 46 reqs : avg 0.220 secs
agent 2 : 46 reqs : avg 0.218 secs
agent 3 : 46 reqs : avg 0.220 secs
agent 4 : 46 reqs : avg 0.221 secs
agent 5 : 46 reqs : avg 0.221 secs

Here is a slightly larger example with some more structure and features. This creates a small command line interface for running a mini load test.

Code:

#!/usr/bin/env python
# Corey Goldberg 2009

import pylot.core.engine as pylot_engine
import os
import sys
import time



def main():
    """
    Usage: >python pylot_mini_loadtest.py   
    """
    url = sys.argv[1]
    num_agents = int(sys.argv[2])
    duration = int(sys.argv[3])
    pylot_engine.GENERATE_RESULTS = False
    print '\nmini web load test \n---------------------------------'
    agent_stats = run_loadtest(url, num_agents, duration)
    throughput = sum([stat.count for stat in agent_stats.values()]) / float(duration)
    print '%.2f reqs/sec' % throughput
    for agent_num, stats in agent_stats.iteritems():
        print 'agent %i : %i reqs : avg %.3f secs' % \
            (agent_num + 1, stats.count, stats.avg_latency)
        


def run_loadtest(url, num_agents, duration):
    """
    Runs a load test and returns a dictionary of statistics from agents.
    """
    original_stdout = sys.stdout
    sys.stdout = open(os.devnull, 'w')
    
    runtime_stats = {}
    req = pylot_engine.Request(url)
    lm = pylot_engine.LoadManager(num_agents, 0, 0, False, runtime_stats, [])
    lm.add_req(req)
    
    lm.start()
    time.sleep(duration)
    lm.stop()
    
    sys.stdout = original_stdout
    
    return runtime_stats



if __name__ == '__main__':
    main()

Usage/Output:

C:\test>python pylot_mini_loadtest.py http://www.goldb.org 8 10

mini web load test
---------------------------------
19.20 reqs/sec
agent 1 : 24 reqs : avg 0.416 secs
agent 2 : 24 reqs : avg 0.418 secs
agent 3 : 24 reqs : avg 0.417 secs
agent 4 : 24 reqs : avg 0.418 secs
agent 5 : 24 reqs : avg 0.415 secs
agent 6 : 24 reqs : avg 0.419 secs
agent 7 : 24 reqs : avg 0.419 secs
agent 8 : 24 reqs : avg 0.419 secs

Of course, for more complex scenarios, you can use the full blown tool, available at: www.pylot.org/download.html

Questions? Hit me up.

May 11, 2009

Pylot - Total Downloads So Far

Here is a graph showing total downloads of Pylot since its first release:

Decent uptake so far. Keep the downloads coming!

Pylot is a web performance testing tool. It is Free Open Source Software.

Python - Redirect or Turn Off STDOUT and STDERR

Here is an easy way to temporarily turn off STDOUT or STDERR in your Python program.

First you create a class to replace STDOUT. This is just minimal class with a 'write()' method.

class NullDevice():
    def write(self, s):
        pass

Notice its 'write()' method does nothing. Therefore, when you write to the NullDevice, output goes nowhere and is dropped. All you need to do is assign sys.stdout to this class.

Here is an example of turning STDOUT off and back on:

#!/usr/bin/env python

import sys


class NullDevice():
    def write(self, s):
        pass


print "1 - this will print to STDOUT"

original_stdout = sys.stdout  # keep a reference to STDOUT

sys.stdout = NullDevice()  # redirect the real STDOUT

print "2 - this won't print"

sys.stdout = original_stdout  # turn STDOUT back on

print "3 - this will print to SDTDOUT"

You can also do the same thing with sys.stderr to turn off STDERR.

May 8, 2009

C# - Export Windows Event Logs

Here is a little C# program to export Windows Event Logs. It reads an Event Log and prints entries to STDOUT so you can pipe the output to a file or other application.

using System;
using System.Diagnostics;


class EventLogExporter
{
    static void Main(string[] args)
    {
        EventLog evtLog = new EventLog("Application");  // Event Log type
        evtLog.MachineName = ".";  // dot is local machine
        
        foreach (EventLogEntry evtEntry in evtLog.Entries)
        {
            Console.WriteLine(evtEntry.Message);
        }
        
        evtLog.Close();
    }
}

May 6, 2009

Dell Mini 10 Netbook with Linux == Graphics FAIL

If you are planning on buying a Dell Mini 10 (or Mini 12) to run Linux, read this...

I used to have the Dell Mini 9 that came with Linux (Ubuntu 8.04). As soon as I got it, I paved it and installed Ubuntu Intrepid instead. It worked like a charm. Then I decided to sell my Mini 9 and upgrade to the Mini 10. The Mini 10 is a better machine in terms of hardware, and is MUCH better in terms of screen resolution and keyboard size (best keyboard on any netbook).

So, the Mini 10 ships with Windows installed. Since I had such good luck with the Mini 9, I figured a Linux install would be a breeze. So with my shiny new Mini 10 netbook, I tried an install Ubuntu Intrepid. It worked great but no compatible graphics driver. OK, so I waited for the Ubuntu Jaunty release and then promptly installed that. Same prob.

Here is the deal: There is no Linux driver for the graphics card it uses (Intel GMA 500). So.. if you want to run Linux on it, your only choice is to run in a non-native resolution using the default driver. This totally sucks.

There appears to be a native Linux driver somewhere (Poulsbo), but it doesn't work right now and is not packaged.

I am just running Windows for now and waiting for a real native driver to be released. Shame on Intel for not providing one.

Scripted Testing Isn't Just Following Scripts

There is an ongoing (or dead horse, depending on your perspective) about "scripted" vs. "exploratory" testing.

I happen to refer to "scripted testing" as programmatic testing. You use programs, scripts, and tools to augment/enable your testing. You can explore a system with your toolset if you want. That is an example of doing exploratory testing with scripts/programs/tools.

The debate seems to overlook that definition and defines "scripted" as just following a number of predefined steps. I think this is the wrong definition and the wrong argument.. or maybe I just don't get it... or maybe I'm confused by the ambiguous definitions of scripting.

I don't see it as a boolean. I think of it terms of a spectrum and somewhere along that programmatic/manual continuum is where you work. Exploratory testing can fall in many areas of the spectrum and you can do it manually or programmatically.

That is where the argument breaks down (IMHO).

March 30, 2009

Ordered a New Laptop - Dell Studio 17 - Running Ubuntu

I just ordered a new laptop for home. This will be used as my workstation/desktop replacement.

Ubuntu Jaunty Jackalope comes out April 23, and this machine will get a fresh copy. I'm hoping all my hardware works good with it.

Specs:

Dell Studio 17
64 bit
Intel Core 2 Duo T6400 (2.00GHz/800Mhz FSB/2MB cache)
4GB Shared Dual Channel DDR2 at 800MHz
256MB ATI Mobility Radeon HD 3650
250GB SATA Hard Drive (7200RPM)
Glossy widescreen 17.0 inch display (1920x1200)
8X CD/DVD Burner
Intel WiFi Link 5100 802.11agn Half Mini-Card
Back-lit Keyboard

I will post again to tell how the hardware compatibility with Ubuntu and Linux is.

March 25, 2009

Did You Know?

"We are currently preparing students for jobs that don’t yet exist, using technologies that haven’t yet been invented, in order to solve problems we don’t even know are problems yet."

Did You Know 3.0 (video)

... just something to ponder.

March 23, 2009

Pylot Version 1.22 Released - Open Source Web Performance Tool

I just did a release of Pylot, the open source web/http performance tool. You can download it here:
http://www.pylot.org/download.html

New features in Pylot 1.22:

restructured code base
custom timer groups
socket timeout setting
misc bug fixes

Thanks to everyone who contributed to this release!

If you have any problems to report, please post to the discussion forum at: http://clearspace.openqa.org/community/pylot

March 21, 2009

Finally Got Me A Hackergotchi

My Designer friend whipped up my hackergotchi in about 10 mins. Always wanted one of these things:

Zazzle - Business Cards Are Fun To Design

I was bored and played with Zazzle for a little while today. Here is the business card design I came up with:

... not gonna order any, but it was fun to design.

March 18, 2009

Pylot - It Had To Start Somewhere

I just found a scrap of paper on my desk. It is the original version 0.0 of Pylot from May 2007 (notice it all fits on 1 printed page). I'm just posting this to look back on someday.

www.pylot.org

March 6, 2009

Pylot Version 1.21 Released - Open Source Web Performance Tool

I just did a release of Pylot, the open source web/http performance tool. You can download it here:
http://www.pylot.org/download.html

New features in Pylot 1.21:

new HTTP transport layer, using urllib2
new blocking mode (stdout blocked until test finishes, results are returned as XML)
added redirect following
re-implemented cookie handling
compatible with Python 2.6 for Windows
new HTTP debugging mode
new global config file
better message logging
misc bug fixes

Thanks to everyone who contributed to this release! Special thanks to:

Vasil Vangelovski
Adam Smith
Mark Ransom

Most testing was done on Windows XP and Ubuntu Linux 8.10.

If you have any problems to report, please post to the discussion forum at:
http://clearspace.openqa.org/community/pylot

February 24, 2009

Open Source Enterprise Monitoring Systems

I used Nagios for health/performance monitoring of devices/servers for years at a previous job. It has been a while, and I'm starting to look into this space again. There are a lot more options out there for remote monitoring these days.

Here is what I have found that look good:

Do you know of any others I am missing? I'll update this list if I get replies. The requirement is that there must be an Open Source version of the tool.

Amazon - Best in the Cloud?

Dana Blankenhorn (from "Ubuntu allies with Amazon and Dell"):

"the fact is Amazon’s EC2 cloud is currently dominating the space.

It’s open for business, it’s ready for your apps, today. It’s not like Google’s cloud, devoted solely to Google applications, and it’s not like Microsoft’s cloud, devoted to Windows, and it’s not like IBM’s clouds, custom-built like a new global subdivision.

Amazon’s cloud is a service businesses use to host serious applications, many of which make money. Standing at the side of such a cloud vendor is good business."

February 23, 2009

Pylot - Web Load Testing from Amazon Elastic Compute Cloud (EC2)

I have been playing around with Amazon's Elastic Compute Cloud (EC2). It allows you to provision virtual machines on-demand and configure and control your own compute clusters. To do external (over a WAN) performance\load testing against your web application or services, you can put together a cloud based test harnesses using some simple tools. My open source tool Pylot does the job well for simple web performance and load tests. Pylot generates concurrent load (HTTP Requests), verifies server responses, and produces reports with metrics. Tests suites are executed and monitored from a GUI or shell/console. You define your test cases in an XML file. This is where you specify the requests (url, method, body/payload, etc) and verifications.

It is incredibly easy to provision an instance and launch a virtual machine using the EC2 console. Once you have it up and running, you can just connect to your new virtual machine (via RDP or SSH) and get started.

Here is a screenshot of my terminal session using Pylot (click to enlarge):

You can see I took the following steps:

remotely login to the EC2 instance using SSH and my private key
download Pylot using wget
unzip the distribution
change to the Pylot directory
launch the default test with 1 agent (virtual user)

Thats it! I ran a test today doing 1500 Virtual Users from one instance of 64-bit Fedora Linux, and it took me about 5 minutes to get it setup and running. Python 2.5 is already installed on the image, and no further configuration is needed to run a basic test. To run the Pylot GUI and generate results graphs, you need to install wxPython and Matplotlib.

You can upload your own test case files and run any load test scenario you want. Don't forget, you can create and save your own machine images and launch as many as you want to run a large distributed test.

I'd like to build some tools to make this easier and to create preconfigured machine images for other people to start with. It needs common results collection and remote control of user agents. I'm thinking of a cluster of virtual machines all running Pylot, each running thousands of virtual user agents. EC2 provides the ability to place instances in multiple locations, so this could be used to create a massive geo-distributed test bed with capacity on-demand.

More soon...

February 15, 2009

Pylot Version 1.20 Released - Open Source Web Performance Tool

The 1.20 release of Pylot is out!

Go grab a copy at: http://www.pylot.org/download.html

"Pylot is a free open source tool for testing performance and scalability of web services. It runs HTTP load tests, which are useful for capacity planning, benchmarking, analysis, and system tuning. Pylot generates concurrent load (HTTP Requests), verifies server responses, and produces reports with metrics. Tests suites are executed and monitored from a GUI or shell/console."

new features include:

refactored transaction engine with lower memory footprint and disk i/o
automatic cookie handling
better results reports
test naming
specify output location
specify test case file
bug fixes

To get started, visit Getting Started Guide: http://www.pylot.org/gettingstarted.html

Post your questions and feedback in the Pylot forum: http://clearspace.openqa.org/community/pylot

Mark Rogers and I had a nice little hackathon getting this release put together. Special thanks to Mark and the other patch submitters for helping out.

Screenshots:

February 11, 2009

Python - Send Email From Windows Using CDO

Here is quick script showing how to use Python to send email from Windows.

This approach uses the Python For Windows Extensions to access Outlook/Exchange with CDO (Collaboration Data Objects).

#!/usr/bin/env python
# Corey Goldberg

from win32com.client import Dispatch

session = Dispatch('MAPI.session')
session.Logon('','',0,1,0,0,'exchange.foo.com\nUserName');
msg = session.Outbox.Messages.Add('Hello', 'This is a test')
msg.Recipients.Add('Corey', 'SMTP:corey@foo.com')
msg.Send()
session.Logoff()

February 8, 2009

Pylot - Help/Discussion Forum Launched

Thanks to OpenQA, I launched a web forum for help and discussions about Pylot:
Pylot Forums

This forum is a place for Pylot users and developers to share: questions, comments, experiences, enhancement requests, bug reports, etc. If you need any help, this is the place to post

What is Pylot?

"Pylot is a free open source tool for testing performance and scalability of web services. It runs HTTP load tests, which are useful for capacity planning, benchmarking, analysis, and system tuning.

Pylot generates concurrent load (HTTP Requests), verifies server responses, and produces reports with metrics. Tests suites are executed and monitored from a GUI or shell/console."

The official Pylot website is located at: www.pylot.org

January 29, 2009

Pylot Testimonial - Easy Setup for Web Load Testing

(Pylot is a an open source web performance/load testing tool I developed)

I just saw this post from Mark (f4nt) about load/performance testing Confluence (enterprise Wiki software). He briefly talks about performance tool selection and why he chose Pylot as his tool.

Load Testing Confluence

"A small part of my problem of getting my tests rolling was finding a test suite that suited my needs. There’s a ton of potential options such Grinder, Bench, Browsermob, WebInject, HttPerf, funkload, and so on and so forth. I do want to use BrowserMob, just because it’s incredibly slick, looks easy to use, and looks to be quite reliable. Unfortunately, it’s not free though, and my budget for this testing is currently $0 :). My big problem with a few of the load testing applications is that they were a major pain just to setup, and get rolling. JMeter kept crashing, and was more effort to setup than I wanted to deal with. Grinder, again, didn’t seem to have an easy setup method. Granted, I could have made either of these work, and it wouldn’t have killed me. The fact of the matter was though that I just don’t have time to deal with these items. While I am doing these tests for work, I’m mainly doing them off hours because that’s when I have time to actually do it. Hence, I wanted something that could give me basic statistics, that I could setup quickly, that was free. Considering my slant towards open source software and python, the application I would use being written in Python and being GPLed were bonuses.

That brings us to Pylot. It’s free, it’s released under the GPL, and it’s written in Python. As a bonus, the tests are simple to setup, the result output is usable, and making modifications to the application was easy as well. This allows me to quickly create test scenarios and pound away at the application I’m testing with little to no fuss whatsoever. Creating tests is just a matter of modifying a simple XML to place the URLs you wish to hit. You can have post requests as well without any major trouble. Whatever it is you want to do, seems to be quite plausible in the grand scheme of things. Then, when you run into something you can’t do, modifying the code itself to make it do what you want isn’t hard at all either."

Thanks for using and giving props to Pylot, Mark!

January 15, 2009

Python - Read Outlook Email and Save Embedded Attachments

I was struggling with this for days, so I figured I should post the code so others can see how to do it.

The following script will read the last email in an Outlook mailbox and save the attachments. It uses the CDO COM Interface to interact with Outlook/Exchange.

#!/usr/bin/env python
# Read the last email in an Outlook mailbox and save the attachments.

from win32com.client import Dispatch


def main():   
    session = Dispatch('MAPI.session')
    #session.Logon('Outlook')  # for local mailbox
    session.Logon('','',0,1,0,0,'exchange.foo.com\nusername');
    inbox = session.Inbox
    message = inbox.Messages.GetLast()
    attachments = message.Attachments
    for i in range(attachments.Count):
        attachment = attachments.Item(i + 1) # indexes are 1 based
        filename = 'c:\\tempfile_%i' % i
        attachment.WriteToFile(filename)
    session.Logoff()


if __name__ == '__main__':
    main()

* Extra Special thanks to Sergey Golovchenko for helping out.

January 9, 2009

Capacity Planning - A Recession is All About Doing More With Less

I liked this quote a lot:

"A recession is all about doing more with less or, at least, with the capital equipment already purchased. That's a major reason for doing capacity planning."
(from Guerrilla Training Schedule 2009)

I never thought of a recession in those terms. But perhaps this is a boon for Performance Engineers into testing/tuning/scaling and capacity planning. [shrug] Who knows.

January 2, 2009

Weird Python 3.0 / SciTE Editor Issue - Console Output

My editor (SciTE) has always worked fine for programming Python 2.x. I'm now trying some Python 3.0 code and ran into an issue on my first script. The issue doesn't happen running the same code under 2.x. It is really confusing and annoying me and I have no idea who to file a bug with.

For whatever reason, as soon as I call an http.client request() method from within a while loop, nothing further is printed to the editor's console (stdout). If it is not in a loop, I get the output. The script executes fine aside from printing output. If I run the script from a regular command prompt (outside of SciTE), it works fine also.

Setup:
* Windows (tested on XP and Vista)
* Python 3.0 Final
* SciTE Version 1.75-wbd-1

This works: ('foo' is printed once to console)

import http.client

conn = http.client.HTTPConnection('www.goldb.org')
conn.request('GET', '/')
conn.close()
print('foo')

This works: ('foo' is printed repeatedly to console)

import http.client

while True:
    conn = http.client.HTTPConnection('www.goldb.org')
    # conn.request('GET', '/')
    conn.close()
    print('foo')

This doesn't work: (nothing is printed to console)

import http.client

while True:
    conn = http.client.HTTPConnection('www.goldb.org')
    conn.request('GET', '/')
    conn.close()
    print('foo')

This same exact setup works fine in Python 2.x. I've also tried starting Python with the '-u' option to get unbuffered output.

Anyone have ANY clue what could be going on?

Update: This seems to be related to Issue 4705 at bugs.python.org. It has to do with how Python 3.0 does (or doesn't) do unbuffered I/O. A patch was already submitted. Hope it's fixed in next release.

Pages

December 30, 2009

What’s the coolest Python application, framework or library you have discovered in 2009?

What new programming technique did you learn in 2009?

What’s the name of the open source project you contributed the most in 2009? What did you do?

What was the Python blog or website you read the most in 2009?

What are the three top things you want to learn in 2010?

December 17, 2009

December 16, 2009

Matplotlib for Python Developers

Plotting data from a database

Plotting data from the Web

Plotting data by parsing an Apache log file

December 14, 2009

December 9, 2009

December 8, 2009

December 7, 2009

December 6, 2009

December 3, 2009

October 23, 2009

October 19, 2009

October 5, 2009

September 23, 2009

September 11, 2009

September 10, 2009

August 5, 2009

July 20, 2009

July 10, 2009

July 7, 2009

June 25, 2009

June 24, 2009

May 14, 2009

May 11, 2009

May 8, 2009

May 6, 2009

March 30, 2009

March 25, 2009

March 23, 2009

March 21, 2009

March 18, 2009

March 6, 2009

February 24, 2009

February 23, 2009

February 15, 2009

February 11, 2009

February 8, 2009

January 29, 2009

January 15, 2009

January 9, 2009

January 2, 2009