I use urllib2 from Python’s standard library in quite a few projects. It’s quite nice, but the documentation isn’t very comprehensive, and it always makes me feel like I’m programming Java once I want to do something more complicated than just opening a URL and reading the response (e.g. handling redirect responses, reading response headers, and so on).

Anyway, the other day I found, if not a bug, then at least an undocumented issue. Since Python 2.6, urllib2 provides a way to set a timeout, as in the following code where the timeout is set to 2.5 seconds:

import urllib2

try:
    response = urllib2.urlopen("http://google.com", None, 2.5)
except urllib2.URLError:
    print "Oops, timed out?"

If no timeout is specified, the global socket timeout value will be used, which by default is infinite.
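That global default is controlled by socket.setdefaulttimeout, and you can check or change it yourself. A quick sketch (plain sockets only, nothing urllib2-specific) of how the global default behaves:

```python
import socket

# By default no global timeout is set; None means "block forever".
assert socket.getdefaulttimeout() is None

# Set a 2.5 second default for all newly created sockets,
# including the ones urllib2 opens under the hood.
socket.setdefaulttimeout(2.5)
assert socket.getdefaulttimeout() == 2.5

# Restore the infinite default so other code is unaffected.
socket.setdefaulttimeout(None)
```

Passing an explicit timeout to urlopen, as above, overrides this global value for that request only.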

The above code will catch almost every timeout, but the problem is that you might still get a timeout raised as a totally different exception:

File "/usr/lib/python2.4/socket.py", line 285, in read
  data = self._sock.recv(recv_size)
File "/usr/lib/python2.4/httplib.py", line 460, in read
  return self._read_chunked(amt)
File "/usr/lib/python2.4/httplib.py", line 495, in _read_chunked
  line = self.fp.readline()
File "/usr/lib/python2.4/socket.py", line 325, in readline
  data = recv(1)
socket.timeout: timed out

The solution is to catch this other exception, raised by Python’s socket library, as well:

import urllib2
import socket

try:
    response = urllib2.urlopen("http://google.com", None, 2.5)
except urllib2.URLError:
    print "Oops, timed out?"
except socket.timeout:
    print "Timed out!"
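The reason the second except clause is needed is that socket.timeout is raised by the socket layer itself and is not a subclass of URLError, so it sails straight past the first handler. A minimal sketch (plain sockets, no HTTP involved, addresses are illustrative) that provokes the exception directly:

```python
import socket

# A listening socket that accepts the connection but never sends anything.
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)

client = socket.socket()
client.settimeout(0.1)  # give up after 100 ms
client.connect(server.getsockname())

try:
    client.recv(1)      # no data will ever arrive
    caught = False
except socket.timeout:
    caught = True       # this is the exception urllib2 can let through

client.close()
server.close()
assert caught
```

This is the same exception you see at the bottom of the traceback above, which is why it has to be handled alongside URLError.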

Hopefully this will save someone else some headache :).