I use urllib2 from Python’s standard library in quite a few projects. It’s quite nice, but the documentation isn’t very comprehensive, and it always makes me feel like I’m programming Java once I want to do something more complicated than just open a URL and read the response (e.g. handling redirect responses, reading response headers, etc.).
Anyway, the other day I found, if not a bug, then at least an undocumented issue. Since Python 2.6, urllib2 provides a way to set a timeout, as in the following code where the timeout is set to 2.5 seconds:
    import urllib2

    try:
        response = urllib2.urlopen("http://google.com", None, 2.5)
    except urllib2.URLError, e:
        print "Oops, timed out?"
If no timeout is specified, the global socket timeout value will be used, which by default is infinite.
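That global default can be inspected and changed through the socket module. A minimal sketch (the 2.5-second value is just an example, matching the snippet above):

```python
import socket

# By default there is no global timeout: blocking socket
# operations wait forever.
assert socket.getdefaulttimeout() is None

# Set a process-wide default of 2.5 seconds; urllib2.urlopen()
# calls made without an explicit timeout will then use this value.
socket.setdefaulttimeout(2.5)
assert socket.getdefaulttimeout() == 2.5

# Restore the default so the rest of the process is unaffected.
socket.setdefaulttimeout(None)
```

Note that this affects every socket created afterwards in the process, which is why the per-call timeout argument added in 2.6 is usually preferable.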
The above code will catch almost every timeout, but the problem is that you might still get a timeout raised as a totally different exception:
    File "/usr/lib/python2.4/socket.py", line 285, in read
        data = self._sock.recv(recv_size)
    File "/usr/lib/python2.4/httplib.py", line 460, in read
        return self._read_chunked(amt)
    File "/usr/lib/python2.4/httplib.py", line 495, in _read_chunked
        line = self.fp.readline()
    File "/usr/lib/python2.4/socket.py", line 325, in readline
        data = recv(1)
    socket.timeout: timed out
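What the traceback shows is a timeout during the read phase, after the connection has already been established; that case surfaces as socket.timeout rather than URLError. It is easy to reproduce with plain sockets, without urllib2 at all (a minimal sketch using a local server socket that accepts but never replies):

```python
import socket

# A local server socket that accepts connections but never sends
# anything, so a read on the client side is forced to time out.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.settimeout(0.5)
client.connect(server.getsockname())

try:
    client.recv(1)          # nothing ever arrives...
    timed_out = False
except socket.timeout:      # ...so this is raised, as in the traceback
    timed_out = True

assert timed_out
client.close()
server.close()
```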
The solution is to catch this other exception, raised by Python’s socket library, as well:
    import urllib2
    import socket

    try:
        response = urllib2.urlopen("http://google.com", None, 2.5)
    except urllib2.URLError, e:
        print "Oops, timed out?"
    except socket.timeout:
        print "Timed out!"
Hopefully this will save someone else some headache :).