I use urllib2 from Python's standard library in quite a few projects. It's quite nice, but the documentation isn't very comprehensive, and it always makes me feel like I'm programming Java once I want to do something more complicated than just open a URL and read the response (e.g. handling redirect responses, reading response headers, etc.).
Anyway, the other day I found - if not a bug - then at least an undocumented gotcha. Since Python 2.6, urllib2's urlopen accepts a timeout argument, as in the following code where the timeout is set to 2.5 seconds:
import urllib2

try:
    response = urllib2.urlopen("http://google.com", None, 2.5)
except urllib2.URLError, e:
    print "Oops, timed out?"
If no timeout is specified, the global socket timeout value will be used, which by default is infinite.
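That global default can also be changed explicitly; a minimal sketch (the 2.5-second value here is just an example, matching the snippet above):

```python
import socket

# Process-wide default timeout, in seconds, applied to any socket
# that doesn't set its own - including urlopen() calls made
# without an explicit timeout argument.
socket.setdefaulttimeout(2.5)
```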
The above code will catch almost every timeout, but the problem is that you might still get a timeout raised as a totally different exception:
  File "/usr/lib/python2.4/socket.py", line 285, in read
    data = self._sock.recv(recv_size)
  File "/usr/lib/python2.4/httplib.py", line 460, in read
    return self._read_chunked(amt)
  File "/usr/lib/python2.4/httplib.py", line 495, in _read_chunked
    line = self.fp.readline()
  File "/usr/lib/python2.4/socket.py", line 325, in readline
    data = recv(1)
socket.timeout: timed out
The solution is to catch this other exception, raised by Python's socket library, as well:
import urllib2
import socket

try:
    response = urllib2.urlopen("http://google.com", None, 2.5)
except urllib2.URLError, e:
    print "Oops, timed out?"
except socket.timeout:
    print "Timed out!"
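Incidentally, socket.timeout comes from the socket layer itself and isn't a subclass of URLError, which is presumably why it can slip through urllib2's own error handling. A minimal sketch showing it raised with no urllib2 involved at all (the loopback address and 0.1-second timeout are arbitrary choices for the demo):

```python
import socket

# A listening socket that nobody ever connects to: accept() blocks
# until the socket-level timeout expires and raises socket.timeout.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))
srv.listen(1)
srv.settimeout(0.1)

try:
    srv.accept()
    caught = False
except socket.timeout:
    caught = True
finally:
    srv.close()
```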
Hopefully this will save someone else some headache :).
Comments