Instead of the heavyweights, we are going to focus on Groovy's XmlSlurper class. The first task is to parse the file. I'm trying to parse an HTML page that is not well-formed with XmlSlurper: the Eclipse download site. The W3C validator shows several errors in the page. I tried the fault.
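To make the problem concrete, here is a minimal sketch of what XmlSlurper does with well-formed and with broken markup. The two sample strings are hypothetical stand-ins, not the actual Eclipse download page, and the import assumes Groovy 3 or later (on Groovy 2.x the class lives in `groovy.util`).

```groovy
import groovy.xml.XmlSlurper   // assumption: Groovy 3+; Groovy 2.x uses groovy.util.XmlSlurper
import org.xml.sax.SAXParseException

// Hypothetical sample inputs, not taken from the Eclipse page.
def wellFormed = '<html><body><a href="a.zip">A</a><a href="b.zip">B</a></body></html>'
def malformed  = '<html><body><p>unclosed<br></body></html>'

// Well-formed markup parses into a GPathResult that can be navigated by element name.
def page = new XmlSlurper().parseText(wellFormed)
def links = page.body.a.collect { it.@href.text() }
println links

// Markup that is not well-formed XML makes the default SAX parser throw.
def failed = false
try {
    new XmlSlurper().parseText(malformed)
} catch (SAXParseException e) {
    failed = true
    println "Parse failed: ${e.message}"
}
```

With the default parser, the second call fails because `<br>` and `<p>` are never closed. XmlSlurper also has a constructor that accepts an `org.xml.sax.XMLReader`, so one common workaround is to pass in a lenient HTML parser from a third-party library instead of the default XML one; which library fits this page is not something the sketch above tries to settle.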

