Fetch a page with a proxy using the Go language

For a while now I’ve been playing with the Go programming language, and so far I love it. I figured I’d push some code snippets here from time to time. Today I spent some time creating a simple website fetcher (not quite a crawler yet).

The idea is very simple: download a page, run an XPath query on it, and spit out the results. I was looking for a decent XPath library for Go and couldn’t find one. I tried xmlpath, but it sucks: I couldn’t even run queries like id('product-details')/div[@class='product-price']. Then I found something nicer, Gokogiri, which works pretty well, but I couldn’t find any examples for it except this small article.
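For what it’s worth, once the page is parsed, Gokogiri handled that kind of query without complaint, since it delegates to libxml2’s full XPath engine. A minimal sketch (assuming page holds the raw HTML as a []byte; the product markup itself is hypothetical):

	doc, err := gokogiri.ParseHtml(page)
	if err != nil {
		log.Fatalln(err)
	}
	defer doc.Free()

	// The query that xmlpath choked on.
	prices, err := doc.Search("id('product-details')/div[@class='product-price']")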

The only problem with running Gokogiri is that it depends on libxml2. That’s not a huge deal on Linux-based systems, but on Mac OS X you have to install it via Homebrew:
brew install libxml2
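One caveat: Homebrew installs libxml2 keg-only, so depending on your setup you may need to point cgo at it before building Gokogiri. The paths below are an assumption (typical Homebrew locations) and may differ on your machine:

export CGO_CFLAGS="-I/usr/local/opt/libxml2/include/libxml2"
export CGO_LDFLAGS="-L/usr/local/opt/libxml2/lib"
go get github.com/moovweb/gokogiri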

Here is the code:

package main

import (
	"fmt"
	"io/ioutil"
	"log"
	"net/http"
	"os"

	"github.com/moovweb/gokogiri"
)

func main() {
	body := fetch("http://httpbin.org/html")

	// Parse the raw HTML into a libxml2-backed document.
	doc, err := gokogiri.ParseHtml(body)
	if err != nil {
		log.Fatalln(err)
	}
	// libxml2 memory is not managed by Go's garbage collector,
	// so the document has to be freed explicitly.
	defer doc.Free()

	// Run the XPath query and print the matching nodes.
	result, err := doc.Search("/html/body/h1")
	if err != nil {
		log.Fatalln(err)
	}
	fmt.Println(result)
}

func fetch(url string) []byte {
	// The default transport picks up HTTP_PROXY from the environment,
	// so setting it here routes the request through the proxy.
	os.Setenv("HTTP_PROXY", "http://x.x.x.x:8080")
	client := &http.Client{}

	req, err := http.NewRequest("GET", url, nil)
	if err != nil {
		log.Fatalln(err)
	}

	// Identify ourselves to the server with a custom User-Agent.
	req.Header.Set("User-Agent", "Golang Spider Bot v. 3.0")

	resp, err := client.Do(req)
	if err != nil {
		log.Fatalln(err)
	}
	defer resp.Body.Close()

	body, err := ioutil.ReadAll(resp.Body)
	if err != nil {
		log.Fatalln(err)
	}

	return body
}
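Relying on the environment variable works because the default transport consults HTTP_PROXY, but it’s global to the whole process. If you’d rather pin the proxy to a single client, you can set it on the transport explicitly. A minimal sketch (needs net/url in the imports; the proxy address is a placeholder, as above):

	proxyURL, err := url.Parse("http://x.x.x.x:8080")
	if err != nil {
		log.Fatalln(err)
	}

	// Every request made through this client goes via the given proxy,
	// regardless of environment variables.
	client := &http.Client{
		Transport: &http.Transport{Proxy: http.ProxyURL(proxyURL)},
	}

Either way, running the program should print the single h1 node from httpbin’s sample page.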