Saturday, May 30, 2015

Day 13 - Howto use gorp select

The dbmap.Select function in gorp can be used in two different ways. .



    // Method 1: Results are returned as an array 
    // of interfaces ( = rows here )
    rows, err := dbmap.Select(foo, "select * from " + table.TableName)

    // Method 1: read the rows from the returned array of interfaces  
    // rows := []interface{}
    for _, row := range rows {
        // cast the row to our struct
        af := row.(*AliasTransientField)
        fmt.Printf("Method1: ID: %d, BarStr: %s\n", af.GetId(), af.BarStr)
    }


    // Method 2: Resulting rows are appended to a pointer of a slice (foos)
    var foos []AliasTransientField
    _, err = dbmap.Select(&foos, "select * from " + table.TableName)

    // Method 2: read the rows from the input slice (=foos)
    for _, f := range foos {
        fmt.Printf("Method2: ID: %d, BarStr: %s\n", f.Id, f.BarStr)
    }

I needed some time to figure this out (especially the cast) and I hope it can be useful to somebody.

The full code for this test is in: https://github.com/kimxilxyong/intogooglego/tree/master/testgorp


Friday, May 29, 2015

Day 12 - Gorp with Indexes is now ready for beta tests

Gorp with Indexes now passes the test cases for MySQL and PostgreSQL.

Get it from github: http://github.com/kimxilxyong/gorp

The latest (as of 2015.05.29) upstream changes have been merged, the code is now ready for testers.

If you want to use "Gorp with Indexes" just add index tags to your table struct
(idx_user in this example)  and add a import "github.com/kimxilxyong/gorp"

type Post struct {
    Id           uint64    `db:"notnull, PID, primarykey, autoincrement"`
    User         string    `db:"index:idx_user, size:64"`
    PostSub      string    `db:"index:idx_user, size:128"`
    UserIP       string    `db:"notnull, index:idx_user, size:16"`
    BodyType     string    `db:"notnull, size:64"`
    Body         string    `db:"name:PostBody, size:16384"`
    Err          error     `db:"-"` // ignore this field when storing with gorp
}

 Then you only need to call: CreateIndexes

       tablename := "posts_index_test"
    // SetKeys(true) means we have a auto increment primary key, which
    // will get automatically bound to your struct post-insert
    table := dbmap.AddTableWithName(post.Post{}, tablename)
    table.SetKeys(true, "PID")

    // create the table. in a production system you'd generally
    // use a migration tool, or create the tables via scripts
    if err = dbmap.CreateTablesIfNotExists(); err != nil {
        return errors.New("Create tables failed: " + err.Error())
    }

    // Force create all indexes for this database
    if err = dbmap.CreateIndexes(); err != nil {
        return errors.New("Create indexes failed: " + err.Error())
    }
 
  

Thursday, May 21, 2015

Day 11 - Index creation and primary keys in gorp, gorp with indexes

As index creation is currently not available in the gorp master https://github.com/go-gorp/gorp I did as suggested by github, created a fork and added the index creation code myself. I think thats how opensource should work: Create a fork with experimental features and if they are stable and tested it will get merged back into the origin (eventually).

The test is to crawl content from Reddit and store it into a mysql database. The interresting part is in:

App Logic: Crawler
Gorp Struct: The Post struct which gets stored to mysql

If you dont already know how to use git or go get please check this out: http://githowto.com/

The key part is the Post struct, which defines how the Posts table in mysql will be created:

Note: the table creation will NOT work with the master gorp, you have to use: gorp with indexes


// holds a single post
type Post struct {
    Id           uint64    `db:"notnull, PID, primarykey, autoincrement"`
    SecondTestID int       `db:"notnull, name: SID"`
    Created      time.Time `db:"notnull, primarykey"`
    PostDate     time.Time `db:"notnull"`
    Site         string    `db:"name: PostSite, notnull, size:50"`
    PostId       string    `db:"notnull, size:32, unique"`
    Score        int       `db:"notnull"`
    Title        string    `db:"notnull"`
    Url          string    `db:"notnull"`
    User         string    `db:"index:idx_user, size:64"`
    PostSub      string    `db:"index:idx_user, size:128"`
    UserIP       string    `db:"notnull, size:16"`
    BodyType     string    `db:"notnull, size:64"`
    Body         string    `db:"name:PostBody, size:16384"`
    Err          error     `db:"-"` // ignore this field when storing with gorp
}
 

The part of the source which connects your go code to the database is redditFetchGorp:

    // connect to db using standard Go database/sql API
    //db, err := sql.Open("mysql", "user:password@/dbname")
    db, err := sql.Open("mysql", "golang:golang@/golang")
    if err != nil {
        return errors.New("sql.Open failed: " + err.Error())
    }

    // construct a gorp DbMap
    dbmap := &gorp.DbMap{Db: db, Dialect: gorp.MySQLDialect{"InnoDB", "UTF8"}}
    defer dbmap.Db.Close()

    // register the structs you wish to use with gorp
    // you can also use the shorter dbmap.AddTable() if you
    // don't want to override the table name
    _ = dbmap.AddTableWithName(post.Post{}, "posts")

    // create the table. in a production system you'd generally
    // use a migration tool, or create the tables via scripts
    err = dbmap.CreateTablesIfNotExists()
    if err != nil {
        return errors.New("Create table 'posts' failed: " + err.Error())
    } 
 

The resulting table should be:
 
CREATE TABLE `posts` (
  `PID` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `SID` int(11) NOT NULL,
  `Created` datetime NOT NULL,
  `PostDate` datetime NOT NULL,
  `PostSite` varchar(50) NOT NULL,
  `PostId` varchar(32) NOT NULL,
  `Score` int(11) NOT NULL,
  `Title` varchar(255) NOT NULL,
  `Url` varchar(255) NOT NULL,
  `User` varchar(64) DEFAULT NULL,
  `PostSub` varchar(128) DEFAULT NULL,
  `UserIP` varchar(16) NOT NULL,
  `BodyType` varchar(64) NOT NULL,
  `PostBody` varchar(16384) DEFAULT NULL,
  PRIMARY KEY (`PID`,`Created`),
  UNIQUE KEY `PostId` (`PostId`),
  KEY `idx_user` (`User`,`PostSub`)
) ENGINE=InnoDB AUTO_INCREMENT=26 DEFAULT CHARSET=utf8;

Monday, May 18, 2015

Day 10 - ORM comparison in Go using MySql, the rest of the pack

The following "ORM's" i did not actually test, just googled it. Maybe the info is usable for somebody.
StarDate is 2015.05.18

beedb :although beedb has been called dead and unmaintained, it seems to be still alive as a sub of beego: beego orm. But as it is part of a larger framework I have pushed it back in my queue.

qbs: This looks to me like an awesome lib and it is for sure worth further examining: It can create Indexes and does so even on joined/embedded structs. Note to myself: Do a separate post on qbs!


Friday, May 15, 2015

Day 9 - ORM comparison in Go using MySql, today gorm

Today I tested gorm, which looked very promising at the first glance: You can create indexes from field tags which is not possible in gorp currently.


type TestDatabaseTableStruct struct { 
    ID   int
    Name string `sql:"index:idx_name_code"` // Create index with name, and will create combined index if find other fields defined same name
    Code string `sql:"index:idx_name_code"` // `unique_index` also works
}

But then I started to implement my simple "insert a post into one table" example where the pain started:

The interface style of gorm feels very strange, even for me, coming from a C/C++ and Delphi background. This style is all over the place, calling a method on a DB pointer and getting back a DB pointer ?!?

func (s *DB) Where(query interface{}, args ...interface{}) *DB {
    return s.clone().search.Where(query, args...).db
} 

I dont get it, maybe some people smarter than me can understand this.
I assume this strange interface of gorm is because of the "chainability" of it.

But i dont need chainability, because I am able to write SQL!

I struggled for 5 hours to make a simple "insert or update if exists" logic to get to work. I have given up! Gorm does so much annoying magic, like changing table and field names, its just too much hassle to learn another ORM layer when you would be better off learning SQL.

Im not even posting snippets, because I am so annoyed, the incomplete test source for gorm ist at gorm example .

My conclusion is: If you cant or dont want to write SQL you "could" use gorm, for everybody else use gorp.




Thursday, May 14, 2015

Day 8 - ORM comparison in Go using MySql, today gorp

The first ORM im testing is gorp, because its the first that poked my eye, is easy to use and is not part of a larger framework.

As my code is growing im not posting the full source, but only snippets which  handle gorp code.

To test it you need a MySql installation, create a Schema golang for a user golang with password golang: MySql Readme

 Open the database:

    db, err := sql.Open("mysql", "golang:golang@/golang")
    if err != nil {
        return errors.New("sql.Open failed: " + err.Error())
    }

    // Open doesn't open a connection. Validate DSN data:
    err = db.Ping()
    if err != nil {
        return errors.New("db.Ping failed: " + err.Error())
    }

    // construct a gorp DbMap
    dbmap := &gorp.DbMap{Db: db, Dialect: gorp.MySQLDialect{"InnoDB", "UTF8"}}
    defer dbmap.Db.Close()


Create a table if it does not exist:

    // SetKeys(true) means we have a auto increment primary key, which 
    // will get automatically bound to your struct post-insert
    table := dbmap.AddTableWithName(post.Post{}, "posts")
    table.SetKeys(true, "Id")
 

Check if the post already is in the database:

     // check if post already exists
      count, err := dbmap.SelectInt("select count(*) from posts where PostId = ?", post.PostId)
      if err != nil {
          return errors.New("select count(*) from posts failed: " + err.Error())
      }

Do an insert if count == 0
                 
         err = dbmap.Insert(&post) 
         if err != nil {
             return errors.New("insert into table posts failed: " + err.Error())
         }

.
Edit 2015.05.21:
The code for fetching and the post structure have been changed to use gorp with indexes, please look at Day 11 of my blog for details how to create indexes with gorp.


Friday, May 8, 2015

Day 7 - GoQuery HTML parsing

This post is about testing GoQuery from PuerkitoBio (tnx for this amazing lib!), a library for parsing HTML, which I intend to use in my webcrawler.
Its syntax is similar to jQuery ( Readme ), so i thought thats cool, lots of people use jQuery and its well documented. Furthermore its much less boilerplate code than the plain net/html package which comes with Go.

I had some issues to wrap my head around to understand that ".title.may-blank" is different 
from ".title .may-blank"  (note the space). The first searches for an element with 
class="title may-blank", and the second one searches for a class="title" with a child class="may-blank".

After scratching my head for several hours i finally decided to look into the jQuery documentation - aaaand Bingo!! (To note is that I hate JavaScript since the 90s and try to avoid it whenever possible) 

Here is my test code, parsing posts (from a static html) from reddit:
The test HTML was copied from within the firefox inspector and then formatted by the Tidy2 plugin for Notebook++

package main

import (
    "fmt"
    "github.com/PuerkitoBio/goquery"
    "io"
    "log"
    "strings"
    "unicode"
)

func ExampleGoQuery() {

    // Get an io.Reader with HTML content
    io := getHtmlInputReader()

    // Create a qoquery document to parse from
    doc, err := goquery.NewDocumentFromReader(io)
    checkErr(err, "Failed to parse HTML")

    fmt.Println("---- Starting to parse ------------------------")

    // Find reddit posts = elements with class "thing"
    thing := doc.Find(".thing")
    for iThing := range thing.Nodes {

        // use `single` as a selection of 1 node
        singlething := thing.Eq(iThing)

        // get the reddit post identifier
        reddit_post_id, exists := singlething.Attr("data-fullname")
        if exists == true {

            // find an element with class title and a child with class may-blank
            reddit_post_title := singlething.Find(".title .may-blank").Text()
            reddit_post_user := singlething.Find(".author").Text()
            // find an element with class comments and may-blank (in the same element, note the space!)
            reddit_post_url, _ := singlething.Find(".comments.may-blank").Attr("href")
            reddit_post_score := singlething.Find(".score.likes").Text()
            reddit_postdate, exists := singlething.Find("time").Attr("datetime")

            if exists == true {

                // Remove CRLF and unnecessary whitespaces
                reddit_post_title = stringMinifier(reddit_post_title)

                // Print out the crawled info
                fmt.Println("Id = " + reddit_post_id)
                fmt.Println("Date = " + reddit_postdate)
                fmt.Println("User = " + reddit_post_user)
                fmt.Println("Title = " + reddit_post_title)
                fmt.Println("Score = " + reddit_post_score)
                fmt.Println("Url = " + reddit_post_url)
                fmt.Println("-----------------------------------------------")

            }
        }

    }
}

// Removes all unnecessary whitespaces
func stringMinifier(in string) (out string) {

    white := false
    for _, c := range in {
        if unicode.IsSpace(c) {
            if !white {
                out = out + " "
            }
            white = true
        } else {
            out = out + string(c)
            white = false
        }
    }
    return
}

// returns an io.Reader with dummy test html
func getHtmlInputReader() io.Reader {
    s := `
<html>
  <head>
    <meta name="generator"
    content="HTML Tidy for HTML5 (experimental) for Windows https://github.com/w3c/tidy-html5/tree/c63cc39" />
    <title></title>
  </head>
  <body>
    <div class="thing id-t3_34z9xo odd link" onclick="click_thing(this)" data-fullname="t3_34z9xo">
      <p class="parent"></p>
      <span class="rank">1</span>
      <div class="midcol unvoted">
        <div class="arrow up login-required" onclick="$(this).vote(r.config.vote_hash, null, event)" role="button"
        aria-label="upvote" tabindex="0"></div>
        <div class="score dislikes">10</div>
        <div class="score unvoted">11</div>
        <div class="score likes">12</div>
        <div class="arrow down login-required" onclick="$(this).vote(r.config.vote_hash, null, event)" role="button"
        aria-label="downvote" tabindex="0"></div>
      </div>
      <div class="entry unvoted">
        <p class="title">
        <a class="title may-blank loggedin" href="https://github.com/dariubs/GoBooks" tabindex="1">dariubs/GoBooks: list of paper
        and electronic books on Go</a> 
        <span class="domain">(
        <a href="/domain/github.com/">github.com</a>)</span></p>
        <p class="tagline">submitted 
        <time title="Tue May 5 20:13:08 2015 UTC" datetime="2015-05-05T20:13:08+00:00" class="live-timestamp">10 hours ago</time>
        <a href="http://www.reddit.com/user/dgryski" class="author may-blank id-t2_3hcmx">dgryski</a></p>
        <ul class="flat-list buttons">
          <li class="first">
            <a href="http://www.reddit.com/r/golang/comments/34z9xo/dariubsgobooks_list_of_paper_and_electronic_books/"
            class="comments may-blank">1 comment</a>
          </li>
          <li class="share">
            <span class="share-button toggle" style="">
              <a class="option active login-required" href="#" tabindex="100"
              onclick="return toggle(this, share, cancelShare)">share</a>
              <a class="option" href="#">cancel</a>
            </span>
          </li>
          <li class="link-save-button save-button">
            <a href="#">save</a>
          </li>
          <li>
            <form action="/post/hide" method="post" class="state-button hide-button">
              <input name="executed" value="hidden" type="hidden" />
              <span>
                <a href="javascript:void(0)" onclick="change_state(this, &#39;hide&#39;, hide_thing);">hide</a>
              </span>
            </form>
          </li>
          <li class="report-button">
            <a href="javascript:void(0)" class="action-thing" data-action-form="#report-action-form">report</a>
          </li>
        </ul>
        <div class="expando" style="display: none">
          <span class="error">loading...</span>
        </div>
      </div>
      <div class="child"></div>
      <div class="clearleft"></div>
    </div>
   <div class="thing id-t3_359t2l odd link self" onclick="click_thing(this)" data-fullname="t3_359t2l">
      <p class="parent"></p>
      <span class="rank">1</span>
      <div class="midcol unvoted">
        <div class="arrow up login-required" onclick="$(this).vote(r.config.vote_hash, null, event)" role="button"
        aria-label="upvote" tabindex="0"></div>
        <div class="score likes">•</div>
        <div class="score unvoted">•</div>
        <div class="score dislikes">•</div>
        <div class="arrow down login-required" onclick="$(this).vote(r.config.vote_hash, null, event)" role="button"
        aria-label="downvote" tabindex="0"></div>
      </div>
      <div class="entry unvoted">
        <p class="title">
        <a class="title may-blank loggedin" href="/r/golang/comments/359t2l/bulding_api_services_in_go/" tabindex="1"
        rel="nofollow">Bulding API services in Go</a> 
        <span class="domain">(
        <a href="/r/golang/">self.golang</a>)</span></p>
        <div class="expando-button collapsed selftext" onclick="expando_child(this)"></div>
        <p class="tagline">submitted 
        <time title="Fri May 8 08:41:00 2015 UTC" datetime="2015-05-08T08:41:00+00:00" class="live-timestamp">a minute ago</time>
        by 
        <a href="http://www.reddit.com/user/jan1024188" class="author may-blank id-t2_8bbqm">jan1024188</a></p>
        <ul class="flat-list buttons">
          <li class="first">
            <a href="http://www.reddit.com/r/golang/comments/359t2l/bulding_api_services_in_go/"
            class="comments empty may-blank">comment</a>
          </li>
          <li class="share">
            <span class="share-button toggle" style="">
              <a class="option active login-required" href="#" tabindex="100"
              onclick="return toggle(this, share, cancelShare)">share</a>
              <a class="option" href="#">cancel</a>
            </span>
          </li>
          <li class="link-save-button save-button">
            <a href="#">save</a>
          </li>
          <li>
            <form action="/post/hide" method="post" class="state-button hide-button">
              <input name="executed" value="hidden" type="hidden" />
              <span>
                <a href="javascript:void(0)" onclick="change_state(this, &#39;hide&#39;, hide_thing);">hide</a>
              </span>
            </form>
          </li>
          <li class="report-button">
            <a href="javascript:void(0)" class="action-thing" data-action-form="#report-action-form">report</a>
          </li>
        </ul>
        <div class="expando" style="display: none">
          <span class="error">loading...</span>
        </div>
      </div>
      <div class="child"></div>
      <div class="clearleft"></div>
    </div>    
  </body>
</html>
`
    return strings.NewReader(s)
}

func checkErr(err error, msg string) {
    if err != nil {
        log.Fatalln(msg, err)
    }
}

func main() {
    ExampleGoQuery()
}


Every day im falling more into love with Go!