Go Regular Expressions Introduction
Regular Expressions are useful for finding patterns within strings of text. For example, you can find all the dates in a blog post with the \d{4}-\d{2}-\d{2} pattern, which will match strings of the format yyyy-mm-dd. Also, just last week we made an HTTP router in the Building a Go Router From Scratch tutorial that used regular expressions to extract dynamic URL parameters.
This post dives into Go's built-in regexp package and shows examples of some of the most common uses of regular expressions.
Compiling Expressions
Go's regexp package contains many useful utilities for creating and matching regular expressions. Once imported, expressions can be compiled using either the Compile() or MustCompile() functions.
package main

import (
	"fmt"
	"regexp"
)

func main() {
	var r *regexp.Regexp
	// TIP: Use raw string literals (`...`) instead of
	// interpreted string literals ("...") to avoid having to
	// escape backslashes and other special characters.
	// regexp.MustCompile("(\\s)world")
	r = regexp.MustCompile(`(\s)world`)
	fmt.Println(r.MatchString("Hello world")) // true
}
Compile() vs. MustCompile()
The only difference between Compile and MustCompile is how errors are handled. When an invalid pattern is given, Compile returns an error while MustCompile panics instead. When using fixed patterns in code, like we'll be doing in this post, we can safely use MustCompile because we know our patterns are valid. However, when generating expressions dynamically or accepting them from user input, invalid patterns are likely and the returned error should be handled.
// Create Regexp and error
r, err := regexp.Compile(`my\s*regexp`)
if err != nil {
	// handle the invalid pattern
}
// Create Regexp or panic on fail
r = regexp.MustCompile(`my\s*regexp`)
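As a minimal sketch of handling a pattern that is not known ahead of time, the program below wraps Compile in a helper that returns the error to its caller. The helper name matchUserPattern and the sample inputs are assumptions made for illustration, not part of the regexp package.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchUserPattern is a hypothetical helper: it compiles a pattern that
// is only known at runtime and reports whether it matches the input.
func matchUserPattern(pattern, input string) (bool, error) {
	r, err := regexp.Compile(pattern)
	if err != nil {
		// Return the error to the caller instead of panicking.
		return false, fmt.Errorf("invalid pattern %q: %w", pattern, err)
	}
	return r.MatchString(input), nil
}

func main() {
	// A valid pattern compiles and matches as usual.
	ok, err := matchUserPattern(`my\s*regexp`, "my   regexp")
	fmt.Println(ok, err) // true <nil>

	// The unclosed parenthesis is a syntax error, so err is non-nil.
	ok, err = matchUserPattern(`my(regexp`, "my regexp")
	fmt.Println(ok, err != nil) // false true
}
```

Returning the error keeps the decision about what to do with a bad pattern (reject the request, show a message, fall back to a default) with the caller.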
Basic Matching
Now that we know how to compile a pattern into the Regexp type, we can access its methods and functionality. The most common use for regular expressions is to check whether a pattern matches and to get the matched value.
Most matching methods have variants for strings and byte slices, and some also accept a reader (an io.RuneReader). For example, MatchString(), Match(), and MatchReader().
r := regexp.MustCompile(`Hello \w+!`)
// Check for a match
r.MatchString("Hey, Hello World!") // true
r.MatchString("Goodbye World!") // false
// Also check []byte and Reader
b := []byte("Hey, Hello World!")
r.Match(b) // true
r.MatchReader(bytes.NewBuffer(b)) // true
// Return matched substring
r.FindString("Hey, Hello World!") // "Hello World!"
r.FindString("Goodbye World!") // ""
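FindString only returns the first match. To collect every match, such as all the dates in a blog post, FindAllString can be used with the date pattern from the introduction. The following is a small sketch; the findDates helper and the sample text are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// datePattern matches dates in the yyyy-mm-dd format.
var datePattern = regexp.MustCompile(`\d{4}-\d{2}-\d{2}`)

// findDates is a hypothetical helper that returns every date in text.
func findDates(text string) []string {
	// A negative count asks for all matches instead of the first n.
	return datePattern.FindAllString(text, -1)
}

func main() {
	post := "Published 2020-10-04, updated 2020-10-30."
	fmt.Println(findDates(post))              // [2020-10-04 2020-10-30]
	fmt.Println(findDates("no dates") == nil) // true
}
```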
Group Matching
Regular expression groups provide a useful means for referencing specific parts within a match. Here are some examples showing how to extract year, month, and day from a string.
// Match "2020-09-30" format
r := regexp.MustCompile(`(?P<Year>\d{4})-(?P<Month>\d{2})-(?P<Day>\d{2})`)
r.FindStringSubmatch("2020-10-04") // ["2020-10-04", "2020", "10", "04"]
r.FindStringSubmatch("yyyy-mm-dd") // nil
r.SubexpNames() // ["", "Year", "Month", "Day"]
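Since SubexpNames() and FindStringSubmatch() return slices with matching indexes, the two can be combined to look up groups by name. This is one possible sketch; parseDate and dateRe are hypothetical names chosen for the example.

```go
package main

import (
	"fmt"
	"regexp"
)

var dateRe = regexp.MustCompile(`(?P<Year>\d{4})-(?P<Month>\d{2})-(?P<Day>\d{2})`)

// parseDate is a hypothetical helper that returns the named groups of
// the first date found in s, or nil when there is no match.
func parseDate(s string) map[string]string {
	match := dateRe.FindStringSubmatch(s)
	if match == nil {
		return nil
	}
	groups := make(map[string]string)
	for i, name := range dateRe.SubexpNames() {
		// Index 0 (the whole match) and unnamed groups have an empty name.
		if name != "" {
			groups[name] = match[i]
		}
	}
	return groups
}

func main() {
	groups := parseDate("2020-10-04")
	fmt.Println(groups["Year"], groups["Month"], groups["Day"]) // 2020 10 04
	fmt.Println(parseDate("yyyy-mm-dd") == nil)                 // true
}
```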
Replacement
We can also replace matches with new values. Replacement templates can reference groups using the $ character. Groups can either be named, as in (?P<Year>\d{4}), or unnamed, as in (\d{4}).
// Match "2020-09-30" format
r := regexp.MustCompile(`(?P<Year>\d{4})-(?P<Month>\d{2})-(?P<Day>\d{2})`)
r.ReplaceAllString("2020-10-30", "yyyy-mm-dd") // "yyyy-mm-dd"
r.ReplaceAllString("2020-10-30", "${Year}/${Month}/${Day}") // "2020/10/30"
r.ReplaceAllString("2020-10-30", "$Year/$Month/$Day") // "2020/10/30"
r.ReplaceAllString("2020-10-30", "$1/$2/$3") // "2020/10/30"
r.ReplaceAllStringFunc("2020-10-30, 2040-01-23", func(s string) string {
return strings.ReplaceAll(s, "0", "X")
}) // "2X2X-1X-3X, 2X4X-X1-23"
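One detail worth knowing when a group reference is followed directly by a letter, digit, or underscore: the name after $ extends as far as it can, so $1px is read as a group named "1px" rather than group 1 followed by "px". Braces remove the ambiguity. The sketch below shows the difference; addUnit and sizePattern are hypothetical names used only here.

```go
package main

import (
	"fmt"
	"regexp"
)

var sizePattern = regexp.MustCompile(`(\d+)`)

// addUnit is a hypothetical helper that applies a replacement template
// to every number found in s.
func addUnit(s, template string) string {
	return sizePattern.ReplaceAllString(s, template)
}

func main() {
	// ${1} is unambiguous: group 1 followed by the literal "px".
	fmt.Printf("%q\n", addUnit("width: 10", "${1}px")) // "width: 10px"

	// $1px refers to a group named "1px", which does not exist,
	// so the match is replaced with an empty string.
	fmt.Printf("%q\n", addUnit("width: 10", "$1px")) // "width: "
}
```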
Next Steps
Regular expressions are an extremely powerful tool to have in your programming tool belt. They work very similarly in most languages, so mastering them is time well spent. In this post, we covered some of the most common use cases, but there are many other features of the pattern syntax that are useful to know.
For example, you can define a case-insensitive expression using the (?i) flag.
r := regexp.MustCompile(`(?i)Case InseNsitIve`)
r.MatchString(`THIS IS CASE INSENSITIVE`) // true
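Flags can also be combined. As a rough sketch, the program below compares the same pattern with no flags, with the multi-line flag (?m), and with both (?i) and (?m) together. The countMatches helper and the sample text are assumptions made for the example.

```go
package main

import (
	"fmt"
	"regexp"
)

// countMatches is a hypothetical helper that compiles a fixed pattern
// and counts how many times it matches in text.
func countMatches(pattern, text string) int {
	return len(regexp.MustCompile(pattern).FindAllString(text, -1))
}

func main() {
	text := "go\nGo\nGO"

	// Without flags, ^ and $ only match at the start and end of the text.
	fmt.Println(countMatches(`^go$`, text)) // 0

	// (?m) makes ^ and $ match at the start and end of each line.
	fmt.Println(countMatches(`(?m)^go$`, text)) // 1

	// (?im) combines case-insensitive and multi-line matching.
	fmt.Println(countMatches(`(?im)^go$`, text)) // 3
}
```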
For a comprehensive list of regular expression features, visit Google's re2 syntax document at github.com/google/re2/wiki/Syntax.
Happy matching! 😀