Concurrent Web Scraper in Go
```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"sync"
)

// fetch downloads a single URL and reports how many bytes were read.
func fetch(url string, wg *sync.WaitGroup) {
	defer wg.Done() // signal completion even on an early return
	resp, err := http.Get(url)
	if err != nil {
		fmt.Printf("Error fetching %s: %v\n", url, err)
		return
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Printf("Error reading response body from %s: %v\n", url, err)
		return
	}
	fmt.Printf("Fetched %s: %d bytes\n", url, len(body))
}

func main() {
	urls := []string{
		"https://www.example.com",
		"https://www.google.com",
		"https://www.github.com",
	}
	var wg sync.WaitGroup
	for _, url := range urls {
		wg.Add(1)
		go fetch(url, &wg) // each fetch runs in its own goroutine
	}
	wg.Wait()
	fmt.Println("All fetches completed.")
}
```

In this Go program, we build a concurrent web scraper that fetches data from multiple websites at the same time.
We start by importing the necessary packages: `fmt` for printing, `net/http` for making HTTP requests, `io` for reading response bodies (`io.ReadAll` replaced the now-deprecated `ioutil.ReadAll` in Go 1.16), and `sync` for synchronization primitives.
We define a `fetch` function that takes a URL and a pointer to a `sync.WaitGroup`. Inside the function, we make an HTTP GET request to the URL, read the response body, and print the size of the fetched data.
In the `main` function, we define the list of URLs to scrape and create a `sync.WaitGroup` to coordinate the goroutines. For each URL we increment the `WaitGroup` counter with `wg.Add(1)` and launch a goroutine to fetch it concurrently.
After launching all goroutines, `wg.Wait()` blocks until every `wg.Add(1)` has been matched by a `wg.Done()` call. Only then do we print the final completion message.
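A common refinement of this pattern is to send results back over a channel rather than printing from each goroutine. The sketch below is an assumption-laden variant: `fetchResult`, `fetchAll`, and the injected `fetchOne` function are illustrative names, with the real HTTP call swapped out so the concurrency pattern stands on its own:

```go
package main

import (
	"fmt"
	"sync"
)

// fetchResult records the outcome of one fetch.
type fetchResult struct {
	URL   string
	Bytes int
	Err   error
}

// fetchAll runs fetchOne for every URL concurrently and gathers all results.
// fetchOne is a stand-in for the real HTTP fetch, injected so the pattern
// works without network access.
func fetchAll(urls []string, fetchOne func(string) (int, error)) []fetchResult {
	results := make(chan fetchResult, len(urls)) // buffered: senders never block
	var wg sync.WaitGroup
	for _, u := range urls {
		wg.Add(1)
		go func(u string) {
			defer wg.Done()
			n, err := fetchOne(u)
			results <- fetchResult{URL: u, Bytes: n, Err: err}
		}(u)
	}
	wg.Wait()
	close(results) // safe to close: all senders have finished
	var out []fetchResult
	for r := range results {
		out = append(out, r)
	}
	return out
}

func main() {
	// Stand-in fetch: "downloads" a URL by reporting its length.
	res := fetchAll([]string{"a", "bb"}, func(u string) (int, error) { return len(u), nil })
	for _, r := range res {
		fmt.Printf("%s: %d bytes\n", r.URL, r.Bytes)
	}
}
```

Because the channel is buffered to `len(urls)`, every goroutine can deliver its result without blocking, so `wg.Wait()` cannot deadlock, and closing the channel after `Wait` returns is guaranteed to happen after the last send.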
This program demonstrates how to use goroutines and `sync.WaitGroup` in Go to fetch data from multiple websites concurrently.