Go Programming

Concurrent Web Scraper in Go


package main

import (
	"fmt"
	"io"
	"net/http"
	"sync"
)

// fetch downloads the page at url and reports its size.
// The deferred wg.Done() marks this goroutine finished even when an error occurs.
func fetch(url string, wg *sync.WaitGroup) {
	defer wg.Done()

	resp, err := http.Get(url)
	if err != nil {
		fmt.Printf("Error fetching %s: %v\n", url, err)
		return
	}
	defer resp.Body.Close()

	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Printf("Error reading response body from %s: %v\n", url, err)
		return
	}

	fmt.Printf("Fetched %s: %d bytes\n", url, len(body))
}

func main() {
	urls := []string{
		"https://www.example.com",
		"https://www.google.com",
		"https://www.github.com",
	}

	var wg sync.WaitGroup

	// Launch one goroutine per URL; Add is called before the goroutine starts
	// so the counter is already incremented when Wait runs.
	for _, url := range urls {
		wg.Add(1)
		go fetch(url, &wg)
	}

	// Block until every fetch has called Done.
	wg.Wait()
	fmt.Println("All fetches completed.")
}

In this Go program, we build a concurrent web scraper that fetches data from multiple websites in parallel.

We start by importing the necessary packages: `fmt` for printing, `net/http` for making HTTP requests, `io` for reading response bodies, and `sync` for synchronization primitives.

We define a `fetch` function that takes a URL and a pointer to a `sync.WaitGroup`. The deferred `wg.Done()` call marks the goroutine as finished even if the request fails. Inside the function, we make an HTTP GET request to the URL, read the response body, and print the size of the fetched data.
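One thing to note: `http.Get` uses Go's default HTTP client, which has no timeout, so a server that never responds could keep a goroutine alive indefinitely. A minimal variation, assuming we add `time` to the import block (the 10-second deadline is just an illustrative choice, not part of the program above), is to route requests through a client with a timeout:

	// Sketch: a shared client with an explicit deadline; requires "time" in the imports.
	var client = &http.Client{Timeout: 10 * time.Second}

	func fetch(url string, wg *sync.WaitGroup) {
		defer wg.Done()

		resp, err := client.Get(url) // behaves like http.Get, but enforces the timeout
		if err != nil {
			fmt.Printf("Error fetching %s: %v\n", url, err)
			return
		}
		defer resp.Body.Close()

		body, err := io.ReadAll(resp.Body)
		if err != nil {
			fmt.Printf("Error reading response body from %s: %v\n", url, err)
			return
		}

		fmt.Printf("Fetched %s: %d bytes\n", url, len(body))
	}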

In the `main` function, we define the list of URLs to scrape and create a `sync.WaitGroup` to coordinate the concurrent fetches. For each URL, we call `wg.Add(1)` before launching the goroutine, so the counter is already incremented by the time `Wait` is called, and then start `fetch` in its own goroutine.
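With only three URLs, one goroutine per URL is fine. If the list grew to thousands of pages, we might want to cap how many requests run at once. A common way to do that is a buffered channel used as a semaphore; the sketch below (the `maxInFlight` limit of 5 is an arbitrary choice, not part of the program above) would replace the loop in `main`:

	// Sketch: allow at most maxInFlight requests to be in flight at any time.
	const maxInFlight = 5

	var wg sync.WaitGroup
	sem := make(chan struct{}, maxInFlight)

	for _, url := range urls {
		wg.Add(1)
		sem <- struct{}{} // blocks while maxInFlight fetches are already running
		go func(u string) {
			defer func() { <-sem }() // free the slot when this fetch returns
			fetch(u, &wg)
		}(url)
	}

	wg.Wait()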

After launching all the goroutines, we call `wg.Wait()`, which blocks until every `fetch` has called `Done`. Once it returns, we print a message indicating that all fetches have completed.
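Printing from inside each goroutine is enough for this example, but if `main` needed to collect the results, a common variation is to send them over a channel and close it once the `WaitGroup` drains. The sketch below (the `result` struct and channel names are our own, not part of the program above) shows one way to do that inside `main`:

	// Sketch: have each fetch report its outcome to main instead of printing it.
	type result struct {
		url   string
		bytes int
		err   error
	}

	results := make(chan result, len(urls))

	var wg sync.WaitGroup
	for _, url := range urls {
		wg.Add(1)
		go func(u string) {
			defer wg.Done()
			resp, err := http.Get(u)
			if err != nil {
				results <- result{url: u, err: err}
				return
			}
			defer resp.Body.Close()
			body, err := io.ReadAll(resp.Body)
			results <- result{url: u, bytes: len(body), err: err}
		}(url)
	}

	// Close the channel once every sender is done, so the range below terminates.
	go func() {
		wg.Wait()
		close(results)
	}()

	for r := range results {
		if r.err != nil {
			fmt.Printf("Error fetching %s: %v\n", r.url, r.err)
			continue
		}
		fmt.Printf("Fetched %s: %d bytes\n", r.url, r.bytes)
	}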

This program demonstrates how to use goroutines and a `sync.WaitGroup` in Go to fetch data from multiple websites concurrently.