Note: I've tried to make sure this website can be used with JS disabled, but some of the articles use MathJax to display math formulas. Those formulas will sadly be displayed as LaTeX code if your browser has JavaScript disabled.

Simple Blog Stats

By Jon Valdés - (2022-07-29)

I've been wanting to have some view statistics for my blog for a long time, mainly to answer the question: "Am I writing for 5, 50, 500 people?".

However, I've never liked the idea of relying on external services, especially if they used cookies to track visitors.

So a few days ago I decided to go into full "I'll do it myself" mode and implement a small Rust utility to parse nginx access log files and generate some graphs out of them. After a few days tweaking the tool and seeing what stats I could get out of it, I gotta say I'm quite happy with this tool :)

It generates images like this:

Visits over a 2-week period

In that image we can see a few things.

First, there's a big spike from when I posted the pen holder article. I only posted it on Twitter and didn't get much movement there, so I guess that's why the spike has such a sharp decline.

And second, there's a steady trickle of people somehow finding my blog and reading some stuff in it. I'm assuming the circular harmonics article is the main driver behind that, given the views it gets. It's nice knowing people liked that one!

Anyway, with this tool I'm happy that:

I can now see how many people read my blog, and which specific articles are read
I don't have to intrude on anyone's privacy to get that info
I don't have to rely on external services that might go down tomorrow
Some people do read my blog, so it's not 100% wasted effort :)

How to build a tool like this

The code itself is very hardcoded to my blog so I don't think it'd be very useful for other people, but I can share the general structure so anyone can build a similar tool for themselves. Mine is ~300 lines long including the graph drawing, so it shouldn't take long to build a clone :-)

Rust crates used:

anyhow: Easy error management
chrono: Datetime handling
flate2: Decompressing log files
plotters: Drawing pretty graphs
rusqlite: Generating an SQLite DB on the fly with all the log info
uaparser: Parsing of user agents (used to try to discard bots)
walkdir: Iterating over all the log files

The general flow is:

Download log files from server (this part is done with a bash script)
For each log file:
- Decompress
- Split into lines
- Parse each line
- Put parsed log even info into in-memory SQLite DB
Extract data from DB and generate charts

Log file parsing

One slightly more involved thing I had to do was parse the nginx logs. They're full of malicious requests, so they're a bit of a sensitive thing, but this snippet of code seems to be robust enough to handle the non-malicious requests and more or less cope with the bad ones. It's not particularly optimized (it does string copying everywhere), but it's fast enough that the whole script handles 160K log lines in <5 seconds in my laptop, so... good enough for me!

Also, I make no claims that this is a perfect, or even decent parser. It's just something I whipped up in 45 minutes and seems to work well enough for the logs I have.

// Log format:
// 66.249.76.149 - - [28/Jul/2022:08:37:26 +0000] "GET / HTTP/1.1" 301 169 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

fn sanitize_request(req: &str) -> String {
    let splits = req.split(' ');
    let url_split = splits.skip(1).next(); // Skip the http method (GET, POST...) and ignore the HTTP version at the end

    if url_split.is_some() {
        String::from(url_split.unwrap())
    } else {
        result
    }
}

fn tokenize_log_line(line: &str) -> Vec<String> {
    let mut result = vec![];
    let mut cs = line.chars();

    let mut current_token = String::new();
    let mut token_delimiter = None;
    loop {
        let ch = cs.next();
        if ch.is_none() {
            break;
        }
        let c = ch.unwrap();

        if current_token.is_empty() && token_delimiter.is_none() {
            // Read starting char
            match c {
                '"' => token_delimiter = Some('"'),
                '[' => token_delimiter = Some(']'),
                ' ' => {}  // Ignore spaces
                '\t' => {} // Ignore tabs
                _ => {
                    token_delimiter = Some(' ');
                    current_token.push(c);
                }
            }
        } else {
            if Some(c) == token_delimiter {
                // Finish the token
                result.push(current_token);
                current_token = String::new();
                token_delimiter = None;
            } else {
                current_token.push(c);
            }
        }
    }

    // TODO -- If current_token isn't empty here, that should be an error

    result
}

struct LogEvent {
    // ip: String,
    timestamp: chrono::DateTime<FixedOffset>,
    request: String,
    status: u16,
    // bytes: u32,
    referrer: String,
    user_agent: String,
}

impl LogEvent {
    fn from_str(line: &str) -> Result<Self> {
        let tokens = tokenize_log_line(line);
        macro_rules! try_token {
            ($index: expr, $name: expr) => { {
                if tokens.len() <= $index {
                    return Err(anyhow::anyhow!(
                        "line doesn't have enough tokens for {}: {}",
                        $name,
                        line
                    ));
                }
                tokens[$index].clone()
            } };
        }

        Ok(LogEvent {
            //ip: try_token!(0usize, "ip"),
            timestamp: chrono::DateTime::parse_from_str(
                &try_token!(3usize, "timestamp"),
                "%d/%h/%Y:%H:%M:%S %z",
            )?,
            request: sanitize_request(&try_token!(4, "request")),
            status: u16::from_str_radix(&try_token!(5, "status"), 10)?,
            // bytes: u32::from_str_radix(&try_token!(6, "bytes"), 10)?,
            referrer: try_token!(7, "referrer"),
            user_agent: try_token!(8, "request"),
        })
    }
}

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License .