2014-02-23 21:06:16

Gmane mailing list statistics with graph

Gmane is a large mailing list archive. It provides statistics for most common lists. The data comes as one line per day in the form "date #mails #spam". For example, http://gmane.org/output-rate.php?group=gmane.psychology.depression returns:

date posting-rate spam-rate
20081113 1 0

If you want the data on a monthly basis, you can use this script. It sums all mails for each month and prints the totals. Spam is ignored, but you can easily add it. The same example as above:

$ mail_statistics.sh gmane.psychology.depression
200811 1
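
Adding the spam column is a small change. Here is a minimal sketch of one way to do it; it is not the script from this post. It assumes the same three-column input ("date posting-rate spam-rate") on stdin and uses awk for the summing. The function name monthly_totals is made up for illustration.

```bash
# Hypothetical helper (not part of mail_statistics.sh): reads gmane
# "date posting-rate spam-rate" lines on stdin and prints one
# "YYYYMM mails spam" line per month, in order of first appearance.
monthly_totals() {
        awk '
        # Only lines whose first field is an 8-digit date are data;
        # this skips the header line and any blank lines.
        $1 ~ /^[0-9]+$/ && length($1) == 8 {
                month = substr($1, 1, 6)
                if (!(month in mails)) order[++n] = month
                mails[month] += $2
                spam[month] += $3
        }
        END {
                for (i = 1; i <= n; i++)
                        print order[i], mails[order[i]], spam[order[i]]
        }'
}
```

It could be fed directly from the download, for example: wget -q -O - "http://gmane.org/output-rate.php?group=gmane.psychology.depression" | monthly_totals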

I use this data to generate plots with gnuplot. To get the x-axis right, set timefmt to "%Y%m" so that gnuplot understands values like "201402". The complete gnuplot config I use for this purpose:

set autoscale
set grid
set xdata time
set timefmt "%Y%m"
set xtics format "%Y"
set xrange ["200201":"201404"] noreverse
set yrange [0:4100]
set terminal png size 900,600
set output "images/postfix.png"
set xlabel "Year"
set ylabel "Mails"
set title "postfix-users@postfix.org"
plot "gnuplot/postfix.dat" using 1:2 title 'mails' with boxes
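
The xrange and yrange in this config are hardcoded for the postfix data. As a rough sketch, they could be derived from the data file instead. This assumes the file holds "YYYYMM count" lines sorted by month, as produced by the script. The function name plot_ranges and the roughly 5% headroom above the tallest box are illustrative choices, not part of my setup.

```bash
# Hypothetical helper: prints gnuplot "set xrange" / "set yrange" lines
# that fit the "YYYYMM count" data file given as the first argument.
plot_ranges() {
        awk '
        NF >= 2 {
                if (first == "") first = $1    # first month in the file
                last = $1                      # ends up as the last month
                if ($2 + 0 > max) max = $2 + 0
        }
        END {
                if (first == "") exit 1        # no data lines found
                # integer arithmetic: tallest value plus about 5% headroom
                top = max + int(max / 20) + 1
                printf "set xrange [\"%s\":\"%s\"] noreverse\n", first, last
                printf "set yrange [0:%d]\n", top
        }' "$1"
}
```

The output could be written to a separate file and pulled in with gnuplot's load command in place of the two hardcoded set lines.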

As you can see in the gnuplot file, I use my script to generate statistics for "postfix-users@postfix.org", the Postfix users mailing list. To generate monthly statistics with the script, you need to provide the Gmane group name. It always starts with "gmane" and consists of tags separated by dots. Examples: gmane.linux.arch.general, gmane.psychology.depression, gmane.mail.postfix.user, gmane.mail.exim.user.
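
The naming rule can be checked before anything is downloaded. The sketch below is an assumption on my side, not something Gmane documents: it accepts "gmane" followed by one or more dot-separated tags made of letters, digits, "_", "+" and "-". The function name is_gmane_group is hypothetical.

```bash
# Hypothetical check: succeeds if the argument looks like a gmane group
# name, i.e. "gmane" followed by at least one ".tag" part.
# The allowed tag characters are an assumption.
is_gmane_group() {
        printf '%s\n' "$1" | grep -Eq '^gmane(\.[A-Za-z0-9_+-]+)+$'
}
```

A caller could then bail out early, for example: is_gmane_group "$1" || { echo "not a gmane group name" >&2; exit 1; }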

Source of mail_statistics.sh:

#!/usr/bin/env bash

gmane=$1
TEMP=$(mktemp)
wget -q "http://gmane.org/output-rate.php?group=${gmane}" -O "${TEMP}"
grep -q "date posting-rate spam-rate" "${TEMP}" || { echo "wrong parameter" >&2; rm "${TEMP}"; exit 1; }
counter_messages=0
#counter_spam=0
previous=""
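# The trailing "end 0 0" line is a sentinel that forces the last month to be printed.
(grep -v date "${TEMP}"; echo end 0 0) | while read day;
do
        read date messages spam <<< "${day}"
        # The first six characters of the date are YYYYMM.
        timestamp=${date:0:6}
        # The month changed: print the total of the previous month and reset.
        if [ "${timestamp}" != "${previous}" ] && [ -n "${previous}" ];
        then
                echo "${previous} ${counter_messages}"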
                counter_messages=0
#               counter_spam=0
        fi
        counter_messages=$((${counter_messages}+${messages}))
#       counter_spam=$((${counter_spam}+${spam}))
        previous=${timestamp}
done
rm "${TEMP}"

To generate statistics for the postfix-users mailing list, the command

$ mail_statistics.sh gmane.mail.postfix.user > gnuplot/postfix.dat; gnuplot gnuplot/postfix.plt

gives you postfix.png in images/.


Posted by toerb | Permanent link | File under: bash