refreshSG
Christopher Rath
2003-07-05
For my own sanity, as well as the protection of our children, our home Internet connection filters web queries. There are two filtering methods that can be used: (1) scanning the content of each page and blocking whatever the filter decides should be blocked; and (2) keeping a list of sites to be blocked (called a Blacklist) and a list of sites to be allowed (called a Whitelist), and blocking or allowing sites accordingly. We have opted for the second method, and the application we use is called squidGuard.
SquidGuard works in conjunction with a web caching application called Squid. Our home Internet connection is a cable modem, and use of a web cache reduces the number of queries the computer must make, hopefully speeding up our web access.
The fundamental flaw of Blacklists is that, because the web is a constantly changing environment, the Blacklists must themselves be constantly updated. There are a number of interested parties on the Internet who maintain Blacklists (out of personal interest), and so our approach has been to use one of the Blacklist consolidators. To facilitate the regular updating of our Blacklist, I wrote a small script that can be run on a regular schedule to download the list and install it.
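In outline, the job such a refresh script has to do is simple: fetch the consolidated Blacklist, unpack it into the squidGuard database directory (leaving the local lists alone), rebuild the databases, and restart Squid. The sketch below illustrates that outline only; the download URL, archive layout, and temp directory are assumptions, and the actual refreshSG script described further down is more robust than this:
#!/bin/sh
# Minimal sketch only: the URL, the archive layout, and the temp directory
# are assumptions, not what the real refreshSG script uses.
BLACKLIST_URL="http://example.org/blacklists.tar.gz"   # hypothetical source
DB_DIR=/etc/squidGuard/db
TMP_DIR=/tmp/blacklist.$$

mkdir -p "$TMP_DIR" && cd "$TMP_DIR" || exit 1

# Fetch and unpack the consolidated Blacklist.
wget -q "$BLACKLIST_URL" -O blacklists.tar.gz || exit 1
tar xzf blacklists.tar.gz

# Copy the category lists into place; the 'local' directory is never part
# of the download, so the local allow/deny lists are left untouched.
cp -R blacklists/* "$DB_DIR"

# Rebuild the pre-built database files, fix ownership, and restart Squid.
/usr/sbin/squidGuard -C all
chown -R squid:suvlet "$DB_DIR"
/etc/rc.d/init.d/squid restart

rm -rf "$TMP_DIR"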
This webpage describes two things: how to configure squidGuard to make use of Blacklists, and refreshSG, the small script I wrote to download and install Blacklist updates automatically.
SquidGuard is a web-filtering package that is used to block and allow web access. To make full use of squidGuard Blacklists, squidGuard must be configured to block and allow the specific classes of sites you want to filter. In squidGuard-speak, this means configuring the "allow" and "deny" lists.
The squidGuard website contains full details about how squidGuard Blacklists are configured; however, here's my squidGuard.conf file, which shows multiple lists being configured. I have configured blocking of drug, gambling, and pornographic websites, plus provision for local lists of sites to be blocked or allowed:
logdir /var/log/squid
dbhome /etc/squidGuard/db

dest allow_local {
    domainlist local/allow_domains
    urllist    local/allow_urls
}

dest deny_local {
    domainlist local/deny_domains
    urllist    local/deny_urls
}

dest drugs {
    domainlist drugs/domains
    urllist    drugs/urls
}

dest gambling {
    domainlist gambling/domains
    urllist    gambling/urls
}

dest porn {
    domainlist porn/domains
    urllist    porn/urls
}

acl {
    default {
        pass allow_local !deny_local !gambling !porn !drugs all
        redirect http://192.168.1.1/cgi-bin/squidGuard.cgi?clientaddr=%a&clientname=%n&clientuser=%i&clientgroup=%s&targetgroup=%t&url=%u
    }
}
It is important to note that each of the files referred to in the squidGuard.conf file must exist. If you attempt to use my .conf file, you must manually create the local directory and the files my .conf file assumes exist there. The following commands may be used to create the directory and empty files (run as root):
cd /etc/squidGuard/db
mkdir local
touch local/allow_domains local/allow_urls local/deny_domains local/deny_urls
chown -R squid:suvlet local
What this squidGuard.conf file does is allow domains/URLs matched by the ‘allow_local’ rule to be accessed, while denying access to anything matched by ‘deny_local’, ‘drugs’, ‘gambling’, or ‘porn’. If a domain/URL has not matched anything after those rules have been applied, then it is allowed to be accessed (this is what the final ‘all’ means).
The databases in the local directory (referenced by the ‘allow_local’ and ‘deny_local’ rules) are files maintained by the local system administrator. These local files are not updated or maintained by the squidGuard robot, and consequently they will not be overwritten when a new Blacklist is downloaded. Thus, as you find specific sites you wish to allow or deny at your site, you edit the appropriate file, rebuild the squidGuard databases (only if you pre-build them), and restart Squid.
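For example, to block one additional site locally you would append its domain to the local deny list, rebuild the databases (only if you pre-build them), and restart Squid; the domain below is purely an illustration:
cd /etc/squidGuard/db
echo "example.com" >> local/deny_domains
/usr/sbin/squidGuard -C all                  # only needed if you pre-build the databases
chown -R squid:suvlet /etc/squidGuard/db/*   # the database files must stay owned by squid:suvlet
/etc/rc.d/init.d/squid restart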
Once you have edited your squidGuard.conf file, you may need to rebuild the Blacklist databases and restart Squid. This can be done by running the following commands as root (if things don’t appear to be working properly then check the squidGuard log in the /var/log/squid directory); the first of the two commands is skipped if you don't pre-build your squidGuard database files:
/usr/sbin/squidGuard -C all
/etc/rc.d/init.d/squid restart
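If you want to check a rule without involving a browser, squidGuard can also be run by hand: feed it a single request line in Squid's redirector format and it prints a rewritten (redirect) line for a blocked request, or an empty line for an allowed one. The test URL, client address, and config-file path below are assumptions for illustration:
echo "http://www.example.com/ 192.168.1.50/- - GET" | /usr/sbin/squidGuard -c /etc/squidGuard/squidGuard.conf -d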
If things still aren't working, check the ownership of the files in the squidGuard Blacklist database. All the files in /etc/squidGuard/db must be owned by userid ‘squid’ and groupid ‘suvlet’. The following command can be used to reset the file ownership (run it as root, and remember to restart Squid afterwards):
chown -R squid:suvlet /etc/squidGuard/db/*
ClarkConnect version 1.x does not contain any method for automatically downloading new squidGuard Blacklists, and the squidGuard website itself does not offer a script to perform this task, so it was necessary for me to write one.
I started by posting a query to one of the ClarkConnect user discussion forums, asking if anyone else had already written such a script. A sysadmin named Mike anonymously posted his script in reply, and I took that and rewrote it into something a little more robust. I also created a small PHP webconfig module and put it all together into an RPM to simplify installation.
The RPM can be downloaded from refreshSG-1.3-2.i386.rpm.
A tar.gz file suitable for manually installing on any Linux system can be downloaded from refreshSG-1.3-2.tar.gz.
Note that my refreshSG script is dependent upon wget, and wget must be installed before my refreshSG RPM is installed.
Here are some basic installation instructions:
ftp ftp.redhat.com
cd /pub/redhat/linux/7.2/en/os/i386/RedHat/RPMS/
bin
get wget-1.7-3.i386.rpm
rpm -Uvh wget-1.7-3.i386.rpm
wget http://www.rath.ca/Misc/3Com-3C19504/refreshSG-1.1-1.i386.rpm
rpm -Uvh refreshSG-1.1-1.i386.rpm
The refresh script will not run until you explicitly turn it on via its configuration page. The configuration page also allows you to change the URL the refreshSG script pulls the Blacklist from. When the script does run, it will write a log of its session to /var/log/refreshSG, and that log file is displayed in the lower half of the configuration page.
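Conceptually, the settings the configuration page edits boil down to just two things: whether the script is enabled, and the URL the Blacklist is fetched from. Something along these lines is what you can expect to find; the file path and variable names here are illustrative assumptions, not necessarily the ones refreshSG actually uses:
# Illustrative sketch only; the path and variable names are assumptions.
# /etc/refreshSG.conf
ENABLED="yes"                                          # the script exits immediately when set to "no"
BLACKLIST_URL="http://example.org/blacklists.tar.gz"   # where the consolidated Blacklist is fetched from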
The refreshSG log file will grow over time; Linux manages this using a program called logrotate, which is run each day by cron. A logrotate configuration file to manage the refreshSG log files is installed by the refreshSG RPM.
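For reference, a logrotate configuration for a log file like this generally takes the following form; this is a sketch of the usual layout, not necessarily the exact file the RPM installs:
# Sketch of a typical logrotate stanza for /var/log/refreshSG
/var/log/refreshSG {
    weekly
    rotate 4
    compress
    missingok
    notifempty
}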
v1.0 [2002-07-16] — First public release (via the ClarkConnect General Forum). Released into the public domain.
v1.1 [2002-07-22] — Second public release. Added extra help to the ClarkConnect webconfig page, added a man page, and exposed two additional variables in the refreshSG.conf file.
v1.2 [2002-07-23] — Third public release. Fixed another typo in the webconfig screen and changed the way the /etc/crontab entry is made.
v1.3 [2003-07-05] — Fourth public release. New in this release: log file rotation; the ability to not pre-build the squidGuard database files; an updated look & feel for the PHP screens to match CC v1.3; corrected spelling mistakes; updated help text, with new information added and existing information clarified; an updated default Blacklist URL to match the new location of the official squidGuard Blacklist; updated ownership of some of refreshSG's files to match the new file ownerships used in CC v1.3; a dependency of the RPM upon the cc-webconfig-1.3 package; and removal of any refreshSG /etc/crontab entry when the RPM package is uninstalled.