Christopher Rath


Web Filters & Currency

For my own sanity, as well as the protection of our children, our home Internet connection filters web queries.  There are two filtering methods in common use: (1) scanning the content of each page and blocking whatever the filter identifies as "should be blocked"; and (2) keeping a list of sites to be blocked (called a Blacklist) and a list of sites to be allowed (called a Whitelist), and blocking or allowing accordingly.  We have opted for the second method, and the application we use is called squidGuard.

SquidGuard works in conjunction with a web caching application called Squid.  Our home Internet connection is a cable modem, and use of a web cache reduces the number of queries the computer must make, hopefully speeding our web access.
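Squid must be told to pass each request through squidGuard for filtering; in Squid 2.x this is done with a redirector line in squid.conf.  The paths below are typical but may differ on your system:

```
# /etc/squid/squid.conf -- hand each request to squidGuard
redirect_program /usr/bin/squidGuard -c /etc/squidGuard/squidGuard.conf
redirect_children 4
```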

The fundamental flaw in Blacklists is that, because the web is a constantly changing environment, a Blacklist must be constantly updated.  A number of interested parties on the Internet maintain Blacklists (out of personal interest), and so our approach has been to use one of the Blacklist consolidator engines.  To facilitate the regular updating of our Blacklist, I wrote a small script that downloads the list and installs it, and that can be executed on a regular schedule.

This webpage describes two things:

  1. My approach to the configuration of Blacklists; an approach that allows both the maintenance of local black and white lists, as well as the regular import of externally maintained lists;
  2. The installation and use of refreshSG, my script for updating Blacklists.

squidGuard Blacklists

SquidGuard is a web-filtering package that is used to block and allow web access.  For anyone who wants to make full use of squidGuard Blacklists, squidGuard must be configured to block and allow the specific classes of sites you desire to filter.  In squidGuard-speak, this means configuring the "allow" and "deny" lists.

Configuring Blacklists

The squidGuard website contains full details about how squidGuard Blacklists are configured; however, here’s my squidGuard.conf file which shows multiple lists being configured---I have configured blocking of drug, gambling, and pornographic websites plus allowed for local lists of sites to be blocked or allowed:

logdir /var/log/squid
dbhome /etc/squidGuard/db

dest allow_local {
  domainlist local/allow_domains
  urllist    local/allow_urls
}

dest deny_local {
  domainlist local/deny_domains
  urllist    local/deny_urls
}

dest drugs {
  domainlist drugs/domains
  urllist    drugs/urls
}

dest gambling {
  domainlist gambling/domains
  urllist    gambling/urls
}

dest porn {
  domainlist porn/domains
  urllist    porn/urls
}

acl {
  default {
    pass allow_local !deny_local !gambling !porn !drugs all
  }
}

It is important to note that each of the files referred to in the squidGuard.conf file must exist. If you attempt to use my .conf file you must manually create the local directory and the files my .conf file assumes exist there. The following command may be used to create the directory and empty files (run as root):

cd /etc/squidGuard/db
mkdir local
touch local/allow_domains local/allow_urls local/deny_domains local/deny_urls
chown -R squid:suvlet local

What this squidGuard.conf file does is to allow access to domains/urls matched by the ‘allow_local’ rule, while denying access to anything matched by ‘deny_local’, ‘drugs’, ‘gambling’, or ‘porn’.  The rules are evaluated left to right, so an entry in ‘allow_local’ takes precedence over the deny rules that follow it.  If the domain/url has not matched anything after those rules have been applied, then it is allowed to be accessed (this is what the final ‘all’ means).

The databases in the local directory (referenced by the ‘allow_local’ and ‘deny_local’ rules) are files maintained by the local system administrator.  They are not part of the downloaded Blacklist, and consequently will not be overwritten when a new Blacklist is installed.  Thus, as you find specific sites you wish to allow or deny at your site, you edit the appropriate file, rebuild the squidGuard databases (only if you pre-build them), and restart Squid.
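The format of these files is simple (the entries below are hypothetical examples).  A domainlist file holds one domain per line, which matches the domain and all of its subdomains; so local/deny_domains might contain:

```
example.com
badsite.example.net
```

A urllist file holds one URL per line, with the "http://" prefix and any leading "www." omitted; so local/deny_urls might contain:

```
example.org/banners/ads
```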

Once you have edited your squidGuard.conf file, you may need to rebuild the Blacklist databases and restart Squid.  Run the following commands as root; skip the first command if you don't pre-build your squidGuard database files.  (If things don't appear to be working properly, check the squidGuard log in the /var/log/squid directory.)

/usr/sbin/squidGuard -C all
/etc/rc.d/init.d/squid restart

If things still aren't working, check the ownership of the files in the squidGuard Blacklist database.  All the files in /etc/squidGuard/db must be owned by userid ‘squid’ and groupid ‘suvlet’.  The following command can be used to reset the file ownership (run it as root, and remember to restart Squid afterwards):

chown -R squid:suvlet /etc/squidGuard/db/*


Refreshing Blacklists

ClarkConnect version 1.x does not contain any method for automatically downloading new squidGuard Blacklists. Also, the squidGuard website itself does not offer a script to perform this task, and so it was necessary for me to write a script.

I started my task by posting a query to one of the ClarkConnect user discussion forums, asking if anyone else had already written such a script.  A sysadmin named Mike posted his script in reply.  I took that and rewrote it into something a little more robust, created a small PHP webconfig module, and put it all together into an RPM to simplify installation.

Download refreshSG

The RPM can be downloaded from refreshSG-1.3-2.i386.rpm.

A tar.gz file suitable for manually installing on any Linux system can be downloaded from refreshSG-1.3-2.tar.gz.

Note that my refreshSG script is dependent upon wget, and wget must be installed before my refreshSG RPM is installed.

Installing refreshSG into ClarkConnect v1.3

Here are some basic installation instructions:

  1. Log into your ClarkConnect v1.3 box as root.
  2. Retrieve the wget RPM from the RedHat site (using anonymous ftp):
    cd /pub/redhat/linux/7.2/en/os/i386/RedHat/RPMS/
    get wget-1.7-3.i386.rpm
  3. Install the RPM:
    rpm -Uvh wget-1.7-3.i386.rpm
  4. Now retrieve my RPM using wget (this ensures that wget is working correctly):
  5. Install my RPM:
rpm -Uvh refreshSG-1.3-2.i386.rpm
  6. Now bring up the webconfig screen in your browser and click on the Admin button.
  7. Finally, click on the "Blacklist Update" link (near the bottom left of the page).

The refresh script will not run until you explicitly turn it on via its configuration page. The configuration page also allows you to change the URL the refreshSG script pulls the Blacklist from. When the script does run, it will write a log of its session to /var/log/refreshSG, and that log file is displayed in the lower half of the configuration page.
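In outline, a Blacklist refresh script of this kind does four things: download the consolidated Blacklist, unpack it over the database directory (leaving the local directory untouched), rebuild the pre-built databases, and restart Squid.  The sketch below illustrates that sequence; the URL and paths are assumptions for illustration (refreshSG itself takes these from its configuration file), and it must be run as root:

```shell
#!/bin/sh
# Illustrative sketch of a Blacklist refresh; not the refreshSG
# script itself.  The source URL below is hypothetical.
URL="http://example.org/blacklists.tar.gz"
DB=/etc/squidGuard/db

# Download the consolidated Blacklist to a temporary location.
wget -q -O /tmp/blacklists.tar.gz "$URL" || exit 1

# Unpack the new lists over the database directory.  The local/
# directory is not in the tarball, so it is left untouched.
tar -xzf /tmp/blacklists.tar.gz -C "$DB" --strip-components=1

# Rebuild the pre-built databases (skip if you don't pre-build)
# and restore the file ownership squidGuard expects.
/usr/sbin/squidGuard -C all
chown -R squid:suvlet "$DB"

# Restart Squid so the new lists take effect.
/etc/rc.d/init.d/squid restart
```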

The refreshSG log file will grow over time; Linux manages this with a program called logrotate, which cron runs each day.  A logrotate configuration file to manage the refreshSG log files is installed by the refreshSG RPM.
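Such a logrotate configuration file might look like the following sketch; the actual file installed by the RPM may use different settings:

```
# /etc/logrotate.d/refreshSG -- illustrative example
/var/log/refreshSG {
    weekly
    rotate 4
    compress
    missingok
    notifempty
}
```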

Release History

v1.0 [2002-07-16] — First public release (via the ClarkConnect General Forum). Released into the public domain.

v1.1 [2002-07-22] — Second public release.  Have added extra help to the ClarkConnect webconfig page, added a man page, and exposed two additional variables to the refreshSG.conf file.

v1.2 [2002-07-23] — Third public release.  Fixed another typo in the webconfig screen and changed the way the /etc/crontab entry is made.

v1.3 [2003-07-05] — Fourth public release.  New in this release:

  - log file rotation;
  - the ability to not pre-build the squidGuard database files;
  - updated look & feel of the PHP screens to match CC v1.3;
  - corrected some spelling mistakes;
  - updated the help text, adding new information and clarifying the existing information;
  - updated the default Blacklist URL to match the new location of the official squidGuard Blacklist;
  - updated the ownership of some of refreshSG's files to match the new file ownerships used in CC v1.3;
  - made the RPM dependent upon the cc-webconfig-1.3 package;
  - removal of any refreshSG /etc/crontab entry when the RPM package is uninstalled.

©Copyright 2003, Christopher & Jean Rath
Telephone: 613-824-4584
Address: 1371 Major Rd., Ottawa, ON, Canada K1E 1H3
Last updated: 2015/02/14 @ 21:34:34