cacert-sysadm - disregard my email Re: CAcert infrastructure ... past, present, future tasks

cacert-sysadm AT lists.cacert.org

Subject: CAcert System Admins discussion list

  • From: Chuck Grandgent <chuck.grandgent AT gmail.com>
  • To: cacert-sysadm AT lists.cacert.org
  • Subject: disregard my email Re: CAcert infrastructure ... past, present, future tasks
  • Date: Sun, 27 Oct 2019 17:20:27 -0400

wrong recipient, disregard.

On Sun, Oct 27, 2019 at 5:19 PM Chuck Grandgent <chuck.grandgent AT gmail.com> wrote:
I used to be in the Quality Assurance business. There, the question is: how do you quantify quality, or the lack of it?
The answer is: what is the monetary cost of the lack of quality (plus reputation and other costs)?
In the case of Apple, how much did it cost them to tie up support people for between 2 and 3 hours? I ended up dealing with, I think, 6 people. Why were the first 5 support people clueless?

All of that could have been avoided if they had sent out an email notification about the change.
Multiply that by how many other people called support about this.


On Sun, Oct 27, 2019 at 4:02 PM Jan Dittberner <jandd AT cacert.org> wrote:
On Sun, Oct 27, 2019 at 12:07:16PM -0400, Don Harris wrote:
> Hi Jan,
>
> Thanks for your email.
>
> I think some of the confusion lies in the urgent call for SysAdmins, sent
> out by Brian McCullough, on September 6th.
> There were roughly a dozen replies to that message offering help.
> As one of those who replied with an interest in helping, I have heard
> nothing since.
>
> So, it appears that there are lots of people willing to help, but no one
> knows what's needed. If you're in a position to further outline what's
> needed, and any vetting process as you mentioned, that will likely help to
> get things started.

Hi Don,

the call for help from our board addressed, very broadly, several topics
where CAcert needs help. I had no time to prepare a list of open tasks in
advance and was a bit overwhelmed by, and unprepared for, the mail and
the many responses. I am very grateful for all the offers of help,
though.

I will first try to give a bit of background about myself, about where
we came from and about where we are now with our infrastructure:

About me
========

I am an IT architect from Dresden (Germany) and have been doing volunteer
system administration work for CAcert since 2009. I started by
maintaining the SVN server and took over the administration of other
systems, including our infrastructure server, in 2010, before Daniel
Black left CAcert in 2012 (if I remember correctly). I took over the
infrastructure team lead role from Mario Lipinski in 2015. I also do
volunteer work as a Debian Developer and have helped with Debian booths
in Chemnitz (Germany) for many years.

I have a full-time job and a family with three kids, so my available
time is quite limited.


CAcert's infrastructure
=======================

CAcert's infrastructure is split into two main parts: critical systems
(signer, user database, web frontend, firewalls) and non-critical
systems. The infrastructure team takes care of the non-critical systems.

When I came to CAcert in 2009 there was an ongoing effort to clearly
separate the two areas (critical/non-critical), which we finished in
2011. Since then we have had a separate hardware machine for the
non-critical infrastructure. In 2015 we got a new machine that was
donated by Thomas Krenn; it is the machine we still use today.

All non-critical systems are running on a single hardware machine [1] in
separate LXC containers. This machine was upgraded to Debian Buster in
July [2] and is primarily maintained by me. A second active administrator
with LXC, ferm/iptables/nftables and Debian skills would be welcome, but
the role requires a high level of trust.

Some of our systems have not been updated for many years because there
is no properly documented way to upgrade their application software
and/or configuration.

All our infrastructure is running on Debian GNU/Linux. I prefer to run
all software from official Debian packages, but we have some older
systems that unfortunately do not follow this rule.

[1] https://infradocs.cacert.org/systems/infra02.html
[2] https://lists.cacert.org/wws/arc/cacert-sysadm/2019-07/msg00001.html

Documentation
-------------

The documentation for our systems is of varying quality. Some systems
have been set up and documented properly in the last few years. New
documentation is written with the Sphinx documentation system [3] and
maintained in a Git repository. It is published automatically on
https://infradocs.cacert.org/. We use some Sphinx extensions to generate
lists of IP addresses, SSH host keys and X.509 certificates from the
individual systems' documentation pages. Documentation was previously
kept in the CAcert Wiki [4], but is almost unmaintained there.

[3] https://www.sphinx-doc.org/
[4] https://wiki.cacert.org/SystemAdministration/Systems
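
As a rough illustration of the kind of list generation described above,
the following Python sketch collects IP addresses from per-system records
and renders one sorted plain-text list. It is not the code of the actual
Sphinx extensions. The SYSTEMS mapping and the addresses (taken from the
reserved documentation ranges) are assumptions made up for this example.

```python
import ipaddress

# Hypothetical per-system data, as it might be collected from the
# individual documentation pages. Names and addresses are examples only.
SYSTEMS = {
    "monitor": ["192.0.2.20", "2001:db8::20"],
    "email": ["192.0.2.5"],
    "motion": ["192.0.2.10", "2001:db8::10"],
}


def build_ip_list(systems):
    """Return (address, system) pairs sorted by IP version, then address."""
    entries = []
    for name, addresses in systems.items():
        for text in addresses:
            entries.append((ipaddress.ip_address(text), name))
    # IPv4 and IPv6 objects cannot be compared with each other directly,
    # so sort by (version, integer value, system name) instead.
    entries.sort(key=lambda item: (item[0].version, int(item[0]), item[1]))
    return [(str(address), name) for address, name in entries]


def render(entries):
    """Render the pairs as aligned plain-text lines."""
    if not entries:
        return ""
    width = max(len(address) for address, _ in entries)
    return "\n".join(f"{address.ljust(width)}  {name}" for address, name in entries)


print(render(build_ip_list(SYSTEMS)))
```

Sorting on the parsed address instead of the string keeps 192.0.2.5
ahead of 192.0.2.10, which a plain string sort would get wrong.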

There are still some sparsely documented and unmaintained systems that
would need someone to do some archaeology and, in particular, find out
which modifications to upstream software have been made, so that they
can either be upgraded or replaced with freshly set up systems.

Configuration management
------------------------

A few years ago I started to professionalise our infrastructure setup
and introduced Puppet as our configuration management tool. The level of
configuration that is managed by Puppet varies between systems. Older
systems are not managed at all or only have etckeeper to provide a
minimum of version-controlled configuration. Some newer systems
(motion.cacert.org and email.cacert.org) have been set up using Puppet
and have all their configuration in Git. My hope is that this will
become the default, but that requires a lot of work.

All Puppet code can be found in Git [5].

[5] https://git.cacert.org/gitweb/?p=cacert-puppet.git

Monitoring
----------

We use Icinga 2 for our monitoring needs. We have an Icinga 2 master
running on monitor.cacert.org. Newer machines that are managed by Puppet
are running Icinga 2 agents and older machines are monitored remotely
via NRPE. I run an external Icinga 2 agent in a different data center
for monitoring external access to our systems.

The Icinga 2 monitoring checks are defined in a Git repository [6] and
automatically applied to the actual systems via a Git post-receive hook.

[6] https://git.cacert.org/gitweb/?p=cacert-icinga2-conf_d.git
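
To make the post-receive mechanism more concrete, here is a minimal
Python sketch of how such a hook could decide whether to deploy. It is
not the hook that is actually installed. The branch name "master", the
paths passed to deploy() and the function names are assumptions for the
example. Git passes one line per updated ref to the hook on standard
input, in the form "<old> <new> <ref>".

```python
import subprocess


def updated_refs(hook_input):
    """Parse post-receive input lines of the form '<old> <new> <ref>'."""
    refs = {}
    for line in hook_input.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # ignore malformed or empty lines
        old, new, ref = parts
        refs[ref] = (old, new)
    return refs


def should_deploy(hook_input, branch="master"):
    """True if the given branch was pushed and not deleted."""
    change = updated_refs(hook_input).get(f"refs/heads/{branch}")
    if change is None:
        return False
    # Git reports an all-zero object name as <new> when a ref is deleted.
    return set(change[1]) != {"0"}


def deploy(git_dir, work_tree, branch="master"):
    """Check out the pushed branch into the target directory.

    Both paths are placeholders supplied by the caller.
    """
    subprocess.run(
        ["git", f"--git-dir={git_dir}", f"--work-tree={work_tree}",
         "checkout", "-f", branch],
        check=True,
    )
```

A real hook would read sys.stdin, call should_deploy() and then deploy(),
and would most likely validate and reload the monitoring configuration
afterwards. Those steps are left out here because their details are not
part of this mail.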


Tasks
=====

From my point of view, most help is needed with documenting the
currently poorly documented and/or unmaintained systems, so that we can
decide how to update or replace them.

I would also like to have support moving the existing configuration into
Puppet code and using Puppet on the remaining systems that are not
managed by Puppet yet.

It would be great if someone would take care of, or share the load of
maintaining, the following systems:

- blog.cacert.org - is running an outdated WordPress setup and would
  need upgrades by someone who knows PHP and WordPress well enough to
  analyse and port the CAcert customizations

- board.cacert.org - an old OpenERP installation that would have to be
  replaced by something that is maintained upstream to be able to update
  the system to a modern Debian release

- cats.cacert.org - this is our assurer self-training system, which is
  running custom-built PHP-based software. Upgrading that system needs
  an administrator with PHP and Apache httpd skills who can coordinate
  with the software development team

- irc.cacert.org - has been upgraded to a current Debian release but has
  no active administrator. I take care of this system on a best-effort
  basis but do not know IRC administration very well

- lists.cacert.org is our mailing list system running the Sympa mailing
  list software (chosen because of its ability to run S/MIME encrypted
  mailing lists). The system needs an upgrade to a current Debian
  release and someone who knows Sympa well enough to maintain it
  properly

- test.cacert.org, test2.cacert.org, test3.cacert.org,
  testmgr.cacert.org - these are the test systems for our software
  development team. They are used to test the software that will end up
  running on the critical systems (signer and www.cacert.org). These
  systems should be as close to the critical systems as possible and
  require coordination with the critical team and the software
  development team

- wiki.cacert.org - this is our wiki system, which is running an
  unpackaged and most probably outdated version of the Moin wiki
  software that could use some love, upgrades and documentation

I am still working on upgrading webmail/community.cacert.org and
splitting out the self-service aspects of this system into a separate LXC
container. This is still work in progress; if someone is interested, I
would be willing to share my ideas and would be grateful for
feedback/comments/reviews.

I would also like to improve backups of the whole infrastructure. At the
moment we can perform backups from the infrastructure host to an
externally attached backup disk, but these do not take application
consistency into account. They are plain file system based backups with
no guarantee that the databases used by the applications are snapshotted
consistently.
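
One common way to close that gap is to dump each database inside its
container before the file-level backup runs. The Python sketch below only
builds such a plan and prints it. It does not execute anything. The
container inventory, the dump directory and the run-file-backup
placeholder are assumptions for the example and do not describe our
actual setup.

```python
import shlex

# Hypothetical inventory: which container runs which database engine.
# Names and engines are placeholders, not the real configuration.
CONTAINERS = {
    "blog": "mysql",
    "lists": "postgresql",
    "irc": None,  # no database, the file-level backup is enough
}

# Dump commands that can run while the service stays up.
# --single-transaction gives a consistent dump for InnoDB tables.
DUMP_COMMANDS = {
    "mysql": "mysqldump --single-transaction --all-databases",
    "postgresql": "pg_dumpall",
}


def backup_plan(containers, dump_dir="/var/backups/dumps"):
    """Return ordered steps: database dumps first, file-level backup last."""
    steps = []
    for name, engine in sorted(containers.items()):
        if engine is None:
            continue
        # Run the dump inside the container, capture stdout on the host.
        argv = ["lxc-attach", "-n", name, "--"] + shlex.split(DUMP_COMMANDS[engine])
        steps.append({"argv": argv, "stdout": f"{dump_dir}/{name}.sql"})
    # Placeholder name for the existing file system based backup, which
    # would then also pick up the fresh dump files.
    steps.append({"argv": ["run-file-backup"], "stdout": None})
    return steps


for step in backup_plan(CONTAINERS):
    print(shlex.join(step["argv"]), "->", step["stdout"] or "(no dump file)")
```

Keeping the plan as data makes it easy to review the exact commands
before wiring them into a real backup job.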


These are my thoughts/ideas for the moment. Any feedback and questions
are appreciated. I will try to answer all mails/IRC questions as soon as
possible.


Kind regards
Jan Dittberner

--
Jan Dittberner - CAcert Infrastructure Team Lead
Software Architect, Debian Developer
GPG-key: 4096R/0xA73E0055558FB8DD 2009-05-10
         B2FF 1D95 CE8F 7A22 DF4C  F09B A73E 0055 558F B8DD
https://jan.dittberner.info/


