- From: Ian G <iang AT iang.org>
- To: cacert-devel AT lists.cacert.org
- Cc: Christopher Etz <Christopher.Etz AT cetz.de>
- Subject: Re: Some thoughts about BirdShack
- Date: Sat, 19 Dec 2009 16:02:13 +0100
Hi Christopher,
On 19/12/2009 13:28, Christopher Etz wrote:
First, let me introduce myself: my name is Christopher Etz, and I live in
Germany. I met Ulrich and Dirk at the Barcamp in Mainz three weeks ago.
Last week, I met them again, plus Ian, Mario, Andreas, Markus and others,
in Essen.
Welcome! For all, here is the write-up of our Essen talk:
https://dev.cacert.cl/wiki/birdshack/Minutes20091216EssenSoftwareMiniTOP
I would like to support you on the BirdShack project. My
professional background: I have been a freelance consultant for 15 years. The
main topics I work on are databases, data models, application
architectures, project procedures, data warehouses and BI.
Super!
I would like to share my impressions about the concepts for BirdShack
that I have read so far. Hopefully, I don't offend anybody with my
feedback. If this mail starts a real thread of replies, it might be
worth separating the topics into individual threads.
So far I see nothing offensive, and I'm pretty thick-skinned anyway.
1. The API and RESTful services
The plans for an HTTP(S)-based communication between the
(frontend) application and the business logic layer seem to be
pretty settled.
Yes, this was the major area where the Innsbruck meeting went deep and detailed. The reason for this is that this is the key battleground between the website and the business logic. If this is established strongly, the two respective teams have at least one foot on the ground. It's also what audit wanted: a clean separation.
https://dev.cacert.cl/wiki/birdshack/Work_History
Obviously, a call via this mechanism is by several
orders of magnitude more expensive than a direct call within
whatever programming language. Thus, an activity within a business
process (German: Geschäftsvorfall; is there a better translation?)
should result in only one or very few such calls in order to achieve
acceptable performance.
OK, just a point on performance. This is security code, and it has to be auditable security code. So, performance has to take a backseat in this design. Obviously there are some areas where we can get some performance, but the presence of a fully tiered API that is clearly accessed by a call to a port is a very important element in audit; we can very clearly draw a line at that point and confidently ignore what is on the other side of that line, while we audit our side.
https://dev.cacert.cl/wiki/birdshack/Security
Another way of putting it is that auditing performance is more important than CPU performance :) CPUs are cheaper than auditors.
(I realise you are not criticising that decision here, but I recall that Markus brought up the point that a tiered structure raises risks that might cause overall failure of the project, compared to a monolithic design. Your question reminded me of another reason why we went this way.)
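(To make the "call to a port" concrete, here is a minimal sketch, in Java, of what a frontend-side call across that line might look like. The host, port and URL path are invented for illustration; nothing about them is settled.)

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;

    // Hypothetical sketch: the web frontend fetches a resource from the
    // business-logic server over HTTP(S). The audit line is drawn at this call.
    public class ApiCallSketch {
        public static void main(String[] args) throws Exception {
            URL url = new URL("https://birdshack.example.org:8443/members/12345");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("GET");

            BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream(), "UTF-8"));
            StringBuilder body = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line).append('\n');
            }
            in.close();

            // The response body (expected to be XML) is all the frontend
            // ever sees of the business logic.
            System.out.println(body);
        }
    }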
As a consequence, the API should be
oriented towards the activities of the business processes.
From what I understand, the API is currently more oriented
towards the entities (classes, instances) that the business logic
knows. E.g. creating a new member would result in individual calls
to create the member, a name, a DoB, an email address and perhaps
other things. Another problem is that it is hard to preserve the
consistency of the database. E.g. every application deleting a
member must know and implement calls to delete the name, the DoB,
the email addresses, perhaps the assurances and so on.
I see your point. Yes, when I did this approach in my last big similar API (a thing called XML-X for payments transactions), it was a business-language approach, not an entities approach.
My recollection here was that the three other guys at Innsbruck were quite well agreed that the entities approach was better. They were following the RESTful school, which is outside my experience.
https://dev.cacert.cl/wiki/birdshack/Resources
https://dev.cacert.cl/wiki/birdshack/URL_Space
Having said that, I actually see this as not a big deal, because in my experience, having shaken out the core on this basis, it is often easier to migrate to new calls that provide better matches between the two teams ... as the experience develops. As long as the two teams can clearly express themselves to each other, this can happen. And if they can't, then no API design will help...
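To illustrate the contrast for anyone following along, here is a rough sketch of the two API shapes expressed as Java interfaces. The names and parameters are purely hypothetical, not anything agreed at Innsbruck.

    // Entity-oriented (RESTful resources): one call per entity,
    // the caller assembles a member out of several requests.
    interface MemberResourceApi {
        long createMember();                            // POST /members
        void setName(long memberId, String name);       // PUT  /members/{id}/name
        void setDateOfBirth(long memberId, String dob); // PUT  /members/{id}/dob
        void addEmail(long memberId, String email);     // POST /members/{id}/emails
    }

    // Activity-oriented (business process / Geschäftsvorfall):
    // one call carries the whole "register new member" activity.
    interface MemberActivityApi {
        long registerNewMember(String name, String dob, String email);
    }

The deletion case Christopher mentions cuts the same way: in the first style the caller has to know about every dependent entity, while in the second the consistency rules stay inside the business logic.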
One point that I miss in the current information within the wiki
is the kind of response that the services produce. I expect you
are thinking of XML documents containing the requested information.
Yes, that was an expectation. But it wasn't written.
Then it is up
to the different applications to decode (unmarshal, deserialize)
the response. This would require implementing similar code in
every programming language of the different applications. Or am I
wrong here?
That's correct, every language would then need to implement a set of objects and a set of unmarshalling / deserializing code. I don't know a way around this. In my experience, trying to cut corners here never works out; better to write it all, fully, and then unit-test it very heavily ...
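Just to show the shape of that work, here is a minimal sketch of the unmarshalling side in Java using JAXB (which ships with the JDK). The <member> element and its fields are invented here; the real ones would come from the API definition.

    import java.io.StringReader;
    import javax.xml.bind.JAXBContext;
    import javax.xml.bind.Unmarshaller;
    import javax.xml.bind.annotation.XmlRootElement;

    public class MemberClient {

        // Hypothetical <member> response document; the real element names
        // and fields would come from the API definition.
        @XmlRootElement(name = "member")
        public static class Member {
            public String name;
            public String email;
        }

        // Turn the XML text returned by the business-logic API into an object.
        static Member parse(String xmlFromApi) throws Exception {
            JAXBContext ctx = JAXBContext.newInstance(Member.class);
            Unmarshaller um = ctx.createUnmarshaller();
            return (Member) um.unmarshal(new StringReader(xmlFromApi));
        }

        public static void main(String[] args) throws Exception {
            Member m = parse("<member><name>Alice</name>"
                    + "<email>alice@example.org</email></member>");
            System.out.println(m.name + " / " + m.email);
        }
    }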
2. Hand-written code
Between the lines (i.e. not expressed explicitly) I've got
the impression that you expect all code to be written manually, like:
* individually coded SQL statements
* applications written in PHP, Python or whatever, building
the HTML pages, communicating with the business layer and
unmarshalling / deserializing the returned results
* coding some functionality for a modern ("nice and sexy")
user interface in JavaScript
One point: I expect the entire business logic middle server to be hand-written, either completely or in the majority.
The reason for this is that it is the major defensive line for security, and it is the major focus of attention for the audit. Any code that is acquired from elsewhere has to support that audit. And that means either it comes with an audit, or there is some other equally easy way to say "it's reliable".
I don't want to go as far as to suggest a model-driven
architecture
(http://en.wikipedia.org/wiki/Model-driven_architecture). But
there are means to automate some of these tasks (see my next point).
Second point: in contrast to that, the web frontends can be less cautious. So I would think that this is a plausible suggestion; I don't know how far we can go in downloading some shlock from the net, but we can certainly be more relaxed about leaning on some good tools.
Alejandro commented here on this:
https://dev.cacert.cl/wiki/birdshack/API
3. Technologies that might be useful
Within the architectural overview, a decision should be suggested
about how to access the database. The technologies of interest are:
* Pure SQL via JDBC (if the programming language is Java)
Java was the expectation; it depends on the team that does the work, but the feeling is that Java has by far the best readability and maintainability in the long run (it has other sins to compensate for; it's by no means good for everything).
* Use of an ORM (object relational mapper, such as Hibernate)
Possibly; it depends on security. I like the sound of an ORM.
* Use of JPA (possible with Tomcat) or even JTA (requiring an
application server like JBoss)
Depending on our security/audit expectation of those tools. The bigger they are the less likely. (In my personal experience in secure Java servers, we didn't use them and we were very happy.)
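For reference, the plain-SQL end of that spectrum might look roughly like this in hand-written JDBC; the table and column names are invented for illustration.

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.SQLException;

    // Hand-written, parameterised JDBC: every query is visible in the code,
    // which is what makes this style easy to read line by line in an audit.
    // Table and column names here are invented for illustration.
    public class MemberDao {

        private final Connection conn;

        public MemberDao(Connection conn) {
            this.conn = conn;
        }

        public String findEmail(long memberId) throws SQLException {
            PreparedStatement ps = conn.prepareStatement(
                    "SELECT email FROM member WHERE id = ?");
            try {
                ps.setLong(1, memberId);
                ResultSet rs = ps.executeQuery();
                return rs.next() ? rs.getString("email") : null;
            } finally {
                ps.close();
            }
        }
    }

An ORM or JPA would remove most of that boilerplate, but at the price of pulling a large third-party layer inside the audit boundary, which is the trade-off above.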
The user interface could also be coded with some standard
technology such as
* JSP (JavaServer Pages)
* JSF (JavaServer Faces)
I personally would vote for JSF, because there are a lot of
implementations, even open source, that offer the chance to have a
modern user interface: some of them implement AJAX or other
JavaScript code automatically, without forcing the developer to
code this manually. Candidates are:
* Sun's Reference Implementation (minimalistic and poor)
* Apache MyFaces (widespread, still simple)
* ICEfaces (good interfaces with lots of AJAX)
* JBoss RichFaces (even more AJAX, but tricky to use and
neither very stable nor very robust [in my impression])
* PrimeFaces (a Turkish implementation with much AJAX, but I
haven't worked with it yet).
The discussions and decisions on the technologies to use will
shed another light on the necessity of an HTTP(S) based
communication.
This depends on the team doing the frontend. It's much more open. If you are doing the frontend, you can pick the above.
https://dev.cacert.cl/wiki/birdshack/Architecture
In the Innsbruck meeting, we deliberately chose to set a hard API over the business logic, so we could set free the teams doing the web front ends, and so we could anticipate exporting the capability to other more exotic applications.
4. Other candidates for the database system
From what I read, I've understood that you want to move from
MySQL to another database system such as PostgreSQL.
No, I don't think that choice was made. What was decided was that we expected to have to use a SQL / relational database. We left the final choice to later, for the team to work out.
As it happens, there is a MySQL database in the current system. So depending on the migration / integration plan that falls out, the team may prefer to stick with the existing database (one foot on the ground) or build it afresh elsewhere (fresh start, fresh data).
Although I
don't really know MySQL, this seems to be a good idea in my opinion.
However, PostgreSQL requires regular maintenance (the famous
vacuumdb at least) and might not be the best option when
recoverability is a high requirement.
:-) I must admit this is a mystery to me ... I do not know why people in the database world think recovery is never a high requirement. Is an unrecovered database still a database???
Certainly recoverability is a requirement. The way I've done it in the past is: OK, now throw away your live data set, recover the database from your last backup, and show me it running with the users. That might be seen as obsessive, but my work was in payments, and no penny was ever lost. I don't know that we care so much about that in the CA world; we can probably afford to lose a cert or a revocation or two as long as we have the other systems to deal with it.
Other candidates that come to my mind are:
* Oracle: might be too big (resource intensive), and it is
questionable whether they would support a non-profit organization with
a free license.
* Ingres: Used to be an important player in that area (agreed,
that's decades ago), now turned into open-source, has
excellent features for performance and recovery.
5. Thoughts on the approach
I saw some code fragments in the SVN repository already. But I
believe, it's too early to continue here.
I've had a look at the Java that Mario wrote in the SVN, yes, very early stuff, but we have to start somewhere, and I think he chose a good place to start. I refactored it on the train back, but haven't committed anything yet. I'm not sure how to compile it up and test it as yet; without that I'm just fiddling about.
Instead, I'd suggest an approach like this:
1. Parallel activities on:
* Analyzing / defining / collecting the activities of the
business processes
The wiki page "Actors" seems to be a good starting
point here.
yes.
* Defining a set of "candidate" technologies (there are
probably more than I mentioned, both on the mentioned
layers and on another layers)
Yes, agreed. We did try to do that, and to keep an open mind through the flame wars, etc.
The team for the front end has a lot more liberty in choosing their technologies, I think.
For the middleware, consensus existed that Java and POJOs were the way forward. For the backend ("signing server") the expectation was C, but that's less relevant to this discussion. For the other backend ("SQL") ... that is left open for now.
From the flame wars section:
https://dev.cacert.cl/wiki/birdshack/Choosing_a_Language
* Defining a set of criteria, against which the
technologies will be evaluated, once the business
processes are known.
2. Putting the results together, i.e. agreeing on the
business processes, evaluating the technologies and defining
a set of technologies to go with.
3. Adapting the architectural overview to the planned
technologies (here, the decision about the API and its
communication mechanism should be taken).
4. Starting the implementation. Here, I presume that BirdShack
will be a re-write of new code and not a patchwork on old code.
Having said all this, I want to make clear that I never wanted to
diminish the value of the work that has been done so far. I hope you don't
regard me as a negative critic.
Not at all. We did a lot in that week. But much more needs to be done!
iang