XML-based data flow descriptions

Posted by Sebastian Ertel (sertel) on January 31, 2010

The Ohua system has been extended with a SAX-based Flow Parser that allows data flows to be specified in an XML document. This lets even non-programmers design and submit data flows to the Ohua engine.
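To give a feel for what SAX-based flow parsing involves, here is a minimal sketch in Java. It is not Ohua's parser: the class name FlowSketchParser and the element and attribute names (flow, operator, arc, id, type, source, target) are assumptions made up for illustration, and Ohua's real schema may look quite different.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: the element and attribute names are NOT Ohua's real schema.
class FlowSketchParser extends DefaultHandler {
    final List<String> operators = new ArrayList<>(); // "id:type" entries
    final List<String> arcs = new ArrayList<>();      // "source->target" entries

    // SAX calls this once per opening tag, in document order.
    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("operator")) {
            operators.add(atts.getValue("id") + ":" + atts.getValue("type"));
        } else if (qName.equals("arc")) {
            arcs.add(atts.getValue("source") + "->" + atts.getValue("target"));
        }
    }

    // Parses an XML string and returns the handler holding the collected flow graph.
    static FlowSketchParser parse(String xml) throws Exception {
        FlowSketchParser handler = new FlowSketchParser();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<flow>"
                + "<operator id=\"gen\" type=\"Generator\"/>"
                + "<operator id=\"peek\" type=\"Peek\"/>"
                + "<arc source=\"gen\" target=\"peek\"/>"
                + "</flow>";
        FlowSketchParser flow = parse(xml);
        System.out.println(flow.operators); // [gen:Generator, peek:Peek]
        System.out.println(flow.arcs);      // [gen->peek]
    }
}
```

Because SAX is event-driven, the handler only records what it sees as the document streams by; building and validating the actual operator graph would be a separate step.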

A simple flow consisting of a Random Data Generator and a Consumer (Peek) can be described like this:
The properties of an operator are defined as Java beans that are deserialized using Castor (http://www.castor.org). Here it again makes sense to write new operators, or at least their property beans, in Scala (http://www.scala-lang.org), because its @BeanProperty annotation makes hand-written getter and setter methods redundant. The Castor mapping is defined in a descriptor file along with the structure of the operator.
The Generator used in the example flow above has the following description:

Last but not least, Ohua allows runtime parameters to be specified in a separate file, so that the same flow can run in different runtime setups.

This configuration runs the flow in a multi-threaded fashion with an initial size of 10 and a maximum of 20 threads. The memory used for sending the data through the graph is managed by setting the maximum number of packets stored in one arc to 250.
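The effect of these three parameters can be sketched with plain java.util.concurrent classes. This is only an illustration under assumptions, not Ohua's runtime: the class RuntimeConfigSketch and its constants are hypothetical, the arc is modelled as a bounded blocking queue, and the flow is reduced to one producer and one consumer.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Future;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the runtime parameters; names are not taken from Ohua.
class RuntimeConfigSketch {
    static final int CORE_THREADS = 10;  // initial pool size
    static final int MAX_THREADS = 20;   // upper bound on threads
    static final int ARC_CAPACITY = 250; // maximum packets held in one arc

    // Runs a producer and a consumer connected by one bounded arc
    // and returns the sum of all packets the consumer received.
    static long run(final int packets) throws Exception {
        final BlockingQueue<Integer> arc = new ArrayBlockingQueue<>(ARC_CAPACITY);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                CORE_THREADS, MAX_THREADS, 30, TimeUnit.SECONDS,
                new SynchronousQueue<Runnable>());
        try {
            Future<?> producer = pool.submit(() -> {
                for (int i = 1; i <= packets; i++) {
                    arc.put(i); // blocks while the arc already holds ARC_CAPACITY packets
                }
                return null;
            });
            Future<Long> consumer = pool.submit(() -> {
                long sum = 0;
                for (int i = 0; i < packets; i++) {
                    sum += arc.take(); // blocks while the arc is empty
                }
                return sum;
            });
            producer.get();
            return consumer.get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(1000)); // 500500
    }
}
```

The bounded queue is what keeps memory in check: a fast producer simply blocks once 250 packets are waiting in the arc, which gives back-pressure without any extra bookkeeping.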

Current project status

Posted by Sebastian Ertel (sertel) on December 20, 2009

In its current version Ohua’s engine has been fully implemented and is ready to execute data flows either single-threaded or multi-threaded, the latter by exploiting pipeline parallelism inside the flow graph. Furthermore, it is possible to run processes in a massively parallel processing fashion, or even to run several different flows with different goals in the very same process. Initial work has been conducted to run flows on a cluster of machines.

Ohua’s fault tolerance capabilities have been implemented and tested successfully for various types of data flows.

The operator suite consists of the following operators:

  • DatabaseReader operator
  • DatabaseWriter operator, various flavors with respect to transactional semantics (batch, non-batch etc.)
  • DatabaseLookup operator, a simple version of an operator to issue queries against a database and enrich the flow with the results
  • Split operator, distributes data evenly in a round-robin fashion to its two outgoing arcs
  • DMerge operator, a deterministic merge that constructs its result in round-robin order
  • NDMerge operator, a non-deterministic merge whose output order depends on the arrival rate of the data at its input ports
  • Experiment operator, for experimental purposes; it takes the processing time per packet as a specification parameter
  • Peek operator, writes data to standard out
  • Generator, a random value generator for testing purposes

… more to come!
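To make the round-robin semantics of Split and DMerge from the list above concrete, here is a small sketch in Java. It works on in-memory lists rather than live arcs, and the class SplitMergeSketch with its split and dmerge methods is hypothetical, not Ohua's operator API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of round-robin Split and deterministic merge; not Ohua's API.
class SplitMergeSketch {
    // Split: deal packets to the outgoing arcs in round-robin order.
    static <T> List<List<T>> split(List<T> input, int arcs) {
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < arcs; i++) {
            out.add(new ArrayList<T>());
        }
        for (int i = 0; i < input.size(); i++) {
            out.get(i % arcs).add(input.get(i));
        }
        return out;
    }

    // DMerge: take one packet from each incoming arc in turn, skipping exhausted arcs.
    static <T> List<T> dmerge(List<List<T>> arcs) {
        List<T> merged = new ArrayList<>();
        int longest = 0;
        for (List<T> arc : arcs) {
            longest = Math.max(longest, arc.size());
        }
        for (int round = 0; round < longest; round++) {
            for (List<T> arc : arcs) {
                if (round < arc.size()) {
                    merged.add(arc.get(round));
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<Integer>> arcs = split(Arrays.asList(1, 2, 3, 4, 5, 6), 2);
        System.out.println(arcs);         // [[1, 3, 5], [2, 4, 6]]
        System.out.println(dmerge(arcs)); // [1, 2, 3, 4, 5, 6]
    }
}
```

Since both sides follow the same fixed order, a DMerge placed after a Split restores the original packet sequence, which is exactly what a non-deterministic merge cannot promise.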