Wednesday, May 21, 2008

California here we come

Mono Lake, 2001, webduke pics

Indirection #2

There is a quotation from the British computer scientist David John Wheeler that I really like and feel a strong need to post again: “Any problem in computer science can be solved with another layer of indirection. But that usually will create another problem." It is not new, and we all know the meaning behind it, but it is still relevant today and tomorrow, and it describes the dilemma of toooo many “genericcontainerwrappingadaptergatewayframeworkolalasolutions” perfectly.

Sunday, May 18, 2008

Some Clouds over Dresden

River Elbe Valley, mid-May 2008

Thursday, May 15, 2008

Limitations of today’s Parallelization Approaches

As already stated in detail, physics has stopped the race for more CPU clock speed. It’s about the heat, period. The chip manufacturers’ response is to put more and more cores on a die – we all call this multi-core. I have also tried to list and explain a couple of technologies (and also no-gos) to exploit those architectures; just skim through the blog. But most of the approaches are based on the process of determining sections of code that can be executed in parallel, loops for instance. Saying “sections of code” implies that we can expect serial remnants in the source code. And these parts [the percentage of code resisting tenaciously to be parallelized! :-)] determine the speedup factor defined by Amdahl’s Law (see post on Amdahl's Law). The dilemma looks a little bit similar to the expectations placed on the increase of clock speed in the past. It will not work out to lean back and wait until there are 16 or 32 cores on a single die. This will not fix the problem sufficiently, because most of the existing approaches to parallelization do not scale. Other technologies and strategies have to be developed and applied. And until that happens, it is always a great approach to optimize the source code and to think about an effective usage of the cache (memory).
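To make the “serial remnant” point concrete, here is a minimal sketch in C (OpenMP assumed as the parallelization mechanism; the numbers are arbitrary): the first loop has independent iterations and parallelizes nicely, while the second loop carries a dependency from one iteration to the next and stays sequential.

#include <stdio.h>

#define N 1000000

int main(void) {
    static double a[N];

    /* Parallelizable section: the iterations are independent,
       so OpenMP can spread them across all available cores. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        a[i] = (double)i * 0.5;

    /* Serial remnant: each iteration needs the result of the
       previous one, so this part resists parallelization. */
    double x = 0.0;
    for (int i = 1; i < N; i++)
        x = a[i] + 0.5 * x;

    printf("%f\n", x);
    return 0;
}

No matter how many cores work on the first loop, the second one caps the overall speedup – which is exactly what Amdahl’s Law quantifies.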

Thursday, May 08, 2008

Beautiful Dresden in May

Amdahl’s Law

There is not just Moore’s Law and Murphy’s Law, both well known to IT and non-IT folks, but also Amdahl’s Law, which describes the maximum expected improvement of a system when only parts of the system are improved. It is commonly applied in parallel computing and specifies the theoretical speedup of a program running on multiple CPUs in parallel. The maximum speedup is limited by the sequential parts of the code. A boundary condition is the assumption that the problem size stays the same when running in parallel. Here comes the formula (with N = number of processors and P = the fraction of code that can be parallelized; hence, (1-P) is the serial part):

Speedup = 1 / ((1-P) + P/N)

You can play around with this formula. In essence, the sequential part of the code must be minimized to gain speedup. And even a high number of processors does not have that much impact when the percentage of parallel code is low. Anyhow, it’s a formula. In practice, the speedup also depends on other conditions, such as the communication (mechanisms) between the processors and the cache [Because Cache is king! ;-)].
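A quick worked example, as a small C sketch (the chosen values are arbitrary): with P = 0.9, i.e. 90% parallel code, 8 processors yield a speedup of only 1 / (0.1 + 0.9/8) ≈ 4.7 – and even an unlimited number of processors cannot push it beyond 1 / (1-P) = 10.

#include <stdio.h>

/* Amdahl's Law: speedup as a function of the parallel
   fraction P and the number of processors N. */
static double amdahl(double p, int n) {
    return 1.0 / ((1.0 - p) + p / (double)n);
}

int main(void) {
    const double p = 0.9;                  /* 90% of the code parallelizes */
    const int procs[] = {2, 4, 8, 16, 32};

    for (int i = 0; i < 5; i++)
        printf("N = %2d  ->  speedup = %.2f\n", procs[i], amdahl(p, procs[i]));
    /* Even for N -> infinity the speedup is capped at 1 / (1-P) = 10. */
    return 0;
}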

Monday, May 05, 2008

Wednesday, April 23, 2008

Computing in the clouds

Many people might perceive Amazon as THE online store for books, music, gadgets and many other things, with advanced marketplace capabilities, and as one of the trailblazers on the Web. But the company from the Pacific Northwest is much more. Amazon is a high-tech company when it comes to very large distributed architectures and technologies based on web services. The “Mechanical Turk” is one example, an impressive idea to use human beings as “end points” in a Service-Oriented Architecture in order to let them do what human beings can do best. I really like this concept.
Another great thing is cloud computing, as currently (21-April-2008) covered by an article on wired.com. It’s a must-read! The company offers computing capabilities for everyone. This is good news for young enterprises that don’t want to spend too much on hardware. In essence, it is a lesson and a serious example at the same time of how to offer computing power (and hardware, of course) as a service. I strongly believe we can expect much more from Amazon in the future. These guys are working on a genuinely new computing platform on the web. The vision that the network is the computer, as verbalized by other companies, might turn into reality …

Tuesday, April 22, 2008

Transactional Memory …

… is most likely one solution to overcome parallel programming issues. Currently, it is very difficult (and error-prone) to use locks properly when accessing shared memory sections, and lock-free programming is much more difficult still. All these issues are already covered in earlier posts.
Talking about transactional memory, this concept is not that new, but it is not part of mainstream development platforms. Not yet – transactional memory could be available in hardware and software soon, and software-based transactional memory solutions do already exist. The concept is similar to the mechanisms used in database systems to control concurrent access to data. A transaction in the scope of software-based transactional memory is a section of code that covers a series of reads and writes to shared memory. Logically, these occur at a single point in time. In addition, a log and a final operation, called commit, are also part of the concept. Sounds familiar? Published concepts of software-based transactional memory use an optimistic approach free of locks. Within the scope of the transaction, it is verified, based on the log, that the data in shared memory has not been altered by other threads. If it has, a roll-back is performed and the transaction gets another try …
This sounds really simple and desirable, right? So what’s the drawback; why not use this technology to solve existing problems in parallel computing? It is the performance, decreased by maintaining the logs and doing the commits. But those issues should be solved soon. It would make life easier pertaining to the understanding of multi-threaded code and could prevent the pitfalls coming with lock-based programming. Each transaction could be perceived as a single-threaded, isolated operation.
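To illustrate the pattern, here is a toy sketch in C – a deliberately minimal, illustrative “transactional memory” for one single shared integer, not a real STM library. It shows the optimistic cycle described above: read and log, try to commit, and take another try if some other thread altered the data in the meantime.

#include <pthread.h>
#include <stdio.h>

static int      shared_value;    /* the shared memory cell            */
static unsigned shared_version;  /* bumped on every successful commit */
static pthread_mutex_t commit_lock = PTHREAD_MUTEX_INITIALIZER;

typedef struct { unsigned version; int value; } tx_log;

static tx_log tx_begin(void) {     /* log what we read (a real STM
                                      would use atomic reads here) */
    tx_log log;
    log.version = shared_version;  /* read the version first ...  */
    log.value   = shared_value;    /* ... then the data it guards */
    return log;
}

static int tx_commit(tx_log *log, int new_value) {
    int ok = 0;
    pthread_mutex_lock(&commit_lock);
    if (log->version == shared_version) {  /* nobody altered the data? */
        shared_value = new_value;
        shared_version++;                  /* publish the change */
        ok = 1;
    }
    pthread_mutex_unlock(&commit_lock);
    return ok;                             /* 0 = conflict, roll back */
}

static void *increment(void *arg) {
    (void)arg;
    for (;;) {                             /* the "another try" loop */
        tx_log log = tx_begin();
        if (tx_commit(&log, log.value + 1))
            return NULL;
    }
}

int main(void) {
    pthread_t t[4];
    for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, increment, NULL);
    for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
    printf("counter = %d\n", shared_value);  /* always 4 */
    return 0;
}

A real STM tracks arbitrary read and write sets; the version counter here merely stands in for the log validation, and the mutex only protects the commit itself – the transactions run optimistically without holding any lock.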

Tuesday, April 15, 2008

UML and Domain Specific Languages

I joined an interesting conversation on the utilization of UML recently. The talk was about code generation out of UML (tools). Mmmh. Pros and cons will always exist when it comes to UML. I use UML for sketching and for my architecture and design documents. Use case, class, deployment and sequence diagrams are important tools to me. They create a better understanding in the team of the expected outcome, based on a unified notation – much better than proprietary pictures, which often lead to different views in the minds of the team members. However, UML models do not cover non-functional requirements sufficiently. This is a real drawback when it comes to code generation. My hope is that Domain Specific Languages (DSLs) will improve, gain more acceptance and will be extended to areas and domains where they are appropriate.

Tuesday, April 08, 2008

Prioritize your non-functional requirements

Non-functional requirements (like usability, security, performance, fidelity) are important success criteria in software architecture and development. It is a misconception that all requirements can be covered by functional requirements. Many requirements are of a twofold nature – security, say, typically consists of functional and non-functional types. Once this has been recognized and accepted, prioritization is the next important step. It should not be acceptable to have TWO non-functional requirements (also known as Quality Attributes) with the very same high level of priority. This is a critical contradiction that must be communicated and resolved. If it is not, you might end up with a beast that is highly secure and comfortable to use at the same time (let’s call it a digital monkey wrench) – which is basically not feasible. It is always a challenge to make compromises for each Quality Attribute and to communicate and explain this. Utility Trees are a great tool to list, depict and prioritize non-functional requirements.

Monday, April 07, 2008

F Sharp (F#) – a functional programming language in the scope of the .NET ecosystem

C# is a well-known programming language designed to be used in the scope of the Microsoft .NET framework. It is an imperative and object-oriented programming language and widely accepted in this managed environment. But there is another “sharp language”: F#, developed by Microsoft Research. F Sharp is a functional programming language with imperative and object-oriented aspects. It is part of the .NET environment and can be used within the Visual Studio development environment. Programs written in functional programming languages are easier to parallelize because of the absence of side effects. Given this characteristic (which is basically an advantage), it is worth checking out F# in order to solve parallelization problems. The language can be obtained from the Microsoft Research website and is covered by the Microsoft Research Shared Source License Agreement.
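The underlying idea is language independent. Here is a minimal sketch in C (OpenMP assumed; the function is made up for illustration): when a loop body is a pure function of its inputs, the iterations share no mutable state and can run in parallel without any locking.

#include <stdio.h>

#define N 8

/* A pure function: the result depends only on the argument,
   no shared state is touched, so concurrent calls cannot
   interfere with each other. */
static double square_plus_one(double x) {
    return x * x + 1.0;
}

int main(void) {
    double in[N], out[N];
    for (int i = 0; i < N; i++) in[i] = (double)i;

    /* Side-effect-free body: the iterations are independent,
       so this loop parallelizes trivially. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        out[i] = square_plus_one(in[i]);

    for (int i = 0; i < N; i++) printf("%.1f\n", out[i]);
    return 0;
}

A functional language like F# pushes the programmer towards this style by default – which is exactly why it lends itself to parallelization.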

Sunday, April 06, 2008

Seattle in April




One day of snow, one day of rain, and three sunny days in the Pacific Northwest.

Friday, March 28, 2008

Use Cases and Problem Areas in Concurrency and Parallelization

I wrote this post in German, since all the related discussions I have had recently took place exclusively in my mother tongue. I will certainly publish it in English as well soon. I have touched on this topic several times already: “When is which kind of parallelization useful, and when is it not?” This question is closely connected with a lot of fuzziness in terminology. I by no means want to discourage committed developers from taking on this challenge. But, similar to the learning-curve comparison, analogies with the object-orientation milestone in software development suggest themselves here as well. There is code that is written in such an “extremely object-oriented” way that it is confusing, barely maintainable and no longer reusable. A typical case where the target was overshot and object orientation was thereby defeated.
What does this have to do with parallelization? In principle, the same approach applies here: what is the concrete requirement, and which use cases can be derived from it? The target platform (system architecture) must not be neglected either. Here is a (non-exhaustive) list of questions that can bring you closer to a solution:

  1. What does the hardware architecture of the target platform look like (keywords: hyperthreading, symmetric or asymmetric multi-core CPUs, shared memory, distributed memory)?
  2. Which operating system, which runtime environments and which programming languages are used on the target platform? The operating system matters less and less here, since preemptive multitasking is commonplace nowadays, so at least parallel process execution is a given.
  3. Is performance the most important non-functional requirement? In this context, performance means data throughput and processing speed. This also touches on the fact that the times of ever higher clock rates are over and that the cache, unfortunately, is not the solution to all problems.
  4. Do the use cases in question have a synchronous or an asynchronous character?
  5. How should data be exchanged between streams of execution (I use this term here quite deliberately), and to what extent? Do I need access to “shared” resources, and to what extent?

Based on the answers to these questions, an architecture can be developed that meets the requirements and relies on the right solutions. Right, in this context, also means that the appropriate techniques are used to satisfy further non-functional requirements such as maintainability, extensibility, testability and simplicity (!). I cannot and do not want to provide a decision matrix at this point, but I will briefly sketch a few possible solutions.

  • If point 3 is the decisive criterion, good utilization of the available cores should be the goal. In this case there is still the option of choosing load balancing across processes as the solution. If this is not possible due to the nature of the problem (arithmetic, etc.), a way must be found to distribute threads instead. OpenMP, for example, lends itself to this. Of course, point 3 can also overlap with point 5 when many “shared” variables have to be accessed. It should be noted that OpenMP does not protect us from race conditions or deadlocks.
  • Regarding point 4, the most mature solutions certainly exist, and there are good chances of solving this problem comprehensively. Good decoupling and asynchronous communication mechanisms are the success criteria here. Concrete use cases exist above all around user interfaces (UIs) on client systems. Streams of execution (e.g. threads) in the presentation logic should be separated from the business logic by means of suitable asynchronous communication mechanisms in order to deliberately achieve loose coupling. I would also note that, to my mind, the problem in point 4 (asynchrony) does not originally belong to the problem domain of parallelization/concurrency, although discussions often place it there.
  • Point 5 is certainly the most complicated problem, since there is little support from development environments, compilers and frameworks so far. Until such support exists (for example, based on concepts of “transactional memory”), one has to live with all the difficulties of “lock-free programming”. This of course includes a sophisticated testing strategy to track down deadlocks and race conditions. In general, regarding point 5, I can only recommend examining architecture and design closely for alternative ways of exchanging data.


Of course, it should be noted that there can also be a mix of solutions. A concrete example from high-performance computing are hybrid applications combining MPI (Message Passing Interface) and OpenMP.
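A minimal sketch of such a hybrid in C (assuming an MPI installation and an OpenMP-capable compiler; the workload is arbitrary): MPI distributes the work across processes, possibly on different nodes, while OpenMP spreads each process’s share across its local cores.

#include <mpi.h>
#include <stdio.h>

/* Hybrid parallelization: MPI across processes/nodes, OpenMP
   across the cores inside each process. Typically built and run
   with something like: mpicc -fopenmp hybrid.c && mpirun -np 4 ./a.out */
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double local_sum = 0.0;

    /* Each process takes every size-th element; OpenMP splits
       that share further across threads. */
    #pragma omp parallel for reduction(+:local_sum)
    for (int i = rank; i < 1000000; i += size)
        local_sum += 1.0 / (1.0 + (double)i);

    /* Combine the per-process results via message passing. */
    double total = 0.0;
    MPI_Reduce(&local_sum, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("total = %f (from %d processes)\n", total, size);

    MPI_Finalize();
    return 0;
}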

Tuesday, March 25, 2008

Dogmushing

I went to Alaska in wintertime for the very first time in 1998 (after backpacking there in summer for many years). The reason: dog mushing. The picture is just an impression of a beautiful time in the cold at the Goldstream Kennel.



More Music is about to come ...

... just one tiny recommendation. Listen to ZOOT ALLURES (1976) by Frank Zappa. Track #3 - The Torture Never Stops - is the perfect soundtrack for a day in the middle of the week. And remember, music is the best ...

More OpenMP Details (Data Scopes, Critical Sections)

Here comes a little bit more information on OpenMP, the standard programming model for shared-memory parallelization. Variables can be shared among all threads or duplicated for each thread. Shared variables are used by threads for communication purposes. This was already outlined in an earlier post. The OpenMP data scope clauses private and shared declare this behavior in the following way: private (list) declares the variables in list as private in the scope of each thread; shared (list) declares the listed variables as shared among the threads in the team. An example could look like the following: #pragma omp parallel private (k)
If the scope is not specified, shared is the default. But there are a couple of exceptions: loop control variables, automatic variables within a block and local variables in called sub-programs.
Let’s talk about another important directive, the critical directive. The code block enclosed by
#pragma omp critical [(name)] will be executed by all threads, but just by one thread at a time. Threads must wait at the beginning of a critical region until no other thread in the team is working on the critical region with the same name. Unnamed critical directives are possible. This leads me to another important fact that should not be overlooked. OpenMP does not prevent a developer from running into problems with deadlocks (threads waiting on locked resources that will never become free) and race conditions, where two or more threads access the same shared variable unsynchronized and at least one thread modifies it in a concurrent scenario. The results are non-deterministic, and these programming flaws are hard to find. It is always good coding style to develop the program in a way that it could also be executed in a sequential form (strong equivalency). This is probably easier said than done, so testing with appropriate tools is a must. I had the chance to participate in a parallel programming training (btw, an excellent course!) a couple of weeks ago where the Thread Checker from Intel was presented and used in exercises. It is a debugging tool for threaded applications, designed to identify deadlocks and race conditions.
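A small sketch in C pulls these pieces together (the variable names are made up for illustration): k is private, so every thread gets its own copy; sum is shared; and the named critical region serializes the update so the threads do not race on sum.

#include <stdio.h>
#include <omp.h>

int main(void) {
    int sum = 0;   /* shared among all threads                    */
    int k;         /* declared private below: one copy per thread */

    #pragma omp parallel private(k) shared(sum)
    {
        k = omp_get_thread_num();   /* no race: k is private */

        /* Only one thread at a time may execute this block; without
           it, the unsynchronized update of sum would be a race
           condition. */
        #pragma omp critical (update_sum)
        {
            sum += k;
        }
    }

    printf("sum of thread ids = %d\n", sum);
    return 0;
}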

Saturday, March 22, 2008

Google Phone

Gizmodo.com published an article (+ picture) on HTC's Google phone called "Dream" last Thursday. It has got a touchscreen as well as a keypad. There are rumours that the gadget will be in stores by the end of the year. Looks promising.

Wednesday, March 19, 2008

Just a recommendation

There is a fascinating article in WIRED MAGAZINE (16.04) with the title: How Apple Got Everything Right By Doing Everything Wrong. This is a must-read for all folks interested in how Apple is doing their products and maintaining their status and myth, with many interesting insights. It is also about company and management culture, openness, transparency, proprietary solutions, vertical integration, and the different approaches in the high-tech industry from Apple to Google.
The phenomenon is that any other company, especially the software giant located in Redmond, would have ended up with endless criticism if they had done their products this way. Another interesting view is mentioned briefly, the so-called “three-tiered system” consisting of a blend of hardware, installed software and a proprietary Web application. Let’s see how this model is doing in the future, because it is the antithesis of the Open Source / Open Systems approach.

Green IT and Parallel Programming

Green IT is one of the buzzwords (or even THE buzzword) these days. Everybody wants this sticker on the box. I have been doing Green IT for many years: I switch off my computer, monitor and DSL modem when I’m done. But this behavior should not be the topic of this post. Chip design might lead to architectures where many cores run at a lower clock speed (than a single-core processor does). This is no contradiction to Moore’s Law, by the way. For an application that is not designed and developed to exploit a multicore architecture (either by process-based load balancing or by multithreading), the result could be a decrease in performance. So this is my point, and it is probably not far-fetched: Green IT might be another trigger to force efforts to make applications ready for today’s hardware architectures.