New programming language delivers fourfold speedups on problems common in the age of big data


Researchers have designed a new programming language that lets application developers manage memory more efficiently in programs that deal with scattered data points in large data sets. In tests on several common algorithms, programs written in the new language were four times as fast as those written in existing languages.

 

In today’s computer chips, memory management is based on what computer scientists call the principle of locality: If a program needs a chunk of data stored at some memory location, it probably needs the neighboring chunks as well.

But that assumption breaks down in the age of big data, now that computer programs more frequently act on just a few data items scattered arbitrarily across huge data sets. Since fetching data from their main memory banks is the major performance bottleneck in today’s chips, having to fetch it more frequently can dramatically slow program execution.

This week, at the International Conference on Parallel Architectures and Compilation Techniques, researchers from MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) are presenting a new programming language, called Milk, that lets application developers manage memory more efficiently in programs that deal with scattered data points in large data sets.

In tests on several common algorithms, programs written in the new language were four times as fast as those written in existing languages. But the researchers believe that further work will yield even larger gains.

The reason that today’s big data sets pose problems for existing memory management techniques, explains Saman Amarasinghe, a professor of electrical engineering and computer science, is not so much that they are large as that they are what computer scientists call “sparse.” That is, with big data, the scale of the solution does not necessarily increase proportionally with the scale of the problem.

“In social settings, we used to look at smaller problems,” Amarasinghe says. “If you look at the people in this [CSAIL] building, we’re all connected. But if you look at the planet scale, I don’t scale my number of friends. The planet has billions of people, but I still have only hundreds of friends. Suddenly you have a very sparse problem.”

Similarly, Amarasinghe says, an online bookseller with, say, 1,000 customers might like to provide its visitors with a list of its 20 most popular books. It doesn’t follow, however, that an online bookseller with a million customers would want to provide its visitors with a list of its 20,000 most popular books.

Thinking locally

Today’s computer chips are not optimized for sparse data—in fact, the reverse is true. Because fetching data from the chip’s main memory bank is slow, every core, or processor, in a modern chip has its own “cache,” a relatively small, local, high-speed memory bank. Rather than fetching a single data item at a time from main memory, a core will fetch an entire block of data. And that block is selected according to the principle of locality.

It’s easy to see how the principle of locality works with, say, image processing. If the purpose of a program is to apply a visual filter to an image, and it works on one block of the image at a time, then when a core requests a block, it should receive all the adjacent blocks its cache can hold, so that it can grind away on block after block without fetching any more data.
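As a concrete illustration, a cache-friendly loop of this kind might look like the following C sketch (the image layout and names are illustrative, not taken from any particular codebase):

```c
#include <stddef.h>

/* Illustrative only: apply a simple filter to an image stored row by row.
 * Because pixels in the same row sit next to each other in memory, every
 * cache-line fetch brings in many pixels the loop is about to use. */
void brighten(unsigned char *img, size_t width, size_t height, int delta)
{
    for (size_t y = 0; y < height; y++) {       /* walk rows in order      */
        for (size_t x = 0; x < width; x++) {    /* adjacent pixels         */
            size_t i = y * width + x;           /* sequential addresses    */
            int v = img[i] + delta;
            if (v > 255) v = 255;               /* clamp to valid range    */
            if (v < 0)   v = 0;
            img[i] = (unsigned char)v;
        }
    }
}
```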

But that approach doesn’t work if the algorithm is interested in only 20 books out of the 2 million in an online retailer’s database. If it requests the data associated with one book, it’s likely that the data associated with the 100 adjacent books will be irrelevant.
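The sparse case looks more like the following sketch (again with illustrative names): each iteration jumps to an effectively arbitrary spot in a large array, so most of each fetched cache line goes unused.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative only: count purchases per book.  book_ids[] lists the book
 * bought in each transaction, in arrival order, so consecutive iterations
 * touch counts[] at essentially random locations -- most accesses miss the
 * cache, and most of each fetched cache line is wasted. */
void count_purchases(const uint32_t *book_ids, size_t n_transactions,
                     uint64_t *counts /* one slot per book in the catalog */)
{
    for (size_t t = 0; t < n_transactions; t++) {
        counts[book_ids[t]]++;   /* indirect, scattered update */
    }
}
```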

Going to main memory for a single data item at a time is woefully inefficient. “It’s as if, every time you want a spoonful of cereal, you open the fridge, open the milk carton, pour a spoonful of milk, close the carton, and put it back in the fridge,” says Vladimir Kiriansky, a PhD student in electrical engineering and computer science and first author on the new paper. He’s joined by Amarasinghe and Yunming Zhang, also a PhD student in electrical engineering and computer science.

Batch processing

Milk simply adds a few commands to OpenMP, an extension of languages such as C and Fortran that makes it easier to write code for multicore processors. With Milk, a programmer inserts a couple additional lines of code around any instruction that iterates through a large data collection looking for a comparatively small number of items. Milk’s compiler—the program that converts high-level code into low-level instructions—then figures out how to manage memory accordingly.
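Milk’s own directives aren’t spelled out in the article, so the sketch below shows only the kind of plain OpenMP loop it targets, with a comment marking where the extra Milk lines would be inserted:

```c
#include <stddef.h>
#include <stdint.h>
#include <omp.h>

/* Illustrative only: an ordinary OpenMP loop with an indirect access.
 * Per the article, a Milk programmer would add a couple of extra lines
 * around the access below; the exact directives are defined in the Milk
 * paper and are not reproduced here. */
void count_purchases_parallel(const uint32_t *book_ids, size_t n_transactions,
                              uint64_t *counts)
{
    #pragma omp parallel for
    for (size_t t = 0; t < n_transactions; t++) {
        /* scattered update -- the kind of memory access Milk batches */
        #pragma omp atomic
        counts[book_ids[t]]++;
    }
}
```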

With a Milk program, when a core discovers that it needs a piece of data, it doesn’t request it—and a cacheful of adjacent data—from main memory. Instead, it adds the data item’s address to a list of locally stored addresses. When the list is long enough, all the chip’s cores pool their lists, group together those addresses that are near each other, and redistribute them to the cores. That way, each core requests only data items that it knows it needs and that can be retrieved efficiently.
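A drastically simplified, single-core sketch of that idea (a stand-in for intuition, not the Milk runtime itself): buffer the addresses first, sort them so that neighbors end up together, and only then touch main memory.

```c
#include <stdint.h>
#include <stdlib.h>

/* Comparator for 32-bit indices, used to group nearby addresses together. */
static int cmp_u32(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/* Illustrative only: defer the scattered updates, reorder them, then apply
 * them in near-sequential order so cache lines are reused instead of wasted.
 * Milk's compiler and runtime do this across cores and cache levels. */
void count_purchases_batched(const uint32_t *book_ids, size_t n_transactions,
                             uint64_t *counts)
{
    uint32_t *batch = malloc(n_transactions * sizeof *batch);
    if (!batch) return;                          /* keep the sketch simple  */

    for (size_t t = 0; t < n_transactions; t++)
        batch[t] = book_ids[t];                  /* record, don't touch yet */

    qsort(batch, n_transactions, sizeof *batch, cmp_u32);  /* group nearby  */

    for (size_t t = 0; t < n_transactions; t++)
        counts[batch[t]]++;                      /* now mostly cache-friendly */

    free(batch);
}
```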

That’s the high-level description, but the details get more complicated. In fact, most modern computer chips have several different levels of caches, each one larger but also slightly less efficient than the last. The Milk compiler has to keep track of not only a list of memory addresses but also the data stored at those addresses, and it regularly shuffles both around between cache levels. It also has to decide which addresses should be retained because they might be accessed again, and which to discard. Improving the algorithm that choreographs this intricate data ballet is where the researchers see hope for further performance gains.

“Many important applications today are data-intensive, but unfortunately, the growing gap in performance between memory and CPU means they do not fully utilize current hardware,” says Matei Zaharia, an assistant professor of computer science at Stanford University. “Milk helps to address this gap by optimizing memory access in common programming constructs. The work combines detailed knowledge about the design of memory controllers with knowledge about compilers to implement good optimizations for current hardware.”


A riddle for our time

September 12, 2016


One of the most remarkable successes of astrophysics in the last century was its discovery that the age of the universe as measured by its oldest stars was about the same as the age estimated in an entirely different way, from the recession of galaxies. Both came up with surprisingly long times—billions of years—providing reassuring confirmation that both were probably on the right track. But the two values were not identical, and scientists very quickly realized there was a major discrepancy: the oldest stars appeared to be older than the universe itself. Refinements to the measurements and the models to resolve this contradiction were underway until 1998, when cosmic acceleration was discovered. It proved, in a single sweep, that the universe was actually much older than had been thought, and in particular older than the oldest stars.

But there was a riddle in the discovery: The motion of the universe is governed by matter, whose gravity tends to slow the expansion down, and by acceleration, which speeds it up. Since the average density of matter in the universe steadily drops as the universe swells, in time it has a smaller and smaller value. Curiously, today it just happens to have almost exactly the same value (when expressed in the same units) as the acceleration parameter. Why? There was a second riddle too: The theoretical size of the acceleration parameter could be almost anything; indeed, basic quantum mechanical calculations suggest it should be vastly larger than it is. Why it is as small as we measure is a mystery.
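In standard cosmology notation (not drawn from the article itself), that first coincidence can be stated compactly: matter dilutes as the universe expands, the acceleration term does not, yet today the two are measured to be of the same order.

```latex
% Standard notation, added for illustration: a(t) is the cosmic scale factor,
% \rho_m the matter density, \rho_\Lambda the density of the acceleration term.
\begin{align*}
  \rho_m(t)     &\propto a(t)^{-3} && \text{dilutes as the universe swells}\\
  \rho_\Lambda  &= \text{const.}   && \text{does not dilute}\\
  \Omega_m      &\approx 0.3, \quad \Omega_\Lambda \approx 0.7
                && \text{yet today they are of the same order}
\end{align*}
```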

CfA astronomers Arturo Avelino and Bob Kirshner have just published a paper calling attention to yet another riddle. The universe did not expand at a constant rate that was just the blend of these two factors. For the first nine billion years of cosmic evolution, deceleration dominated and the universe gradually slowed its expansion. Since the relative importance of cosmic acceleration grows with time, however, for the past five billion years acceleration has dominated and the universe has sped up its expansion. Curiously, though, today the universe looks the same way it would have if it had always been expanding steadily at a constant rate (the rate required to prevent ultimate re-collapse).
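A back-of-the-envelope version of this new riddle, using round published values rather than numbers from the paper itself: the dimensionless age H0·t0 comes out almost exactly 1, which is what a universe coasting at a constant rate would give.

```latex
% Illustrative arithmetic with round values (not taken from the paper):
% a universe that coasts at a constant rate has H_0 t_0 = 1 exactly.
\begin{align*}
  t_0 &\approx 13.8\ \text{Gyr}, \qquad
  H_0 \approx 70\ \text{km s}^{-1}\,\text{Mpc}^{-1}
      \;\Rightarrow\; 1/H_0 \approx 14.0\ \text{Gyr},\\
  H_0\, t_0 &\approx 13.8/14.0 \approx 0.99 \;\approx\; 1
\end{align*}
```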

Although it sounds slightly similar to the original riddle, the authors describe why this new puzzle is actually different: We are living (apparently) in a privileged epoch; the other puzzles do not have this implication. The explanation for these riddles is not yet known. If some specific new kinds of elementary particles exist, the scientists suggest, they could provide the answer, but for now the only thing that is certain is that more observational research is needed.

More information: “The Dimensionless Age of the Universe: a Riddle for Our Time,” A. Avelino and R. Kirshner, ApJ 828, 35, 2016. adsabs.harvard.edu/abs/2016ApJ...828...35A
