Number crunching with Big Data

Last edited: 22 December 2014

The volume of data in the world is growing at an unprecedented rate. Raw data alone does not tell us much, but put that data into a meaningful structure and turn it into information from which we can extract knowledge, and data becomes one of our most valuable ‘natural’ resources.

Data is everywhere, bombarding us every minute, every day, in every kind of environment, from business and government to education and social media. Just two decades ago, we worked in megabytes. Then, just as we got used to terabytes, petabytes emerged.

These giant leaps represent a huge opportunity for companies to enhance their strategic thinking and commercial advantage, but the trick is in mining this vast resource effectively. As has been said: “A GPS coordinate is data, a contour is information, a map is knowledge, and someone who knows how to read it is wise…”

When data becomes too difficult to manage by conventional means, because of its volume, its speed or its variety, it becomes what is known as Big Data. That does not necessarily mean ‘big numbers’, although many certainly are, but numbers that arrive in huge Volumes and at high Velocity. The Variety of data compounds the problem. The fourth ‘V’ of Big Data is Veracity, because data may not be of consistent quality, yet may still be useful. Managing all four ‘V’s and applying algorithms and data analytics – the mathematical treatment of data – can help companies to make faster, better and more focused business choices.

[Infographic: speed at which the world's information base doubles; amount of data Twitter processes every day; number of devices connected to the internet. Statistics courtesy of IBM.]

Retailers, airlines and banks are doing this already, tracking customer behaviour to target services and resources more effectively. There is also a growing recognition within BP of the immense opportunity that Big Data could offer to help it better understand reservoir activity, increase refinery efficiency, improve biofuels yields, and make better trading choices. In 2012, the organisation established a decision analytics network – now 200-strong among its professionals – to examine ways to advance the use of data and to help BP’s businesses harness these opportunities.

Data in motion

“As an organisation, as an industry, we are increasingly putting more sensors into our facilities, on rigs, wells and pipelines, for example, to measure temperature, pressure, chemicals and equipment vibrations,” says Paul Stone, decision analytics network leader. “The variety of sensors available is increasing all the time and we are getting more data back, in real-time, with ever-shorter cycles. This is data in motion, not data just sitting on a disk drive, and it is telling us about operational conditions that can be used by our businesses in order to further strengthen safety and improve performance. So, we need to make sense of it as it is acquired, not after the fact.”

The rapid growth in BP’s data volumes is a direct result of its greater ability to acquire data in the first place. Data is acquired for a range of reasons, including reliability and performance. For instance, by installing fibre optic ‘distributed acoustic sensing’ in its wells, BP will be able to receive data from deep underground that lets production teams know where and how effectively the well is producing hydrocarbons. Such a system can easily deliver many terabytes of data each day from just one well.

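To give a feel for what ‘many terabytes of data each day from just one well’ means as data in motion, the short sketch below converts a daily volume into an average sustained rate. The daily volumes are hypothetical, since the article gives no exact figure, and the function name `sustained_rate_mb_per_s` is invented for illustration. Decimal units are assumed (1 terabyte = 10^12 bytes).

```python
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 seconds
BYTES_PER_TERABYTE = 10**12      # decimal terabyte (assumption)
BYTES_PER_MEGABYTE = 10**6       # decimal megabyte (assumption)


def sustained_rate_mb_per_s(terabytes_per_day: float) -> float:
    """Average data rate, in megabytes per second, for a given daily volume."""
    bytes_per_second = terabytes_per_day * BYTES_PER_TERABYTE / SECONDS_PER_DAY
    return bytes_per_second / BYTES_PER_MEGABYTE


# Hypothetical daily volumes; the article only says "many terabytes".
for tb in (1, 5, 10):
    print(f"{tb:>2} TB/day is about {sustained_rate_mb_per_s(tb):.1f} MB/s, around the clock")
```

Even a single terabyte a day works out at more than 11 megabytes every second without pause, which is why such streams have to be analysed as they arrive rather than stored and examined later.
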
“Real-time monitoring enables us to see if the relationships between physical properties while drilling a well, for example, are changing unfavourably, because understanding the various patterns that are formed by these relationships is important when it comes to diagnosing problems,” says Stone. “The biggest benefit of analytics, though, is that it provides the opportunity to predict what will happen, instead of recording what has happened or is happening. All these different data points allow people to spot patterns as they form, patterns that point to future conditions before they occur and, with all this extra data, we can take the right action ahead of time.”
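
As a rough illustration of making sense of sensor data as it is acquired, the sketch below flags any reading that strays far from a rolling baseline of recent readings (a simple rolling z-score check). This is a generic stand-in, not a description of BP’s analytics: the function name `flag_drift`, the window size, the threshold and the pressure readings are all invented for illustration. It also only detects a change as it arrives; genuine prediction of the kind Stone describes would need a model of how the readings relate to one another.

```python
from collections import deque
from statistics import mean, pstdev


def flag_drift(readings, window=5, threshold=3.0):
    """Yield (index, value) for readings far from the recent rolling baseline."""
    recent = deque(maxlen=window)  # holds only the last `window` readings
    for i, value in enumerate(readings):
        if len(recent) == window:
            baseline = mean(recent)
            spread = pstdev(recent)
            # Flag the reading if it sits more than `threshold` standard
            # deviations away from the mean of the recent window.
            if spread > 0 and abs(value - baseline) > threshold * spread:
                yield i, value
        recent.append(value)


# Made-up pressure readings: steady, then a sudden jump at index 7.
pressure = [100.0, 100.2, 99.9, 100.1, 100.0, 100.1, 99.8, 104.5, 100.0]
print(list(flag_drift(pressure)))  # [(7, 104.5)]
```

Because the check only ever looks at a small window of recent values, it can run on each reading as it streams in, without waiting for the full data set to land on disk.
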

Super computing power

BP strengthened its commitment to computing and advanced information technology in October 2013, with the opening of its new Center for High-Performance Computing (CHPC), the largest commercial research and development computing resource in the world. Replacing a previous facility, this ‘supercomputer’ was established primarily to help with seismic imaging, but its vast data-crunching capability – 3.8 petaflops of computing power and 23.5 petabytes of disk space, equivalent to more than 40,000 laptops – is available for use by the whole organisation.

The site’s location in Houston, near institutions such as Rice University and the University of Texas, means that the CHPC can attract some of the best mathematicians and computer scientists, aligning itself with the likes of technology giants Google, Amazon and Microsoft, themselves avid consumers of data.

“Since 1999, BP’s computing needs have grown by a factor of 22,000, with computing power doubling almost every year,” says Keith Gray, manager of the CHPC. “To illustrate that, in 2004, a new group of computers that we put in popped the main circuit breaker for Westlake [BP’s Houston headquarters], putting it in the dark. It soon became clear that a new computing facility was appropriate to support the organisation and its businesses.”
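
Gray’s two figures can be checked against each other with a little arithmetic. The sketch below asks how many doublings a factor of 22,000 represents and how long each one would take. The end year of 2013 (when the CHPC opened) is an assumption, as the quote does not say when the period ends, and the function name `doubling_time_years` is invented for illustration.

```python
import math

GROWTH_FACTOR = 22_000  # from the quote
START_YEAR = 1999       # from the quote
END_YEAR = 2013         # assumption: the year the CHPC opened


def doubling_time_years(growth_factor: float, years: float) -> float:
    """Average time per doubling, given total growth over a number of years."""
    doublings = math.log2(growth_factor)
    return years / doublings


years = END_YEAR - START_YEAR
print(f"{math.log2(GROWTH_FACTOR):.1f} doublings over {years} years")
print(f"one doubling every {doubling_time_years(GROWTH_FACTOR, years):.2f} years")
```

Under that assumption, a factor of 22,000 is a little over 14 doublings in 14 years, which fits the description of computing power ‘doubling almost every year’.
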

In BP’s Upstream business, the CHPC facility helps the seismic imaging teams to simulate, process and predict what will happen in a reservoir. It does this by processing and managing huge volumes of geological data from across BP’s global portfolio, helping teams to see more clearly below the Earth’s surface. It helps reduce the amount of time needed to analyse large amounts of seismic data and can also enable more detailed in-house modelling of rock formations before drilling begins. With field developments costing billions, this knowledge is invaluable and its pursuit puts BP at the forefront of seismic advances.

“Data volumes 25 years ago were measured in megabytes,” says John Etgen, distinguished adviser, seismic imaging. “Individual data sets are now approaching petabytes in size, with volumes having grown by a factor of a billion, just in my career. But, this is not only about volume, because the way sound waves propagate in the earth is very, very complicated and the science we use is never complete. Sound waves react to things down to the scale of grains of sand. Even at very high resolution, the images we can make today still have gaps bigger than the size of a conference room.”

Faster and higher

Etgen continues: “Computing power has always been a limiting factor for this industry and increased computing power certainly enables us to do things much faster than we could before, but what is important for the business is image resolution, not just timescales. With faster computers, like the CHPC, we do not necessarily use them just to go faster, but we let the computer run to its limits and give us the highest resolution we can get. These higher resolutions allow us to ‘see the future’ and to find out what kind of value a reservoir is going to deliver, earlier than before.”

The data-processing and knowledge-generating power of the CHPC has had a global impact, with BP’s businesses in the Caspian, Trinidad & Tobago, the North Sea, the Gulf of Mexico and Indonesia all benefitting.

“The CHPC allows us to investigate the most modern seismic processing methods ever known, and, even better, to create some new ones,” says Etgen. “And, this is at the highest possible resolution that we can drive out of the data. It can’t do everything we want to be able to do, not yet, but we are in a good place, keeping our key resources of people, ideas and the fundamental data itself all aligned and appropriately invested.”

Future-proof facility

The CHPC might be the world’s largest commercial facility of its type, but computing needs are growing exponentially. The CHPC has been built to be ‘future-proof’, and is looking to potentially double its computing power between now and the end of 2014 and to triple it by early 2016.

While the supercomputer primarily supports the seismic teams, BP IT&S’s Innovation Lab, housed in the same building, has been created as a ‘hot zone’ for testing bold new ideas and proofs of concept for any part of BP’s global business.

“We wanted to take some of the things that make the CHPC so successful – its agility and its ability to solve problems quickly – and use them to test ideas,” says Stefan Garrard, the Lab’s manager. “This is a separate set of resources, available to any part of BP’s global business that wants to come in and try out proofs of concept, in a very quick and very agile environment, before things are done in a production situation, with time and money at stake.”
To read this feature in full, see the November 2014 issue of BP Magazine.
