Inspired by the Data Leaders Summit in Barcelona (and by Paul Laughlin's brilliant jokes), I wrote down a few key insights from my own keynote and from this year's conference in general on my flight back home to Berlin. It was a pleasure to meet so many like-minded data people again!
Data leaders are a growing population in Europe
You can really see that the community of data leaders in Europe is growing and becoming more mature across industries. Finally, even in Continental Europe, most companies have hired heads of data and analytics. In most companies, data scientists and engineers can be found spread out across the business, fully embedded in a business function, in addition to the central data & analytics teams (the hub model). Data leaders and their organisations are becoming increasingly well intertwined with, and accepted by, the old established business functions. The really good news is that most of us have left the digital ivory tower by now and are doing good things for the core business.
AI is clearly the new buzzword – the data fashion show continues
Last year, the buzzword Big Data finally disappeared, but everyone was talking about data science. This year everyone talked about AI and machine learning. I guess it's part of belonging to the data profession to adapt to the newest buzzword each year! My keynote was renamed by the conference editing director from "data science for grown ups" to "succeeding in AI". In my own company, it was difficult to explain to people that our data scientists can also work on AI (basically I ended up writing AI all over most presentations). I decided from now on to call data scientists the data & AI scientists to signal that these are the same people. From a more critical point of view, one could argue that real AI is still some way off. The kind of AI that most people talk about is not a single algorithm, but a system of many interconnected algorithms embedded in a well-managed algorithm architecture that can make machines sense, think, learn and act (see also my past article on Smart Machines). Nobody has achieved this so far, to my knowledge. My colleague Dat Tran, Head of Data Science at Idealo, summarised it nicely in a LinkedIn post today: "We're so far from what people call real AI. All what we do is some mathematical optimisation on some data. And if the data is bad the AI also is bad". One thing that I pointed out in my keynote today was that we have failed to build a knowledge taxonomy for our machines, so they all have to rely on raw data when we run machine learning algorithms. But how can machines learn about society if they don't read books?
Underdeveloped Data & AI Engineering and Ops major barrier
One of the main themes that came out of the conference was the realisation that most of us had hired too many data scientists while neglecting to hire enough data engineers and data ops guys. Developing algorithms is important, but it accounts for only a small proportion of the success of a data & AI project. We ignored that once the analytical model is designed, it is mostly about software engineering! We need a lot of engineers to create data pipelines and connect them to our legacy systems. Data engineers focus on software engineering rather than modelling. Software and architecture skills are key for developing ETL processes, integrating data ad hoc and cleansing it, writing APIs and database connections, creating CI/CD pipelines and DevOps, testing and deploying data products and adding new components to the platform. Data Ops are needed to support the operation and maintenance of finished data analytics & AI products. Data Ops covers tasks that data scientists perceive as unattractive, such as managing the code repositories, the rules for deployment and the production environment, running the helpdesk, managing SLAs, archiving data and ensuring load balancing. There is a clear need for more enthusiastic Data Ops people. I find it very difficult to find the right people for the job at the moment, and they certainly deserve more appreciation from the data community. A highlight at the conference was certainly Harvinder Atwal's talk discussing how moneysupermarket.com has approached and solved Data Engineering and Data Ops in a neat fashion.
Challenge of running multiple clouds and data gravity
Setting up increasingly complex cloud environments and creating the channels to move data from on-premise into the cloud and back is another big engineering challenge, even for digital unicorns, and the growing number of clouds and technologies that you need to manage does not make it easier. This was pointed out very nicely by the VP Data Infrastructure at Zalando, Kshitij Kumar. Each cloud has its strengths and weaknesses, and we might need to live with the fact that we will need more than one cloud environment in the future. Data gravity was discussed in a number of sessions; it is increasingly the deciding factor in where you run your algorithms. Still, many companies struggle to convince their management and IT Security Officers that moving to the cloud is a safe and reasonable option. This will turn out to be a big barrier to innovation and growth for traditional companies.
Innovations brought from the Travel Industry
One of the best things that I learned came from Booking.com's Director of Data Science, Ting Wang. Apparently, Booking.com has established a central feature store in one or more of its business areas which is well governed and versioned. This allows data scientists to massively reduce the amount of work they do, as they can reuse the most important features across their models. A great advantage is that features are finally standardised and form a kind of single version of the truth! Another great inspiration comes from another peer from the travel industry. Charlie Ballard, Global Director of Strategic Insights at TripAdvisor, gave a fabulous presentation on how you can use your data to build data products that you can actually sell to your customers. Knowing who travels when and where, and who visits which hotel accommodation sites, also allows you as a hotel to analyse who your true competitors are. Apparently the guys at Ritz Carlton got it all wrong and ignored that Hyatt is one of their big competitors, as they thought that they played in a different segment. Data speaks louder than words. It turned out that customers would see the two chains as roughly interchangeable and would choose the cheaper of the two options, which typically made Hyatt the winner!
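To make the feature store idea from the start of this section a bit more tangible, here is a minimal Python sketch. It is my own toy illustration and not Booking.com's implementation: the `FeatureStore` class, its methods and the `price_per_night` feature are all hypothetical names, and a real feature store would add persistence, access control and monitoring on top.

```python
from typing import Callable, Dict, Optional, Tuple

Row = Dict[str, float]
FeatureFn = Callable[[Row], float]


class FeatureStore:
    """Toy in-memory feature registry (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        # Each feature definition is stored under (feature name, version).
        self._features: Dict[Tuple[str, int], FeatureFn] = {}

    def latest_version(self, name: str) -> int:
        """Return the highest registered version of `name`, or 0 if unknown."""
        return max((v for (n, v) in self._features if n == name), default=0)

    def register(self, name: str, fn: FeatureFn) -> int:
        """Store `fn` as the next version of `name` and return that version."""
        version = self.latest_version(name) + 1
        self._features[(name, version)] = fn
        return version

    def compute(self, name: str, row: Row, version: Optional[int] = None) -> float:
        """Compute a feature for one row, using the latest version by default."""
        if version is None:
            version = self.latest_version(name)
        if (name, version) not in self._features:
            raise KeyError(f"unknown feature {name!r} version {version}")
        return self._features[(name, version)](row)


store = FeatureStore()

# Version 1: a naive definition of the feature.
store.register("price_per_night", lambda r: r["total_price"] / r["nights"])
# Version 2: a refined definition that excludes taxes.
store.register(
    "price_per_night",
    lambda r: (r["total_price"] - r["taxes"]) / r["nights"],
)

booking = {"total_price": 300.0, "taxes": 30.0, "nights": 3.0}

# An older model can pin the version it was trained on ...
v1 = store.compute("price_per_night", booking, version=1)
# ... while a new model simply picks up the latest definition.
latest = store.compute("price_per_night", booking)
```

The point of the sketch is that every model asks the store for a feature by name (and optionally by version) instead of re-implementing it, which is where the reuse and the single version of the truth come from.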
Complexity avoidance and the White Swan Theory
If you can build a great data product without using machine learning, it's the best way to go, as you avoid unnecessary complexity that makes putting it into robust production and scaling it much more difficult. This was the hypothesis raised by at least two of the speakers, Ryan den Rooijen, the Head of Data at Dyson, and Stefan Meinzer, Head of Advanced Analytics EMEA at BMW. A challenge, however, is to keep your top data science talent happy, as they strive to solve complex problems and to apply the most sophisticated and intellectually beautiful solutions. And here Ryan's White Swan Theory comes into play. To executives you show a beautiful lake from above, with even more beautiful white swans swimming on it. However, under the surface of the calm water you hide the complexity and the chaos of machine learning, the bad data formats and the abundant data quality problems that are unavoidable whenever you want to implement something. It allows your top talent to run very complex models while you simplify how you communicate about them to the business. Something that I mentioned in my own keynote is the fact that we need to constrain our data scientists more when it comes to the choice of tools, programming languages and packages. Every new bit adds a new layer of complexity when you want to put models into production. To me, a major priority is to standardise the Data & AI environment so that you develop and deploy the models on the same tech stack. On top of that, redesigning the control and compliance processes around Purchasing, Legal Processes, Evaluation Processes (KPIs), Auditing Processes, Security Processes and Ethics Processes (aka the “PLEASE” framework) to make them fit for Data & AI needs to become a major goal for every Head of Data. This is again a major cultural shift and requires discipline in a data science field that, from a lot of viewpoints, still needs to grow up and mature.
In summary, a great conference in 2018 with a fabulous bunch of data leaders. I learned a lot from them, as every year!