Data Analytics

CERN's know-how and experience with ‘big data’ analysis for high-energy physics and for the control of systems used in the LHC.

CERN's Know-How

CERN's experiments probing the fundamental nature of the universe create about 1 PB of data per second, roughly four times the amount held in the US Library of Congress
About 1 million CPU cores worldwide are used to process and analyse all data from the LHC, using advanced data analytics
Additionally, online and offline analysis is performed on the data acquired from each of the 20,000 devices that monitor and control the CERN complex

Facts & Figures

>10 PB/month of data selected by trigger mechanisms and stored in the CERN Data Center
170 data centers worldwide at which LHC data analysis is performed
>250 PB held in the CERN Data Center, which stores all physics data for analysis
>1000 PB of ROOT data
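
The figures above imply a very large reduction between what the experiments produce and what is kept. The following is a back-of-envelope sketch only: it takes the 1 PB/sec and 10 PB/month values from the text and assumes, purely for illustration, a 30-day month of uninterrupted running, which overstates real operation. The result is therefore an upper bound, not a measured figure.

```python
# Back-of-envelope comparison of the raw rate and the stored volume quoted above.
RAW_RATE_PB_PER_S = 1.0      # from the text: about 1 PB produced per second
STORED_PB_PER_MONTH = 10.0   # from the text: >10 PB/month kept after triggering

# Assumption (not from the source): a 30-day month of continuous data taking.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60

raw_pb_per_month = RAW_RATE_PB_PER_S * SECONDS_PER_MONTH
reduction_factor = raw_pb_per_month / STORED_PB_PER_MONTH

print(f"Raw data per month (upper bound): {raw_pb_per_month:,.0f} PB")
print(f"Implied reduction by the trigger: at most about 1 in {reduction_factor:,.0f}")
```

Under these assumptions only a few parts per million of the produced data reach storage, which is why the selection step matters so much for the downstream infrastructure.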

Value Proposition

Key Competences

Designing Data Analytics Infrastructure

To process and analyse the vast amounts of data generated by the experiments at CERN, a data infrastructure was designed for distributed analytics. This infrastructure consists of several layers and allows 1000 clients to access the data for analysis, handling >5 million data transactions per day. With its unique know-how in structuring big data sets, CERN can design efficient analyses.

Components used for big data and related analytics

User Interface: Notebooks, SWAN (developed by CERN)
Data analysis: ROOT / TMVA (developed by CERN)
Apache Hadoop clusters with YARN and HDFS (also HBase, Impala, Hive,...)
Apache Spark for analytics and Apache Kafka for streaming
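
To illustrate the idea behind distributed analytics, here is a minimal sketch of the map/reduce pattern that frameworks such as Hadoop and Spark apply at much larger scale. It uses only the Python standard library and does not use any Hadoop or Spark API; the function names (`partition`, `map_partition`, `combine`, `distributed_mean_variance`) and the sample data are hypothetical and chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def partition(values, n_parts):
    """Split a list into n_parts roughly equal chunks."""
    return [values[i::n_parts] for i in range(n_parts)]

def map_partition(chunk):
    """Per-partition work: return (count, sum, sum of squares)."""
    return (len(chunk), sum(chunk), sum(x * x for x in chunk))

def combine(a, b):
    """Merge two partial results; associative, so merge order does not matter."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

def distributed_mean_variance(values, n_parts=4):
    """Compute mean and population variance from per-partition summaries."""
    chunks = partition(values, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(map_partition, chunks))
    n, s, ss = reduce(combine, partials)
    mean = s / n
    variance = ss / n - mean * mean
    return mean, variance

if __name__ == "__main__":
    readings = [float(x) for x in range(1, 101)]  # made-up sample data
    print(distributed_mean_variance(readings))
```

Each partition is reduced to a small summary before anything is combined, so only three numbers per partition need to travel between workers, regardless of how large the partition is.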

Data Analysis for Control Systems

CERN analyses data from its large industrial infrastructure, for monitoring, control and predictive maintenance purposes. This includes data from accelerators, detectors, cryogenic systems, data centers and log files from the Worldwide LHC Computing Grid and others.

Specifications

Online monitoring (analysis of logs, alarms, loads)
Fault analysis (root cause analysis / fault detection)
Predictive maintenance
Safety
Input for new engineering designs
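
As an illustration of the online monitoring and fault detection items above, the sketch below flags sensor readings that deviate strongly from their recent history using a rolling z-score. This is a generic textbook technique, not a description of the method CERN uses; the function name `detect_anomalies`, the window size, the threshold and the temperature trace are all assumptions made for the example.

```python
from collections import deque
from statistics import mean, stdev

def detect_anomalies(readings, window=5, threshold=3.0):
    """Return indices of readings that deviate strongly from the recent past.

    A reading is flagged when it lies more than `threshold` sample standard
    deviations from the mean of the previous `window` accepted readings.
    """
    history = deque(maxlen=window)
    flagged = []
    for i, value in enumerate(readings):
        if len(history) == window:
            mu = mean(history)
            sigma = stdev(history)
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                flagged.append(i)
                continue  # keep the outlier out of the baseline
        history.append(value)
    return flagged

if __name__ == "__main__":
    # Made-up temperature trace with a single spike at index 7.
    trace = [4.50, 4.51, 4.49, 4.50, 4.52, 4.51, 4.50, 5.20, 4.51, 4.50]
    print(detect_anomalies(trace))
```

Flagged readings are excluded from the baseline so that a single spike does not inflate the standard deviation and mask later faults. A production system would also need to handle slow drifts and perfectly constant signals, which this sketch ignores.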

Key Applications

Robust Big Data Analysis Framework

ROOT / TMVA is a modular big data software framework that provides the functionality needed for statistical analysis, visualisation and storage of big data. It is written mainly in C++ and integrates with other languages such as Python and R. It includes an integrated machine learning environment (Python bindings are provided) and is well suited to the analysis of extremely large sets of structured data. It is used in industry, physics, biology, finance and insurance fraud analysis. Possible applications include the processing and analysis of large medical datasets, for example genomics data, EEG/ECG data and biosensor data.

Interactive Data Analysis in the Cloud

SWAN (Service for Web based Analysis) offers an integrated environment for data analysis in the CERN cloud, where users can find all experiment and user data together with rich stacks of scientific software. The service's interface is that of Jupyter notebooks. The approach suits any service that lets users perform interactive data analysis in the cloud following a "software as a service" model, especially cloud-based analysis of very large datasets by many users working with different analytics tools and programming languages.
