Big Data Analytics & AI

Prague, 17.9.2020

In case of unfavourable coronavirus conditions, the event will take place virtually.

Where

Praha 4

When

17.9.2020

Duration

8:00 – 16:30

Capacity

162 seats

Price

3.800 CZK + VAT

Speakers


Wild Cards: the 67th and 124th registrants attend the conference for free (position is determined by the order in which registrations are received).

Event log

  • 14.07.2020  09:15 - We have opened registrations...


  • 08:00-09:00 - registration and breakfast
  • 09:00-10:00 -  Production pipeline for image content processing
  • 10:00-10:20 - first break - coffee, tea, beverages
  • 10:20-11:20 -  Deep Learning with Apache Spark 3.0
  • 11:20-12:00 - lunch
  • 12:00-13:00 -  Computer vision ML models in real time applications
  • 13:00-13:30 - second break - coffee, tea, beverages, fruits
  • 13:30-14:30 -  Leveraging AI for Sentiment Analysis of Customer Reviews
  • 14:30-15:00 - last break - coffee and savoury snacks
  • 15:00-16:00 - AWS Glue service for Data Lake ETL and data processing. Spark (SQL) usage for creation of data marts.

Introduction & Speakers

Production pipeline for image content processing

09:00 - 10:00

Machine learning is becoming a ubiquitous technology driving advancement in many industries such as the health care industry, financial services industry, or automotive industry. Rather than by industries, machine learning applications are divided by problems they are intended to solve. We believe the most popular application of machine learning lies in the field of computer vision.

At Socialbakers, we work on a number of computer vision projects covering object detection and classification of images and videos, frequently as a subsystem of more complex systems such as recommendation engines or social media trend analysis tools. In this presentation, we would like to share our practical experience with developing machine learning solutions for production deployment.

  • Introduction to computer vision

  •     Difference between classification and object detection
  •     Typical use cases

  • Workflow & pipelines

  •     Getting and annotating data
  •     Training procedures
  •     Persisting and monitoring

  • Examples of Socialbakers projects

  •     Image classification - multi-label classification of images into 1500+ object classes based on their content
  •     Object detection - localization and classification of 80 brand logos in images
  •     Conversion between taxonomies - prediction of 100+ topics from the 1500+ object classes

  • Pre-production

  •     Preparing models for production

  • #Training procedures #Models

    Jan Rus

    Senior Researcher at Socialbakers

    Jan Rus graduated in Computer Graphics at the Faculty of Applied Sciences, University of West Bohemia, Pilsen where he also worked as a Scientific Researcher for 5 years. For data compression research, he received the Best Suitable Commercial Application award in 2010. After leaving academia, Jan became a founding member of the research team at Socialbakers, where he currently works as a Senior Researcher.

    At Socialbakers, Jan is mostly responsible for the design, research, and development of core product features that exploit big data analysis and machine learning techniques. This includes creating concepts and turning them into working prototypes and implementations.

    When not working for Socialbakers, Jan cooperates with various startups helping them to solve data-related problems. In his free time, Jan enjoys movies and virtual reality.

    Michal Medek

    Researcher at Socialbakers

    Michal Medek studied natural language processing at the Faculty of Applied Sciences at UWB. He took first place in the Students Scientific Conference competition organized by UWB with his bachelor thesis "Automatic detection of cancer" and, a few years later, third place with his diploma thesis "Library for convolutional neural networks in C#". He was also nominated for the IT SPY competition (for the best diploma thesis in the Czech and Slovak Republics).

    During his studies, he worked as a Java Software Developer at Kerio Technologies for two years and as a Software Developer at PDM Technology Europe for another year. There, he developed a system for similarity retrieval and classification of CAD models. He also worked on the automatic detection of corrupted motherboards for Panasonic.

    Michal currently works as a Researcher at Socialbakers, where he is responsible for the development of smart features that are critical for Socialbakers products. Most of the time, he works with big data, using Apache Spark for data transformations and Python libraries for data analysis.

    He likes martial arts and hiking in the mountains in his free time.

    Deep Learning with Apache Spark 3.0

    10:20 – 11:20

    Apache Spark has become a standard for data analytics and data processing in various workloads such as batch or real-time processing. The distributed nature of the framework allows for scalability in big data environments. Advanced analytical methods such as machine learning or graph processing have been widely used in Spark since its inception thanks to native support through built-in algorithms. Using deep learning techniques in Spark is, however, not that straightforward. In this presentation, we will see how deep learning fits into the Spark ecosystem and what the options are for training or using deep learning models. We will also show one real use case of a production pipeline for image and video processing with a deep learning model deployed on top of Spark. We will show the architecture of the pipeline and describe how it processes 100k images on a daily basis.

  • Apache Spark

  •     Introduction to the technology
  •     Data science and ML in Spark in general

  • Deep learning in Spark

  •     Why it is difficult
  •     Integration with DL frameworks
  •     Transfer learning
  •     Inference on large scale with DL models
  •     GPU support in Spark

  • Image and video processing pipeline at Socialbakers

  •     The architecture of the solution
  •     Problems that we experienced and useful tips

  • #Apache Spark #Deep Learning #Data Science

    David Vrba Ph.D.

    Senior ML Engineer at Socialbakers

    David works at Socialbakers as a machine learning engineer and Spark consultant. On a daily basis, he optimizes ETL pipelines built in Spark and develops tasks that process data on a scale of up to tens of TB. David also teaches Spark trainings and workshops and has coached several teams on Spark over the past two years, including data engineers, data analysts and researchers. He also contributes to the Spark source code, publishes articles on Medium, and lectures publicly at conferences and meetups such as Spark Summit, MLPrague, or Spark Prague Meetup.

    Computer vision ML models in real time applications

    12:00 – 13:00

    From today's point of view, it seems funny that we used to wait a minute for a web page to load on a mobile phone with a black-and-white, five-line display connected to the WAP network.

    Less than ten years have passed, and today we carry in our pockets not just a mobile phone but a super-powerful computer connected to the cloud via a 5G network. It seems that Moore’s law is still valid, and along with it the technological possibilities, the expectations of users and the demands of our customers keep growing. Among other things, pressing the shutter button of the camera has never been easier.

    In this presentation, I will focus on practical examples of image processing in low-latency applications. I will cover running deep neural network models for image processing both in the cloud on graphics cards and directly on mobile phones, and I will briefly talk about how to train such models.

  • A brief introduction

  •     What is image processing?
  •     Why do we need low-latency applications?
  •     What technical issues do we need to solve on the way?

  • Practical example of real-time application

    Low-latency deep neural network model as REST API in cloud

  •     Brief introduction to cloud technologies, Docker, Kubernetes and GPUs
  •     Practical example of deploying an NN model to a Kubernetes cluster with an attached GPU, using Python and the FastAPI framework

  • Real-time model inside mobile phone

  •     Brief introduction to mobile app development
  •     Practical example of running a model on a mobile phone using the React Native and TensorFlow frameworks

  • How to train such models?

  •     Example of using Azure Machine Learning and MLFlow
  •     Sneak peek at Datascript training

  • #Tensorflow #Kubernetes #Python #Fastapi #React-native #Mlflow

    Tomáš Křesal

    Trust me, I am an Engineer at DataSentics

    At DataSentics, I try to use machine learning in a way that has a real, positive impact on our customers' business and/or provides actual help for end-users. I studied information technology at Brno University of Technology ten years ago. Then, as a developer, I helped to improve the most popular online services in the Czech market at company. I later became Head of the Fulltext Search Engine Development department, where my colleagues and I tried to create the best experience for users of Czech internet services. Today, I mainly focus on the development of applications based on image and video processing and on related cloud services and technologies.

    Leveraging AI for Sentiment Analysis of Customer Reviews

    13:30 – 14:30

    Companies in e-commerce strive to get customer feedback on their products and invest considerable time in collecting it through emails, surveys and even phone calls. Online customer reviews contain valuable information and deserve to be analysed. Unfortunately, such reviews are usually either dismissed entirely or reduced to basic statistics of the customer's final rating, and the information in the written text goes unused.

    In this presentation, I will talk about how we can use sentiment analysis to understand the customer's experience with a product they have already bought: was it good or bad, and why?

    I will tackle this issue by first describing the dataset I will use, how to load it into Spark (Databricks) and how to perform exploratory data analysis. Then I will present some NLP techniques to clean the text and perform data preprocessing such as tokenisation, stemming and stop word removal.

    Then, I will give an introduction to the multi-class classification task in machine learning and its implementation in Spark. This includes model parameter tuning using MLflow and the final evaluation and testing of the model performance. I will also present clustering of customer reviews (an unsupervised method) to understand the main topics customers talk about when they leave a review.


  •     Introduction to the Sentiment Analysis task and its application in analysing online customer reviews.
  •     Dataset description, loading into Spark and descriptive visualisation.
  •     Data preprocessing in Databricks using PySpark NLP libraries (tokenisation, stemming, stop word removal).
  •     Introduction to Classification and Clustering in Machine learning and its implementation in PySpark.
  •     Feature selection and model parameters tuning using MLflow library in Databricks.
  •     Model evaluation and testing.
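
As a rough illustration of the preprocessing steps named above (tokenisation, stop word removal, stemming), here is a minimal plain-Python sketch. It is not the speaker's pipeline and does not use PySpark or any NLP library: the stop word list, the suffix list and the helper names are assumptions made up for this example, and the naive suffix stripping only stands in for a real stemmer.

```python
import re

# Hypothetical, deliberately tiny stop word list (assumption for illustration).
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "it", "this", "i"}

# Naive suffix stripping; a simplified stand-in for a real stemming algorithm.
SUFFIXES = ("ing", "ed", "s")


def tokenise(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def naive_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Run tokenisation, stop word removal and stemming in sequence."""
    return [naive_stem(t) for t in remove_stop_words(tokenise(text))]
```

Calling `preprocess` on a review text returns a list of normalised tokens that could then be turned into features for a classifier. In a Spark setting the same three steps would typically be expressed as pipeline stages rather than plain functions.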

  • #Pyspark #NLP #SentimentAnalysis #Classification #Topic Modeling #Text Preprocessing

    Shadi Saleh

    Senior Data Scientist at DataSentics

    Shadi Saleh is a senior data scientist at DataSentics with seven years of experience in NLP, deep learning and big data applications. Shadi is also an academic researcher with research interests in information retrieval and extraction, machine translation and machine learning. He is currently a PhD candidate at the Faculty of Mathematics and Physics at Charles University in Prague.

    AWS Glue service for Data Lake ETL and data processing. Spark (SQL) usage for creation of data marts.

    15:00 – 16:00

    Mantis Data Lake – purpose and development on AWS Glue service: MANTIS is an analytical platform that provides access to data collected by the MSD Manufacturing division systems. The main component of the Mantis solution is the Big Data Lake (BDL). The current BDL solution is based on HDFS, Sqoop and Hive technologies operated on the MSD Hadoop cluster. The new generation of Mantis uses Amazon S3 as storage and Spark, provided by the serverless AWS Glue service, for ETL processes. The main goal of the talk is to share our experience of building the new Mantis solution on AWS Glue and Amazon S3.

    LRPA – real-world example of using Spark SQL to process multiple data sources into a key-value store: Production of vaccines is a strictly controlled process where every single dose must be documented, tested and approved before its release. Testing data are scattered across several different databases, and until now producing test reports was a manual job. Mantis has changed that, although some challenges were not easy to address. For example, more than 4000 SQL queries are executed daily to gather data for all reports. Hive proved incapable of handling such a task, while Spark showed its strengths: flexibility, dynamic execution and scalability. We will show how to build a metadata-controlled automated system and how to achieve optimal performance through parallel execution, including the caveats and hurdles we had to overcome.

  • Introduction to Mantis

  •     Brief platform description
  •     Current and next generation architecture

  • AWS Glue

  •     AWS Glue service overview and components
  •     Practical experience with AWS Glue

  • LRPA Project

  •     Project overview
  •     Hive -> Spark SQL transition
  •     Metadata driven dynamic queries
  •     Parallelization of execution
  •     Surprises and gotchas
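
The general idea behind metadata-driven queries executed in parallel can be sketched in plain Python as follows. This is only an illustration and not the LRPA implementation: the metadata fields, the table names and the `render_query`, `run_query` and `run_all` helpers are hypothetical, and `run_query` merely stands in for a call that would submit the SQL text to an engine such as Spark SQL.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical metadata: each entry describes one report query.
QUERY_METADATA = [
    {"report": "batch_tests", "table": "lab_results", "key": "batch_id"},
    {"report": "release_log", "table": "releases", "key": "dose_id"},
]


def render_query(meta):
    """Build the SQL text for one metadata entry."""
    return (
        f"SELECT {meta['key']}, COUNT(*) "
        f"FROM {meta['table']} GROUP BY {meta['key']}"
    )


def run_query(sql):
    """Placeholder for real execution (an assumption, no engine is called)."""
    return f"executed: {sql}"


def run_all(metadata, max_workers=4):
    """Render every query from metadata and run them concurrently."""
    queries = {m["report"]: render_query(m) for m in metadata}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            name: pool.submit(run_query, sql) for name, sql in queries.items()
        }
        # Collect results per report; result() re-raises any query error.
        return {name: future.result() for name, future in futures.items()}
```

Adding a report then means adding a metadata entry rather than writing new code, and `max_workers` bounds how many queries are in flight at once.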

  • #Spark #AWS Glue

    Milan Budka

    Big Data Senior Developer at MSD

    Milan is currently part of the team developing a new ETL framework solution based on Spark and AWS services. He graduated from the Czech Technical University in Prague in Computer Science ten years ago. Since then he has worked with BI & ETL solutions in various positions and with various technologies (TD, Oracle, Hadoop, ...). Milan has four years of experience with Big Data technologies – Spark, Hive, HDFS, Sqoop. Milan spends most of his free time with his young children. Apart from that, you can see him riding a bike.

    Lukáš Waldmann

    Sr. Spclst, Eng., Dev. & Integration at MSD

    Lukáš graduated from CTU in Prague and has since worked for major technology companies such as Sun/Oracle and Blackberry. Right now, he is enjoying his time digging up the secrets of Spark. When not programming, he likes to play and watch any kind of sport.


    Cancellation Policy

    Registration for the conference is binding. It can be cancelled free of charge only in writing by e-mail, no later than 14 calendar days before the start of the conference.

    If a participant cancels the registration less than 14 calendar days before the start of the conference, the organizer reserves the right to charge a cancellation fee equal to the full value of the ticket.

    Thank you for your understanding

    How to get here?

    Congress centre
    Institut klinické a experimentální medicíny (IKEM)
    Vídeňská 1958/9
    140 21 Praha 4

    GPS: 50°1'21.432"N, 14°27'45.697"E

    The fastest route from the centre of Prague: take the metro to the Budějovická station (line C), then take bus No. 193 to the IKEM stop (announced on the bus as „Institut klinické a experimentální medicíny“).

    You can park on the two floors of the garage building at the hotel Residence EMMY on the premises. The parking fee is 30 CZK for each started hour.
    You can also use the IKEM parking lot for visitors and patients, which is about 100 m from the main entrance to the IKEM building (capacity 200 spaces).



    By Car

    Exit from the South Junction in the direction of Jesenice onto Vídeňská Street. At the traffic light where Vídeňská Street meets Zálesí Street, go straight in the direction of Jesenice. At the next traffic light, at the junction with Jalodvorská and K Zelené louce streets, turn left and you will see the new IKEM building.


    By Tram

    Unfortunately, this popular connection is not available.


    By Bus

    Bus lines 193, 138 and 203 stop at the IKEM stop.

    Regional lines: 332, 335, 337, 339, 362