Burak Özen
Photo by Mark Duffel on Unsplash

How to Build, Structure and Manage a Data Science Team Successfully


Companies increasingly emphasise the importance of data in their internal decision-making and are investing more in their data teams. Lately, big tech companies in particular have been racing to expand their data teams to meet this need.

In my career, I have had the opportunity to work for some of those big companies and have taken on different roles in their data teams. I have also had numerous data science interviews with companies of various sizes. From my previous experiences, I wouldn’t be…

Photo by Franki Chamaki on Unsplash

Making Sense of Big Data

List of Contents


Word2Vec to Product2Vec — Product Embeddings Generator

Product I: Similar Items Recommender

Product II: Product Taxonomy Expander

Product III: Advanced Content-Based Similar Items Recommender

Product IV: Personalised Items Recommender

Product V: Listing Category Corrector

Product VI: User Interest Classifier

Conclusion — The Big Picture


This is the story of how we developed a bundle of six data science products for an e-commerce platform (a marketplace, in our case). The essence of the story is that all the products in the bundle stem from a single source: a list of numerical product representations or…

Photo by Ross Joyner on Unsplash


Nowadays, product recommendations are arguably the most important component of an e-commerce website or mobile application. Companies can increase the crucial business metric of Click-Through Rate (CTR) on their platform by improving their recommendations. Therefore, I believe recommendation is the field where Data Science teams have the best chance to increase their visibility in a business by making a profound impact on the product.

This is a story of the impact that our new recommender approach has had on the marketplace platforms in eCG (eBay Classifieds Group).

Table of Contents

I. Problem Statement:

  • Definitions of page, recommender and…

Photo by Margarida CSilva on Unsplash


You have very likely come across the concept of ‘Know Your Audience’ at some point. It is a crucial approach used by most businesses to identify different customer groups and their respective needs. Its importance for a business lies in being able to understand and reach customers better. This enables businesses to deliver their content and messages to customers in a more efficient and personalised way.

This concept gives businesses a chance to move from one-size-fits-all to more customer-centric strategies.

Customer Segmentation is a commonly used broad term for applying the ‘Know…

Photo by Clemens van Lay on Unsplash


Learn more about an end-to-end journey of a real-life data science project

Get some modelling tips and key takeaways

See a predictive machine learning model in action


Conversion is one of the most crucial metrics for all kinds of e-business. Although its definition varies across different parts of the industry, it has always been a vital metric to track in order to measure the overall success of any e-platform. For a pure e-commerce website such as eBay, conversion means completing a purchase or a payment. …

Gradient Boosted Trees (a.k.a. GBT) is a commonly used tree-based ML algorithm that works for both regression and classification problems. The algorithm has attracted a lot of attention from the data science community because of its success on industrial problems and on data mining competition platforms such as Kaggle. In fact, a GBT implementation (XGBoost) was used in almost half of the winning solutions in Kaggle contests.

In this article, we will discuss what makes this algorithm perform so well compared to others. We will go through the technical details and shed light on the mathematical formulation of the GBT algorithm in…
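To make the idea of gradient boosting concrete, here is a minimal sketch for regression, assuming squared-error loss and one-split decision stumps on a single feature. The names `fit_stump`, `fit_gbt` and `gbt_predict`, the toy data and the parameter values are illustrative assumptions and do not come from the article or from any library. Each round fits a stump to the current residuals (the negative gradient of the squared error) and adds a shrunken version of it to the ensemble.

```python
def fit_stump(xs, targets):
    """Fit a one-split regression stump on a single numeric feature."""
    mean = sum(targets) / len(targets)
    best = (float("inf"), xs[0], mean, mean)  # fallback: constant prediction
    for t in sorted(set(xs))[:-1]:  # skip the largest value so both sides are non-empty
        left = [y for x, y in zip(xs, targets) if x <= t]
        right = [y for x, y in zip(xs, targets) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if err < best[0]:
            best = (err, t, lm, rm)
    return best[1:]  # (threshold, left_value, right_value)


def stump_predict(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_gbt(xs, ys, n_rounds=50, learning_rate=0.1):
    """Boosting loop: every stump is fitted to the residuals of the ensemble so far."""
    base = sum(ys) / len(ys)  # initial prediction: the mean of the targets
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, x) for p, x in zip(preds, xs)]
    return base, stumps, learning_rate


def gbt_predict(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)


def mse(model, xs, ys):
    return sum((y - gbt_predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data (made up for illustration): a noisy step function.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]

baseline_mse = mse(fit_gbt(xs, ys, n_rounds=0), xs, ys)  # mean-only model
boosted_mse = mse(fit_gbt(xs, ys, n_rounds=50), xs, ys)
print(baseline_mse, boosted_mse)
```

On this toy data the training error of the boosted model ends up well below that of the mean-only baseline. Production libraries such as XGBoost add regularisation, deeper trees and many optimisations that this sketch leaves out.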

It is important to learn how a machine learning algorithm works behind the scenes. For a data scientist, it is crucial to be curious about the logic and the math behind these algorithms. However, even with the best parameter configuration, relying on a single ML algorithm for a data mining problem may limit your performance and your options. At this point, we need to introduce two important strategies in Machine Learning for boosting modelling performance: ‘Bagging’ and ‘Boosting’. We should keep these concepts in mind for almost every project that we’ll be…
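As a rough illustration of the first of these strategies, the sketch below shows bagging (bootstrap aggregating) in plain Python: several simple models are trained on bootstrap samples of the data and their predictions are averaged. The base learner (a one-split regression stump), the helper names `fit_stump`, `fit_bagged` and `bagged_predict`, and the toy data are assumptions made for this example only.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split regression stump on a single numeric feature."""
    mean = sum(ys) / len(ys)
    best = (float("inf"), xs[0], mean, mean)  # fallback when no split is possible
    for t in sorted(set(xs))[:-1]:  # skip the largest value so both sides are non-empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if err < best[0]:
            best = (err, t, lm, rm)
    return best[1:]  # (threshold, left_value, right_value)


def stump_predict(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_bagged(xs, ys, n_models=25, seed=0):
    """Train each stump on its own bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def bagged_predict(models, x):
    """Aggregate by averaging the predictions of all stumps."""
    return sum(stump_predict(m, x) for m in models) / len(models)


# Toy data (made up for illustration): a noisy step function.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]

bagged = fit_bagged(xs, ys)
print(bagged_predict(bagged, 1.0), bagged_predict(bagged, 8.0))
```

Bagging trains its models independently and averages them, which mainly reduces variance. Boosting, by contrast, trains models sequentially so that each one corrects the errors of the ensemble built so far.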

Photo by Arnaud Mesureur on Unsplash

I don’t know if you’re the kind of person who gets attached to one machine learning algorithm and uses it whenever it is applicable to a problem. But I am that kind of machine learning geek. My favorite algorithm is Random Forest, and I have my own reasons for this preference. First of all, Random Forest is one of the most easy-going of all machine learning algorithms.

Let’s list some advantages of Random Forest:

  • Random Forest can be used for both classification and regression problems.
  • Random Forest is a transparent…

Burak Özen

Currently based in Amsterdam and working at eBay. Senior Data Scientist with an M.Sc. degree in Machine Learning and 7 years of professional experience.
