Case study

Predicting Red Hat Business Value

Industry: Information technologies
Region: USA
Technology: Python
Volume: 0.1 man-years
The Challenge
Red Hat, one of the leading providers of enterprise solutions, holds large amounts of data gathered over years of business operations. Recognizing the potential in that data, the company set out to create an algorithm that identifies the individuals most likely to become clients. It released a dataset of individuals' characteristics and activities to the Kaggle data science community as a competition for the best solution.
Our Mission
Victor, a data scientist from the Sobolev Institute of Mathematics and also a collaborator on Softaria projects, participated in the competition. Just like other participants, he was aiming to create a prediction model that would correlate information on user activities with their potential business value.
The Solution
Using the two provided data files, a people file and an activity file, Victor built a prediction model that combined three components: logistic regression, kNN, and XGBoost-based public scripts. The competition drew 33,696 entries from 2,433 competitors, and our model finished second.
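The case study does not say how the three models were combined. One common way to combine classifiers is a weighted average of their predicted probabilities, and the sketch below illustrates that idea only. The function name `blend`, the weights, and the sample probabilities are all hypothetical and are not taken from the competition solution.

```python
def blend(prob_lists, weights):
    """Return the weighted average of per-model probabilities.

    prob_lists: one list of predicted probabilities per model, all the same length.
    weights: one non-negative weight per model (hypothetical values here).
    """
    if len(prob_lists) != len(weights):
        raise ValueError("need exactly one weight per model")
    n_rows = len(prob_lists[0])
    if any(len(probs) != n_rows for probs in prob_lists):
        raise ValueError("all models must score the same rows")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total_weight
        for i in range(n_rows)
    ]


# Made-up probabilities for three activities, one list per model.
logreg_probs = [0.9, 0.2, 0.6]
knn_probs = [0.8, 0.1, 0.4]
xgb_probs = [1.0, 0.3, 0.5]

# Illustrative weights only: the third model counts twice as much.
blended = blend([logreg_probs, knn_probs, xgb_probs], [1, 1, 2])
```

A weighted average keeps the result in the 0 to 1 range, so it can be scored the same way as any single model's output.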
A Working Potential Client Detection Model and a 2nd-Place Finish in the Kaggle Data Science Competition
Provided Data Description
There were two data sets provided: a people file and an activity file. These could be joined together.
All unique people and their respective people_ids were gathered in the people file.
All unique activities with corresponding activity_ids and activity characteristics were gathered in the activity file.
The activity file contained several types of activities, distinguishable by the number of known characteristics associated with each type.
Each activity had a corresponding yes/no field that defined the business value outcome. The field indicated whether the person completed the outcome within a fixed period of time after performing that activity.
The people_id was used as the common key for joining the files.
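The join described above can be sketched in a few lines. This is a minimal illustration in plain Python, assuming the shared key column is named `people_id`. The sample rows and the column names `group`, `activity_type`, and `outcome` are hypothetical stand-ins, not the real file schema. In practice a dataframe library's merge would do the same job.

```python
def join_on_people_id(people_rows, activity_rows):
    """Attach each person's characteristics to every activity they performed."""
    # Index people by key so each activity lookup is a single dict access.
    people_by_id = {row["people_id"]: row for row in people_rows}
    joined = []
    for activity in activity_rows:
        person = people_by_id.get(activity["people_id"])
        if person is None:
            continue  # skip activities with no matching person (inner join)
        # Activity fields are merged last, so they win on any name clash.
        joined.append({**person, **activity})
    return joined


# Hypothetical sample data: two people and three activities.
people = [
    {"people_id": "ppl_1", "group": "A"},
    {"people_id": "ppl_2", "group": "B"},
]
activities = [
    {"activity_id": "act_1", "people_id": "ppl_1", "activity_type": "type 1", "outcome": 1},
    {"activity_id": "act_2", "people_id": "ppl_1", "activity_type": "type 2", "outcome": 0},
    {"activity_id": "act_3", "people_id": "ppl_2", "activity_type": "type 1", "outcome": 0},
]

joined = join_on_people_id(people, activities)
```

Each joined row then carries both the person's characteristics and the activity's yes/no outcome, which is the shape a prediction model would be trained on.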