Microsoft introduces two new data science utilities on GitHub

Dave W. Shanahan

Today, Microsoft introduced two new data science utilities on GitHub to help boost productivity: Interactive Data Exploration, Analysis, and Reporting (IDEAR) and Automated Modeling and Reporting (AMAR). Both IDEAR and AMAR run in CRAN-R and are available on GitHub.

The Team Data Science Process (TDSP), which was initially launched at the Microsoft Machine Learning & Data Science Summit in Atlanta in September, is behind the creation of IDEAR and AMAR. Let’s take a quick look at what IDEAR and AMAR are and what they do for machine learning and data science.

IDEAR

IDEAR is a tool that helps data scientists visualize and analyze data interactively and draw helpful insights from it.

Some unique IDEAR features include:

  • Automatic Variable Type Detection
  • Variable Ranking and Target Leaker Identification
  • Visualizing High-Dimensional Data

AMAR

AMAR is a customizable tool that helps data scientists train machine learning models using hyper-parameter sweeping, check their accuracy, and examine the importance of variables. A parameter input file specifies which models to run, what data to use, the parameter ranges to sweep, and the strategy for choosing the best parameters (bootstrapping, cross-validation, etc.).
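To make the idea of hyper-parameter sweeping with cross-validation concrete, here is a minimal sketch in plain Python. It is not AMAR's code (AMAR itself runs in R), and every name in it (`knn_predict`, `cv_rmse`, `sweep`, the synthetic dataset) is a hypothetical chosen for illustration. It assumes a simple k-nearest-neighbors regressor with one hyper-parameter, `k`, and scores each candidate value by its average RMSE across folds.

```python
import math
import random

def knn_predict(train, x, k):
    """Predict y at x as the mean of the k nearest training targets."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def cv_rmse(data, k, folds=5):
    """Average RMSE of a k-NN regressor over `folds` cross-validation splits."""
    scores = []
    for f in range(folds):
        # Every folds-th point (offset f) is held out; the rest is training data.
        test = data[f::folds]
        train = [p for i, p in enumerate(data) if i % folds != f]
        sq_err = [(knn_predict(train, x, k) - y) ** 2 for x, y in test]
        scores.append(math.sqrt(sum(sq_err) / len(sq_err)))
    return sum(scores) / len(scores)

def sweep(data, k_values, folds=5):
    """Score every candidate k and return (best_k, {k: rmse})."""
    results = {k: cv_rmse(data, k, folds) for k in k_values}
    best_k = min(results, key=results.get)
    return best_k, results

# Synthetic data (an assumption for this sketch): a noisy sine curve.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(100)]
data = [(x, math.sin(x) + rng.gauss(0, 0.3)) for x in xs]

best_k, results = sweep(data, k_values=[1, 3, 5, 10, 25, 50])
for k, rmse in results.items():
    print(f"k={k:<3} cv_rmse={rmse:.3f}")
print("best k:", best_k)
```

The same pattern extends to several hyper-parameters by sweeping over a grid of combinations instead of a single list, and the cross-validation step could be swapped for bootstrapping without changing the outer loop.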

When AMAR is finished, a standard HTML model report is compiled and displays the following information:

  • A view of the top few rows of the dataset used for training
  • The training formula used to create the models
  • The accuracy of the various models (AUC, RMSE, etc.), and a comparison between them if multiple models are trained
  • Variable importance ranking

Both IDEAR and AMAR are available by cloning this GitHub repository. Two sample datasets are included to show how to use each tool, but you can also use your own dataset.

Microsoft hopes you give IDEAR and AMAR, as well as TDSP, a try in your next data science project. If you have any comments or feature requests, the Cortana Intelligence and Machine Learning team values any feedback it receives. You can leave comments at the end of the team's blog post, open an issue on the Issues tab of the GitHub repository, or send a tweet to @zenlytix.