Anurag Singh
3 min read · Mar 3, 2020

The Gartner Magician for 2020: Databricks on Azure

Challenges Data Scientists Face in the Current Scenario:

1) Infrastructure complexity: The move to the cloud is fast becoming a primary objective. For companies that don’t have dedicated DevOps teams to help with infrastructure issues, the responsibility often falls on the data scientists, who are left to fend for themselves.

2) Disparate technologies: Companies are trying to use a myriad of technologies to achieve their goal of a more data-driven business. Open source projects such as Apache Spark, Hive, Presto, Kafka, MapReduce and Impala offer the promise of a competitive advantage but also come with management complexity and unexpected costs.

3) Siloed teams: When teams view data through separate lenses, collaboration becomes very difficult, trust in the analytics can be misplaced and the speed of innovation slows.

4) Data exploration at scale: Most organizations rely on single-threaded tools to perform data exploration. This approach is limited by the amount of memory on the data scientist’s machine, which restricts the size of the data sets they can explore.

5) Model training is resource intensive: Training complex machine learning models against massive data sets is very challenging when done in isolation, without the ability to collaborate on models with peers.

6) Difficult to share insights: The inability to share insights easily can hamper cross-team collaboration and slow progress.
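
To make challenge 4 a little more concrete, here is a minimal sketch of the idea that engines like Spark build on: aggregate each partition separately, then merge the partial results, so the full data set never has to fit in memory at once. This is plain single-process Python, not Spark's API, and the function names (`partitions`, `count_partition`, `count_all`) and the sample events are made up for illustration.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def partitions(rows: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield batches of at most `size` rows so only one batch is held in memory."""
    batch: List[str] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch


def count_partition(batch: List[str]) -> Counter:
    """Partial aggregate for a single batch."""
    return Counter(batch)


def count_all(rows: Iterable[str], size: int = 1000) -> Counter:
    """Merge the partial aggregates of every batch into one total."""
    total: Counter = Counter()
    for batch in partitions(rows, size):
        total.update(count_partition(batch))
    return total


# Hypothetical event stream: a generator, so rows are produced lazily.
events = (("click", "view", "view")[i % 3] for i in range(9))
result = count_all(events, size=4)
print(result["click"], result["view"])
```

On a cluster, the per-partition step would run in parallel on different machines; here it runs in a loop, which is enough to show why the approach is not bound by one machine's memory.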

The fundamental problem

Data science is collaborative, yet most open source notebooks are built for individual users. They also require extensive DevOps work to set up and configure, which can severely limit a data scientist’s ability to focus on the data. Furthermore, they lack the collaborative capabilities that have made Databricks’ integrated workspace a staple in some of the most innovative companies in the world.

Databricks offers an interactive workspace that takes traditional notebook environments to the next level. By integrating and streamlining the individual elements that comprise the analytics life cycle, it lets data science teams quickly access data, provision compute resources, and work together to build models, creating a culture of accelerated innovation.

Databricks offerings: feathers in the cap

  1. Focus on your data, not DevOps: A cloud-native platform that abstracts away the complexities of Apache Spark management, resulting in a highly elastic, reliable and performant platform on which to build innovative products.
  2. Launch expertly tuned Spark in a few clicks: The Databricks runtime optimizes Spark, making it 10x–40x faster and more reliable.
  3. Databricks protects your data at every level with a unified security model featuring fine-grained controls, data encryption and identity management.
  4. Accelerate innovation with collaborative data science: Increase the productivity of your data science team by 4x–5x through collaboration and the democratization of data and insights.
  5. Speed up iterative model building and tuning with interactive notebooks purpose-built to foster collaboration across teams.
  6. Interactively query large-scale data sets in R, Python, Scala, or SQL.
  7. Visualize insights through a wide assortment of point-and-click visualizations, or use powerful scripting options like matplotlib, ggplot, and D3.
  8. Make use of popular libraries such as scikit-learn, NLTK, and pandas within your notebook or job.
  9. Share insights via interactive dashboards: Share insights with your colleagues and customers, or let them run interactive queries with Spark-powered dashboards.
  10. Create shareable dashboards from notebooks, which can be tailored into multiple dashboard views.
  11. Publish dashboards and schedule the content to be updated continuously.
  12. Enable non-technical users to perform scenario analysis directly from published dashboards.
  13. Use input widgets to parameterize your dashboards.
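
As a rough illustration of the input widgets in item 13, the sketch below reads a parameter from a text widget and uses it to filter some data. In a Databricks notebook, `dbutils.widgets.text` and `dbutils.widgets.get` create and read the widget; outside Databricks `dbutils` is not defined, so this sketch falls back to the default value. The helper names (`read_widget`, `total_for`), the `region` parameter and the `SALES` rows are all hypothetical.

```python
from typing import Dict, List


def read_widget(name: str, default: str) -> str:
    """Return the widget value in a Databricks notebook, else the default."""
    try:
        # `dbutils` is a global that exists only inside Databricks notebooks.
        dbutils.widgets.text(name, default, name)  # type: ignore[name-defined]
        return dbutils.widgets.get(name)  # type: ignore[name-defined]
    except NameError:
        # Assumption: running outside Databricks, so no widget is available.
        return default


# Hypothetical sample data standing in for a real table.
SALES: List[Dict[str, object]] = [
    {"region": "EMEA", "amount": 120},
    {"region": "EMEA", "amount": 80},
    {"region": "APAC", "amount": 50},
]


def total_for(region: str, rows: List[Dict[str, object]]) -> int:
    """Sum the amounts of the rows that match the selected region."""
    return sum(int(r["amount"]) for r in rows if r["region"] == region)


region = read_widget("region", "EMEA")
print(region, total_for(region, SALES))
```

When a notebook like this is published as a dashboard, changing the widget value re-runs the dependent cells, which is what lets non-technical users try different scenarios without touching the code.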

Go ahead, give it a try.
