The Current State of Data Visualization
This week Amazon released an automatic labeling service for its SageMaker tool, which enables companies to build, train, and deploy machine learning models quickly. The training-set labeling service supports text classification, image classification, object detection, and semantic segmentation (1). While this news is fantastic, the fact remains that it is only a piece of the puzzle unless you have a visualization tool that lets you see what is going on with your data (2).
We decided to deconstruct the realities of big data visualization tools: their current state, their limitations, and the improvements they need.
What is the current state of big data visualization?
The current state of the art in big data visualization faces challenges that, particularly in our space (healthcare life sciences), stem from the complexity and interlinked nature of the data.
This complexity makes visualizations difficult for humans to examine and extract meaning from. There are definitely helpful solutions out there pushing the boundaries, such as Tom Sawyer and OmniSci (formerly MapD). These products can aid in understanding the data, but it remains very difficult to find the underlying connections within some hypercomplex contexts.
What would be the first couple of steps toward really improving the way that we at Systems Imagination visualize data?
The first step would be a system that does more than respond to user input. In other words, instead of the user examining the data in a cycle of queries, the visualization tool would surface the meaningful connections on its own. Visualizers would have some form of artificial intelligence built into them. An ideal system would use expertly curated knowledge of the industry you have selected; based on that framework, it would assist you by highlighting patterns and running algorithms on the fly within your data. This would enable the data scientist to better understand underlying networks and connections and would produce a clearer picture. In short, the AI works in conjunction with the visualizer by recommending the next things to look at or highlight.
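To make the idea concrete, here is a minimal sketch of how such a recommender might rank a dataset's connections and suggest the strongest ones to highlight next. Everything here is a hypothetical illustration: the node names, edge weights, and the scoring heuristic (edge weight boosted by shared neighbors) are assumptions, not any existing product's algorithm.

```python
# Hypothetical sketch: rank connections in a data graph and recommend
# the most "meaningful" ones for a visualizer to highlight next.
# The scoring heuristic (weight x shared-neighbor count) is an
# illustrative assumption, not a real product's method.

from collections import defaultdict

def recommend_highlights(edges, top_k=3):
    """edges: list of (node_a, node_b, weight); returns the top_k edges
    ranked by weight scaled up when the endpoints share neighbors."""
    neighbors = defaultdict(set)
    for a, b, _ in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    def score(edge):
        a, b, weight = edge
        shared = len(neighbors[a] & neighbors[b])  # common neighbors
        return weight * (1 + shared)

    return sorted(edges, key=score, reverse=True)[:top_k]

# Toy graph with hypothetical gene-to-gene association weights.
edges = [
    ("GENE_A", "GENE_B", 0.9),
    ("GENE_A", "GENE_C", 0.4),
    ("GENE_B", "GENE_C", 0.7),
    ("GENE_C", "GENE_D", 0.2),
]
print(recommend_highlights(edges, top_k=2))
```

A real system would replace the toy heuristic with algorithms chosen from the curated industry knowledge, but the interaction pattern is the same: the tool proposes, the scientist disposes.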
What is a contextual example for healthcare life-sciences?
If we're looking at a given area where we think there is perhaps a gene of interest to target, then we want to know how it relates to other genes within a particular context. It would be great if the visualization tool could start working out some of this as the scientist begins to formulate the hypothesis on screen. The goal of the system would be to find connections that are meaningful, as opposed to showing 6,000 connections that aren't, which is what often happens. Another aspect: the system could work in an anticipatory manner and learn as it is used, enhancing the analysis by drawing on previous explorations to inform the AI which algorithms to suggest.
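The anticipatory piece could be sketched as nothing more than usage-frequency learning: remember which analyses past explorations ran in a given context, and suggest the most common one next time. The context labels and algorithm names below are hypothetical placeholders, not part of any existing tool.

```python
# Hypothetical sketch: learn from previous explorations which analysis
# to suggest next in a given context. Context labels and algorithm
# names are illustrative assumptions only.

from collections import Counter, defaultdict

class AnticipatorySuggester:
    def __init__(self):
        # context -> counts of algorithms run while exploring that context
        self.history = defaultdict(Counter)

    def record(self, context, algorithm):
        """Log that an algorithm was run during an exploration."""
        self.history[context][algorithm] += 1

    def suggest(self, context):
        """Suggest the algorithm used most often in this context so far."""
        counts = self.history[context]
        return counts.most_common(1)[0][0] if counts else None

suggester = AnticipatorySuggester()
suggester.record("gene-target", "pathway_enrichment")
suggester.record("gene-target", "pathway_enrichment")
suggester.record("gene-target", "clustering")
print(suggester.suggest("gene-target"))  # prints "pathway_enrichment"
```

A production system would weight recency, user identity, and outcome quality rather than raw counts, but even this simple memory turns the visualizer from reactive to anticipatory.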
It's a challenge, and it's going to be very specific to each industry. But that's what we are looking at in a perfect world.
What would need to happen for us to go from where we are today to this being a reality?
It's a collective effort, because it takes industry expertise: experts have to teach their respective AI systems. It takes a lot of data science to run the right algorithms efficiently across any particular network. And it will take a diligent and diverse set of expertise in computer engineering, software engineering, and visualization to build a meaningful user interface. To keep the work accessible during exploration, you don't want to be overwhelmed by a very complex UI or a system that is difficult to learn, because that in itself is a hindrance.
November 28, 2018