AI-Readiness and AI-Ready Data
I recently participated in a live-stream (thank you, INNER JOIN by Select Star!) event about AI-ready data. We touched on a number of subjects, including how Brooklyn Data approaches AI, common gaps in AI readiness, what AI-ready data means, differences in approach between Machine Learning and Generative AI (GenAI), and the importance of data lineage and governance. Here, I’ll recap and refine some of those topics.
Our Approach to AI
We believe that a targeted approach that starts with narrow use cases can help projects avoid becoming one of the many AI projects that fail (the Rand Corporation has estimated that 80% of AI projects fail, while Gartner puts that number at 85%). Good AI use cases have well-defined expected ROI, clear scope with goals and anti-goals, and known costs. We advocate for a narrow approach for several reasons.
First, a narrow scope makes it easier to act on and to measure success quantitatively. Second, it tends to require a smaller investment of time and resources. Third, it allows for iterative and modular development (use what you’ve built to help build more things).
Another benefit of AI projects is that they are a great “forcing function” for companies to invest in other areas like data governance, infrastructure, team personnel, and processes. By improving an organization’s overall data maturity, these investments bring benefits beyond any single AI model or AI-driven insight they enable.
Applying this narrow, outcome-focused approach is why we’re also big believers in proofs of concept (POCs) with AI projects. We think POCs make sense because they simultaneously serve to validate and demonstrate value in a project. Just because you can build something doesn’t mean it will deliver the requisite value to the business to justify its operation and maintenance costs. A POC allows the company to test drive the data product enabled by AI (be it more traditional machine learning or GenAI-based) and confirm that it delivers the value needed before putting it into production. POCs are also ideal for iterating and refining process and technology decisions.
Common Gaps and Pitfalls in AI Readiness
I recently spoke with a client about their AI strategy, and they said, “I don’t want it to be a solution in search of a problem.” This insight is simple but powerful, and I couldn’t agree more. Too often, companies feel pressure to adopt AI because it’s all around them in the zeitgeist or because they believe it is a cure-all. The truth is that AI can have tremendous benefits for companies and organizations, but it is not a panacea. AI is best used to solve specific, well-defined problems; it is a means to an end, not an end in and of itself.
A common conversation I have with clients revolves around what they can do with AI and what they should do first. Once opportunities are identified and evaluated (with analysis like an effort-versus-impact matrix), it becomes easier to avoid the common pitfall of decision paralysis. There are nearly always competing priorities and multiple data needs. Choosing a path, establishing expectations, and even time-boxing experiments are all great ways to mitigate decision paralysis. Likewise, these can help prevent overreaching and trying to do too much at once. Both decision paralysis and overreach make AI projects and initiatives less likely to succeed.
A final pitfall is the expectation that AI doesn’t require people. Humans in the loop are vital for ensuring model performance and efficacy. My personal belief, and the position of Brooklyn Data, is that AI will not replace humans. Instead, it will improve the quantity and quality of our work. Thinking that humans are not needed for AI validation, that AI is “set it and forget it,” or that humans in the loop are an optional aspect of AI operations is another perspective that can hamstring AI business initiatives, since AI processes need calibration to real-world events and changing priorities.
AI-Ready Data
There is no single definition of AI-ready data. It’s understood that having clean data is good, but what “AI-ready data” means differs from business to business. Structured data, for example, needs to be accurate and clean (e.g., free from incorrect values, nulls, type inconsistencies, and more). If it’s not, you’re more likely to introduce errors into your outputs. One example is using agentic AI to generate analytics insights; underlying data that isn’t clean is more likely to produce incorrect responses. For unstructured data, your data hygiene should focus on your metadata, so your models tap the right sources to train, fine-tune, and return the right results. Higher-quality data reduces the risk of misleading or incorrect results from AI models.
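To make “clean” concrete for structured data, a minimal sketch of the kinds of programmatic checks we mean follows. The table and column names here are purely illustrative, not from any particular client system; the point is that nulls, duplicates, type inconsistencies, and out-of-range values can all be surfaced before the data ever feeds an AI workload:

```python
import pandas as pd

# Hypothetical customer table with a few deliberately messy rows.
df = pd.DataFrame({
    "customer_id": [1, 2, 2, 4],                                    # duplicate id
    "signup_date": ["2024-01-05", "2024-02-10", "2024-02-10", "oops"],  # bad date
    "monthly_spend": [120.0, None, 80.0, -15.0],                    # null, negative
})

def quality_report(frame: pd.DataFrame) -> dict:
    """Count common structured-data problems: nulls, duplicate keys,
    unparseable dates (type inconsistencies), and out-of-range values."""
    return {
        "null_monthly_spend": int(frame["monthly_spend"].isna().sum()),
        "duplicate_customer_ids": int(frame["customer_id"].duplicated().sum()),
        "unparseable_dates": int(
            pd.to_datetime(frame["signup_date"], errors="coerce").isna().sum()
        ),
        "negative_spend": int((frame["monthly_spend"] < 0).sum()),
    }

report = quality_report(df)
print(report)
```

Checks like these are cheap to run on every load, and failing them loudly is far better than letting an agentic workflow quietly reason over bad rows.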
Unfortunately, your data doesn’t always support the highest-priority use case identified in the effort/value tradeoff. When this happens, it makes sense to concentrate efforts on a specific data realm (for example, customer data when focused on churn, or unstructured data when focused on internal knowledge management). This reinforces our principle to prioritize use cases that generate outcomes or contain dependencies that other use cases might leverage. Organizations should focus on AI readiness not just for the sake of enabling AI, but for better decision-making across the board.
Other issues in AI-data readiness include data that lacks trust and discoverability. Difficult-to-understand or opaque data lineage leads to low trust or a lack of knowledge that data exists. Strong data lineage can help trace and diagnose data quality problems so that they can be remedied. Faster and better data remediation leads to increased explainability and, in turn, higher trust. Ultimately, we see a high correlation between stronger trust in data and the success of AI initiatives.
Brooklyn Data also holds that increased data governance maturity is highly correlated with AI success. Data governance is hard to do well for both technical and process-related reasons. Increased efforts around data hygiene, accessibility, metadata, and security make using AI easier. More nebulously, efforts around data governance demonstrate the thoughtfulness and care typically required for good AI use cases.
In short, AI-ready data is trustworthy due to its high quality, clear provenance, accessibility, and well-protected nature. Said another way, well-governed data is AI-ready data. For machine learning use cases, data readiness tends to be more about data quality. For GenAI use cases, data readiness skews more towards data governance. Regardless of the kind of use case, according to Gartner, 60% of AI projects fail due to a lack of AI-ready data.
Data, Trust, and the Road Ahead
Though the AI hype cycle has tempered, AI itself is not going anywhere. Instead, it is moving from the “what is possible” phase to the “must have” phase. Organizations that have not yet begun determining their AI strategies must do so. A key element of successfully executing AI use cases rests on having AI-ready data.
Regardless of the outcome your business is pursuing, Brooklyn Data can help. Our expert data teams can increase the accessibility and trustworthiness of your data so you can build your AI strategies on solid foundations. We combine in-depth technology experience, strategic vision, and data know-how to prepare your business to maximize value while adopting artificial intelligence. Ready to start making your data AI-ready? Contact us. Our experts would be happy to help you.