Enterprise Information Integration (EII)
We know how to work with databases, data marts and data warehouses, because information in those places is carefully structured and massaged. But businesses also need to work with a wealth of unstructured information from sources such as document libraries, spreadsheets, e-mail and instant messaging archives, electronic records, publicly available Web pages and commercial information services.
Two elements are key to this discussion. First is the unstructured nature of content: Organizations have to handle streams of what might seem to be random text instead of the carefully delineated and validated fields that we’re used to in “normally” managed data. The second consideration is that companies are getting this information from multiple sources, both inside and outside the enterprise. Each data source has its own organization and format, and most were designed for a single, stand-alone purpose, not to be part of an integrated data collection. Thus, these repositories tend to be independent of one another and don’t work well together.
We rely on a growing number of these data sources, and we need to be able to use new ones as they appear without having to rewrite our applications and tools.
The simple-minded answer to this problem is to aggregate all the data into a single, universal database or data warehouse. Unfortunately, creating such a central repository is a slow and expensive process. Maintaining and updating that repository is a job that could give any IT manager nightmares. And we haven’t even addressed the issues of scalability and who owns the information. Clearly, a better, more efficient strategy is called for.
Enterprise information integration (EII) is the general heading under which such a strategy would fall today. But approaches to solving the problem have been around for years under a variety of names. Three main factors have made the situation more manageable today:
The growing use and acceptance of XML as a cross-platform standard.
Cheaper and more capacious storage combined with faster, more powerful processors.
The emergence of new tools to tackle the problem head-on.
EII products make it broadly possible to combine data from different sources whenever you need it. They accomplish this by creating an intermediate data services layer (middleware) that allows access to the data in a standardized way, instead of having to interact directly with each separate back-end data source.
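To make the idea of an intermediate data services layer concrete, here is a minimal sketch in Python. It is not how any particular EII product works; the class names (CsvConnector, SqliteConnector, DataServiceLayer), the sample table and the sample spreadsheet contents are all hypothetical, chosen only to show one query interface sitting in front of two differently organized back ends.

```python
import csv
import io
import sqlite3
from typing import Dict, Iterator, List


class CsvConnector:
    """Hypothetical connector: exposes spreadsheet-style CSV text as records."""

    def __init__(self, text: str) -> None:
        self._text = text

    def fetch(self) -> Iterator[Dict[str, str]]:
        # DictReader uses the first CSV row as the field names.
        for row in csv.DictReader(io.StringIO(self._text)):
            yield dict(row)


class SqliteConnector:
    """Hypothetical connector: exposes the rows of a SQL query as records."""

    def __init__(self, conn: sqlite3.Connection, query: str) -> None:
        self._conn = conn
        self._query = query

    def fetch(self) -> Iterator[Dict[str, str]]:
        cursor = self._conn.execute(self._query)
        columns = [col[0] for col in cursor.description]
        for row in cursor:
            yield {name: str(value) for name, value in zip(columns, row)}


class DataServiceLayer:
    """Illustrative middleware: callers query here, never the back ends."""

    def __init__(self) -> None:
        self._sources = {}

    def register(self, name: str, connector) -> None:
        # A new source only needs a connector with a fetch() method.
        self._sources[name] = connector

    def query(self, field: str, value: str) -> List[Dict[str, str]]:
        # Pull from every source at request time; nothing is copied
        # into a central store beforehand.
        hits = []
        for name, connector in self._sources.items():
            for record in connector.fetch():
                if record.get(field) == value:
                    hits.append({"source": name, **record})
        return hits


def demo() -> List[Dict[str, str]]:
    # Assumed sample data, made up for this sketch.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, total TEXT)")
    conn.execute("INSERT INTO orders VALUES ('acme', '120.50')")

    sheet = "customer,email\nacme,ops@acme.example\nglobex,it@globex.example\n"

    layer = DataServiceLayer()
    layer.register("orders_db", SqliteConnector(conn, "SELECT customer, total FROM orders"))
    layer.register("contacts_sheet", CsvConnector(sheet))
    return layer.query("customer", "acme")
```

Calling demo() returns one record from the database and one from the spreadsheet, each labeled with its source. Adding a third source means writing another connector and registering it; the calling code does not change.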
EII is more service-oriented than traditional enterprise application integration (EAI).
XML is probably the biggest single force driving the advance of EII today, because XML gives us the ability to tag data (whether for format, content or both) either at creation time or later on. And these tags can be extended and modified to accommodate almost any area of knowledge. Also, consider that Microsoft Corp. has announced its intention to make XML the default save format for its successor to Office.
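As a small illustration of tagging at creation time and again later on, the following sketch uses Python's standard xml.etree.ElementTree module. The element names (message, sender, body, topic) and the sample e-mail are assumptions made up for this example, not part of any standard vocabulary.

```python
import xml.etree.ElementTree as ET


def tag_message(sender: str, body: str) -> ET.Element:
    """Tag a piece of free text when it is created (hypothetical schema)."""
    message = ET.Element("message")
    ET.SubElement(message, "sender").text = sender
    ET.SubElement(message, "body").text = body
    return message


def add_topic(message: ET.Element, topic: str) -> None:
    """Extend the record later with a tag the original schema lacked."""
    ET.SubElement(message, "topic").text = topic


def demo() -> str:
    message = tag_message("alice@example.com", "Q3 forecast attached")
    add_topic(message, "finance")  # added after creation
    return ET.tostring(message, encoding="unicode")
```

Any XML-aware tool can then read the topic back, for example with ET.fromstring(demo()).findtext("topic"), without knowing how or where the original text was stored.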
Besides XML, EII applications today are generally built around metadata repositories and specific connectors to link to these repositories.
For EII to be practical, it can’t simply be another data warehouse. Instead, it must pull together information when needed, in a timely and ad hoc fashion. The simplest way for an enterprise to do this is to establish and maintain a metadata repository or detailed catalog that describes what data is available, how it’s stored, where it’s located and the relationships among data components.
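A metadata catalog of this kind can be sketched in a few lines of Python. The structure below is only one possible shape, and every name in it (CatalogEntry, MetadataCatalog, the sample sources and their locations) is hypothetical. Each entry records the four things listed above: what data is available, how it is stored, where it lives and what it relates to.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class CatalogEntry:
    name: str                          # what data is available
    storage: str                       # how it is stored
    location: str                      # where it is located
    fields: Tuple[str, ...]            # fields the source exposes
    related_to: Tuple[str, ...] = ()   # relationships to other entries


class MetadataCatalog:
    """Describes the sources; it holds no copy of the data itself."""

    def __init__(self) -> None:
        self._entries: Dict[str, CatalogEntry] = {}

    def add(self, entry: CatalogEntry) -> None:
        self._entries[entry.name] = entry

    def sources_with_field(self, field_name: str) -> List[str]:
        # Answers "which sources could satisfy a query on this field?"
        return sorted(
            entry.name
            for entry in self._entries.values()
            if field_name in entry.fields
        )

    def describe(self, name: str) -> CatalogEntry:
        return self._entries[name]


def build_catalog() -> MetadataCatalog:
    # Assumed sample entries, made up for this sketch.
    catalog = MetadataCatalog()
    catalog.add(CatalogEntry(
        name="orders_db",
        storage="relational table",
        location="db://sales.example/orders",
        fields=("customer", "total"),
        related_to=("contacts_sheet",),
    ))
    catalog.add(CatalogEntry(
        name="contacts_sheet",
        storage="spreadsheet",
        location="file://shared.example/contacts.csv",
        fields=("customer", "email"),
    ))
    return catalog
```

With this in place, build_catalog().sources_with_field("customer") names both sources, so an integration layer can decide at query time which back ends to consult.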