Monday, October 13, 2008

Applying Data and Information Quality Concepts to the Bible

In the coming weeks, I intend to show that if we posit a null hypothesis about the Bible and evaluate the quality of the data and information in it, the hypothesis that humans alone were sufficient to create the Bible is very well supported by the data, which effectively refutes the hypothesis posited in 2 Timothy 3:16.

Brief Introduction to Data and Information Quality
I recommend reading the Wikipedia article "Data Quality". It's a good overview of how Data and Information Quality got its start as an aspect of computer science.

Data and Information have an intrinsic value
Accurate information has always been important, especially to kings and generals, but the perceived need for principles to manage data quality arose when businesses realized that databases which accurately reflected the state of the world, namely customer information and inventories, saved money. Over the years, as computing became less expensive, the technology was adopted by individual consumers, and the amount of information available online grew from diverse sources such as companies, governments and individuals. It became apparent that some way to evaluate the quality of information was needed(1). It should be obvious that some data is accurate and reliable and some is not. To ensure data is accurate and reliable, it needs to be profiled, cleaned, parsed, matched, moved, analyzed, reconciled and reported on(8). In the past two decades, metrics for determining the relative quality of information from a given source have been derived. Measuring the quality of an information source is an inexact science, but using principles of probability, its relative quality can be measured(12).

Data Quality Dimensions
Data Quality is a term used to describe characteristics or dimensions attributed to data or information. Much of the research on Data Quality is carried out at the MIT Total Data Quality Management Program, where Richard Y. Wang has led the effort since the 1990s. There are several approaches to data quality research that depend on how the data will be used, and they all have their own values for criteria or "dimensions". The approaches can be categorized as "Intuitive" (based on what the researcher believes is important), "Theoretical" (how data becomes deficient during production) and "Empirical" (data gathered from consumers to see what is important to them). Most data studies fall into the "Intuitive" category; however, they all contain a core set of "dimensions", and one dimension that has a consistently high value in all lists is "Accuracy". Another highly valued core dimension from the Intuitive approach is "Reliability". Some highly valued core dimensions from the Theoretical approach are "Accuracy, Relevance, Correctness, Currency, Completeness", and from the Empirical approach are "Accuracy, Relevancy, Believability, Value-added, Interpretability" and "Ease-of-understanding"(11). The different dimensions will have higher and lower values to different organizations depending on the context in which they are used. I will elaborate more on the data production and the data consumer dimensions as I explore how they apply to the Bible in later articles.
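
To make the idea of valuing dimensions differently a little more concrete, here is a minimal sketch of how ratings on several dimensions could be combined into one relative score. The dimension names come from the lists above, but the weights, the ratings and the function name quality_score are my own illustrative assumptions, not figures from any of the cited studies.

```python
# Hypothetical weights: how much one organization might value each dimension.
# The numbers are illustrative assumptions only.
WEIGHTS = {
    "accuracy": 0.4,
    "completeness": 0.2,
    "relevance": 0.2,
    "believability": 0.2,
}


def quality_score(ratings, weights=WEIGHTS):
    """Return the weighted average of per-dimension ratings (each 0.0 to 1.0)."""
    missing = set(weights) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(weights[d] * ratings[d] for d in weights) / total_weight


# Hypothetical ratings for some information source.
example = {
    "accuracy": 0.5,
    "completeness": 1.0,
    "relevance": 1.0,
    "believability": 0.5,
}
print(round(quality_score(example), 3))
```

A different organization would plug in different weights, which is the point: the same source can score differently depending on the context in which it is used.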

Do you think data and information quality is important?
Would you be satisfied with a metaphorical record in the following situations or would you prefer a record that accurately represents real world events?
- Reading or watching the news
- Textbooks that you are required to purchase for your University courses
- Studying the only record of the Abrahamic God that exists
- Producing or reading a business report
- Grocery shopping
- Reviewing your bank statement
- Reviewing the charges for your utilities, such as electric, phone, water, trash, television, etc.
- Paying your taxes
- Purchasing a car
- Taking inventory
- Purchasing insurance
- Reviewing your shipping invoice: what you received versus what you ordered, and how much you paid
- Your check at the restaurant
and so on.

Why is data and information quality important?
So what happens when data and information quality is poor? "Poor data quality can have a severe impact on the overall effectiveness of an organization"(3) and "Poor data quality can have substantial social and economic impacts"(11). Consequently, there is a high value placed on information quality, as evidenced by how much people are willing to spend to obtain it. There is an industry built on data quality concepts(4), and there are professional certifications available(5). The reliability of such things as inventory, medical records, medical research, military and civilian logistics, market research, consumer safety, education, consultant reports, work requests, billing reports, status reports, technical manuals and intelligence reports depends on data from verifiable sources that is produced with the goal of accurately representing elements of the real world. One recent example of what happens when information and data are of poor quality is the decision by the United States to invade Iraq in 2003 on the grounds that Iraq possessed "Weapons of Mass Destruction"(6), a claim which turned out to be false. Because of the demonstrable importance of assessing data quality, the industry of Data Quality Management has developed(4).

Who uses Data Quality and Information Quality Dimensions?
Short list of organizations promoting Information Quality Principles
* US Government,
- Data Quality Act,
- Guidelines for Ensuring and Maximizing the Quality, Objectivity, Utility, and Integrity of Information Disseminated by Federal Agencies
* Data Quality Management Industry, DMReview, an industry magazine.
* Education Professionals,
- The Quality Information Checklist,
- Robert Harris's "VirtualSalt",
- East Tennessee State University
* Legal Industry, Evaluating the Quality of Information on the Internet
* Medical Industry,
- Journal of Medical Internet Research,
- Medical Billing
* US Army Logistics, "Data Quality Problems in Army Logistics", By Lionel A. Galway, Christopher H. Hanks, United States Army, Rand Corporation, Arroyo Center
and many more.

The Book As A Database: The justification to apply data and information quality metrics to evaluate the Bible
A book can be a data source. It can be treated like a database. It can be profiled, cleaned, parsed, matched, moved, analyzed, reconciled and reported on. Examples are an atlas, a history book, textbooks in general, and the Bible. In fact, over the years, to facilitate ease of study, the Bible has been formatted and cross-referenced very similarly to a database.

If we have a lot of individual information sources, we can collect them, profile them, sort them, categorize them, spell check them, look for exceptions, reconcile them, clean them, parse them, match them, move them, and create a report about them. Then they can be put together into an anthology. Once they are in an anthology, they can be further organized into volumes, chapters, pages, paragraphs, sentences, and if necessary even further still into parts of sentences (to separate two distinct ideas in one sentence, for example) and verses. This is what happened to the Bible.
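
As a rough illustration of what "treated like a database" can mean, the sketch below parses a reference such as "2 Timothy 3:16" into a structured key (book, chapter, verse) and uses it to look up a record. The names VerseRef, parse_ref and table are hypothetical ones I made up for this example, and the single row is the verse quoted later in this article. It is only a sketch of the idea, not a description of how any particular Bible software works.

```python
import re
from typing import NamedTuple


class VerseRef(NamedTuple):
    """A structured key, much like a composite primary key in a database."""
    book: str
    chapter: int
    verse: int


# Optional leading number ("2 Timothy"), one or more words, then chapter:verse.
_REF = re.compile(
    r"^\s*(?P<book>(?:[1-3]\s+)?[A-Za-z]+(?:\s+[A-Za-z]+)*)"
    r"\s+(?P<chapter>\d+):(?P<verse>\d+)\s*$"
)


def parse_ref(text):
    """Parse a reference like '2 Timothy 3:16' into a VerseRef."""
    match = _REF.match(text)
    if match is None:
        raise ValueError(f"not a chapter:verse reference: {text!r}")
    book = " ".join(match.group("book").split())  # normalize inner whitespace
    return VerseRef(book, int(match.group("chapter")), int(match.group("verse")))


# A one-row "table" keyed by reference.
table = {
    parse_ref("2 Timothy 3:16"): (
        "All Scripture is inspired by God and profitable for teaching, "
        "for reproof, for correction, for training in righteousness"
    ),
}

key = parse_ref("  2  Timothy 3:16 ")
print(key)
print(table[key])
```

Once every verse has a key like this, the usual database operations (sorting, matching, cross-referencing, reporting) become straightforward.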

Over centuries, early Jewish religious leaders initiated the transcription of oral tradition, then later accumulated individual pieces of scripture, evaluated them and combined them into the Tanakh. Generation after generation went to great effort to maintain the integrity and quality of the Bible by attempting to ensure, at least in theory, that it remained unchanged during copying. When Christians had generated their own scriptures and translated the Tanakh from Hebrew, a similar process happened. In the 13th century, Stephen Langton of Magna Carta fame created the chapter divisions, part of the chapter and verse system later adopted by Jews during the harsh persecution of the Spanish Inquisition(9) and widely in use today in modern Bibles. Obviously the Bible was considered and treated as a source of information about real events in the world, one whose integrity and quality were given a very high priority and importance.

So how accurate should we expect the Word of God to be?
In the Bible, 2 Timothy 3:16 says that "All Scripture is inspired by God and profitable for teaching, for reproof, for correction, for training in righteousness". Jesus describes himself as "the way" and goes on to further describe himself as a kind of "Model" to show what God is like. Later, in 325 CE, Church Fathers formally adopted a creed which described him as being "one substance" with God. Jesus confirmed the Old Testament was the word of God by referring to it as such, and he referred back to it frequently. If Jesus was God incarnate, he verified that Scripture was his word. He mapped Scripture to God and to Himself and verified that Scripture mapped to real world events. Therefore we should expect some measurable difference between Scripture and a book not inspired or endorsed by God.

If we use a weighted ranking we can get a rough idea of how accurate we can expect the Word of God to be. God is perfect, and man is not. So we can expect that man will be less accurate than God, but if God is helping man, then man should be more accurate than if he were working alone.

1. Man alone is less accurate
2. Man is more accurate with God's help than without it
3. God is more accurate than man

That should serve as a rough guideline and the first metric in an attempt to quantify the accuracy of the Bible(7).

The following is a list of human endeavors that apparently were not divinely inspired, so when using the weighted ranking scale in evaluating how the Bible compares to human endeavors it should be reasonable to expect the following.
- It should be at least as brilliant as the ancient theories of knowledge, reason, truth, nature, mathematics and logic, and the use of mathematics to describe nature, which continue to inform the practice of science to the present day, resulting in theories such as germ theory, relativity, genetics, atomic theory and quantum theory, all of which have been applied to generally reduce the amount of suffering in the world.
- It should at least be as accurate as a history book where it talks about history
- It should at least be as accurate as a science book where it talks about the world
- It should at least be as accurate as a manual where it gives instructions
- It should at least be as accurate as a scientific theory where it gives predictions
If not, then there is no reason to think that its inspiration is any different from any other type of inspiration.

A null hypothesis is any hypothesis that is evaluated for its ability to explain a given set of data. If the hypothesis is not sufficient to explain the data, then there is reason to pursue an alternate hypothesis. While it is not without its criticisms, particularly compared to Bayesian inference(10), it is a useful heuristic for forming an initial opinion about the probability or plausibility of an idea, or for getting a "feeling" about something.
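
For readers who have not met the idea before, here is a small sketch of how a null hypothesis is evaluated in the simplest textbook case. It has nothing to do with the Bible yet: the null hypothesis is "this coin is fair", and the data (9 heads in 10 flips) is a made-up example. The function name binomial_p_value is my own. It computes how probable a result at least this extreme would be if the null hypothesis were true.

```python
from math import comb


def binomial_p_value(successes, trials, p_null):
    """One-sided exact test: P(X >= successes) for X ~ Binomial(trials, p_null)."""
    return sum(
        comb(trials, k) * p_null**k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical data: 9 heads in 10 flips, under the null hypothesis p = 0.5.
p_value = binomial_p_value(9, 10, 0.5)
print(p_value)
```

A small value means the null hypothesis explains the data poorly, which is a reason to look at an alternate hypothesis. A large value means the null hypothesis is sufficient to explain what was observed.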

In the coming weeks, I intend to show that if we posit a null hypothesis about the Bible and evaluate the quality of the data and information in it, the hypothesis that humans alone were sufficient to create the Bible is very well supported by the data, which effectively refutes the hypothesis posited in 2 Timothy 3:16.

REFERENCES
1. Wikipedia, "Data Management"
2. Information Quality at MIT
3. Anchoring Data Quality Dimensions in Ontological Foundations
4. DMReview, Data Management Review
5. IQ-1 Certificate Program
6. Wikipedia, 2003 Invasion of Iraq
7. How Accurate Is The Bible?
8. Datalever.com
9. Wikipedia, Tanakh
10. Wikipedia, Null Hypothesis
11. Beyond Accuracy: What Data Quality Means To Consumers
12. IQ Benchmarks

1 comment:

Steve Tuck, Datanomic said...

An interesting use of data quality techniques, Lee.

I won't enter the main debate, but I thought your readers might appreciate some additional sources of information of the topic of data quality:

- The International Association of Information & Data Quality - www.iaidq.org
- Data Quality Pro - www.dataqualitypro.com
- and perhaps my own blog at www.dqview.com

 
