
Release 1 of D-haven.org's BibleUtilities

I'm one of a small team that is maintaining our church's web site.  The site has audio, transcripts, devotionals, etc. to help you with your Bible study.  As you can imagine, as time flies and different teams maintain the data, we had a big data problem (not "big data", just a large problem with data) on our hands.

One of the things we needed to do was to scrape our transcripts to find all the scripture references in the text.  That's easier said than done since the rules for writing a Bible reference are a bit all over the place.  Add to that multiple ways to abbreviate the books of the Bible, and we've got a non-trivial problem.

Bible Utilities

The Bible parsing code lived as part of the church's source code until one day when a young Norwegian college student needed help with the same problem.  I helped him out initially with the source code, but since this is a common enough problem I made it an official NuGet package: DHaven.BibleUtilities.  You can see the source code on GitHub, which is also the official place to report any problems.  The package is internationalized, but it only has support for English and Norwegian at the moment.

Book Parsing

For those of you not familiar with the Bible, the Protestant canon has 66 books split across 2 testaments.  There are multiple correct ways of writing the same book, and a few common typos that we also need to consider.
Some of the books have ordinal numbers in them, such as 1 John, 2 John and 3 John.  Those numbers can be written using Arabic numerals, Roman numerals, or spelled-out ordinals (First, Second).

Something I didn't know until I worked with BluelO22 is that in Norwegian the books of the Pentateuch (the first five books of the Bible) are simply named 1 Mosebok through 5 Mosebok.  That meant I needed to handle spelled out ordinals all the way to 5.
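To give a feel for what handling those ordinals involves, here is a minimal sketch of normalizing a leading ordinal to an Arabic numeral.  This is not the library's code: OrdinalSketch, Normalize and the lookup table are hypothetical names made up for illustration, and the table only covers English spellings.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of DHaven.BibleUtilities.
public static class OrdinalSketch
{
    // Maps each supported spelling of a leading ordinal (English only) to its number.
    private static readonly Dictionary<string, int> Ordinals =
        new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase)
        {
            { "1", 1 }, { "I", 1 }, { "First", 1 }, { "1st", 1 },
            { "2", 2 }, { "II", 2 }, { "Second", 2 }, { "2nd", 2 },
            { "3", 3 }, { "III", 3 }, { "Third", 3 }, { "3rd", 3 },
            { "4", 4 }, { "IV", 4 }, { "Fourth", 4 }, { "4th", 4 },
            { "5", 5 }, { "V", 5 }, { "Fifth", 5 }, { "5th", 5 },
        };

    // Rewrites a leading ordinal as an Arabic numeral: "II John" becomes "2 John".
    // Names without a recognized ordinal prefix come back trimmed but otherwise unchanged.
    public static string Normalize(string bookName)
    {
        string trimmed = bookName.Trim();
        int space = trimmed.IndexOf(' ');
        if (space < 0)
        {
            return trimmed;
        }

        string first = trimmed.Substring(0, space);
        string rest = trimmed.Substring(space + 1).TrimStart();

        return Ordinals.TryGetValue(first, out int number)
            ? number + " " + rest
            : trimmed;
    }
}
```

With this sketch, OrdinalSketch.Normalize("II John") and OrdinalSketch.Normalize("Second John") both give "2 John", so a book lookup only has to know one spelling per book.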

The long and short of it is, to parse a book name you can do it this way:

Book book;

if (Book.TryParse("5 Mosebok", new CultureInfo("nb"), out book))
{
    // I just got Deuteronomy in Norwegian!
}

There are also parsing methods that don't use the CultureInfo parameter, and use the standard Parse/throw FormatException approach.

Reference Parsing

If you thought we were done with irregularities, you are mistaken.  There are several common conventions for how to reference a set of verses:
  • One verse: 1 Timothy 2:8
  • A range of verses: Heb. 12:1-5
  • Comma separated verses: Mt 15:3,6,8
  • A combination: Mark 5: 1, 4-6
  • Reference a chapter but no verse: John 4
  • Books with only one chapter don't use the chapter number: Philemon 4
The utility handles all of that using the following:

Reference reference;

if (Reference.TryParse("2 Tim. 2:2", new CultureInfo("en"), out reference))
{
    // I just got a Reference object with 2 Timothy, chapter 2, verse 2
}

There's still room for improvement.  For example, I don't handle references that span chapters.
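The verse part of those conventions (a single verse, a range, a comma separated list, or a mix) can be parsed with very little code.  The following is a rough sketch of that idea only, not how the library does it: VerseListSketch and its Parse method are hypothetical names, and it ignores the book and chapter entirely.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper for illustration only; not part of DHaven.BibleUtilities.
public static class VerseListSketch
{
    // Parses the verse portion of a reference, e.g. "1, 4-6", into inclusive (Start, End) ranges.
    // Throws FormatException on input it does not understand.
    public static List<(int Start, int End)> Parse(string verses)
    {
        var ranges = new List<(int Start, int End)>();

        foreach (string part in verses.Split(','))
        {
            string[] bounds = part.Split('-');
            if (bounds.Length > 2)
            {
                throw new FormatException("Too many dashes in '" + part.Trim() + "'.");
            }

            // int.Parse tolerates surrounding whitespace and throws FormatException on non-numbers.
            int start = int.Parse(bounds[0], CultureInfo.InvariantCulture);
            int end = bounds.Length == 2
                ? int.Parse(bounds[1], CultureInfo.InvariantCulture)
                : start;

            if (end < start)
            {
                throw new FormatException("Range '" + part.Trim() + "' runs backwards.");
            }

            ranges.Add((start, end));
        }

        return ranges;
    }
}
```

Calling VerseListSketch.Parse("1, 4-6") returns two ranges: verse 1 on its own, and verses 4 through 6.  A single verse is stored as a range whose start and end are equal, which keeps later processing uniform.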

Reference Scanning!

Since the problem I had was scanning text documents for scripture references, the library wouldn't be complete without the ability to scan a text and reduce the results to the smallest number of unique references.  The scanner had to be smart enough to peek inside parentheses and handle semicolon separated lists, neither of which always has spaces around it.  The API is really simple:

ICollection<Reference> references = Reference.Scan(mySuperLongText, new CultureInfo("en"));

The collection has each and every reference it could find in the text.  If this is a transcript, you might have the same book and chapter called out several times, but a slightly different set of verses.  To get the smallest number of unique references we have an extension method that works on any enumerable of references:

// NOTE: this is a new collection, we don't modify the existing collection.
ICollection<Reference> reducedSet = references.Reduce();
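To illustrate the general idea behind that kind of reduction, here is a sketch that merges overlapping or adjacent verse ranges within a single chapter.  It is my guess at the shape of the problem rather than the library's implementation: ReduceSketch and Merge are hypothetical names, and ranges are plain (Start, End) tuples instead of Reference objects.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not part of DHaven.BibleUtilities.
public static class ReduceSketch
{
    // Merges overlapping or adjacent inclusive verse ranges and returns a new list,
    // leaving the input untouched.
    public static List<(int Start, int End)> Merge(IEnumerable<(int Start, int End)> ranges)
    {
        var merged = new List<(int Start, int End)>();

        // Sorting by start means each range only has to be compared with the last merged one.
        foreach (var range in ranges.OrderBy(r => r.Start))
        {
            int last = merged.Count - 1;
            if (last >= 0 && range.Start <= merged[last].End + 1)
            {
                // Overlaps or touches the previous range, so extend it.
                merged[last] = (merged[last].Start, Math.Max(merged[last].End, range.End));
            }
            else
            {
                merged.Add(range);
            }
        }

        return merged;
    }
}
```

Given verses 1-3, 2-5 and 8, this sketch produces two ranges: 1-5 and 8.  Like Reduce(), it builds a new list rather than changing the one passed in.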

After all of that, sometimes you just want to dump the references back into a list of strings.  The ToString() methods for all the objects handle these rules just as well.

Comments

  1. Gee...I had no idea we were placing stumbling blocks in the transcripts. Reckon we can continue on our merry way since you've done this, eh?

    1. These are common problems with writing about the Bible in general. I did the best I could based on the set of transcripts and devotionals I had to work with, but I know there's more gotchas out there. At least there's a utility we can get around to improve over time.

      If there's anything that it doesn't do correctly, please file a bug report on GitHub for me.

