What do we want? (The Dream)

Texts etc.

We need to store texts in different languages, some of which are written right-to-left. Some Hebrew texts will have vowel points, some cantillation signs, and some special glyphs (small, large, inverted, etc.).

We need to store photographs of manuscripts, book scans, and possibly audio and video.
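Since the same Hebrew text may be wanted with or without vowels and cantillation, one option is to store the fullest form and derive the plainer ones. The sketch below assumes the marks are encoded as Unicode combining characters in the Hebrew block (cantillation accents at U+0591 to U+05AF, points at U+05B0 to U+05C7); the function name `strip_marks` is made up for illustration, and special glyphs (small, large, inverted) are not handled.

```python
import unicodedata

CANTILLATION = range(0x0591, 0x05B0)  # U+0591..U+05AF: cantillation accents
POINTS = range(0x05B0, 0x05C8)        # U+05B0..U+05C7: points, mixed with punctuation


def is_cantillation(ch):
    return ord(ch) in CANTILLATION


def is_point(ch):
    # Only nonspacing marks count as points, so punctuation in the same
    # range (maqaf, paseq, sof pasuq, nun hafukha) is left in place.
    return ord(ch) in POINTS and unicodedata.category(ch) == "Mn"


def strip_marks(text, vowels=True, cantillation=True):
    """Return text without vowel points and/or cantillation signs."""
    # NFD splits precomposed presentation forms into letter + marks.
    text = unicodedata.normalize("NFD", text)
    return "".join(
        ch
        for ch in text
        if not ((vowels and is_point(ch)) or (cantillation and is_cantillation(ch)))
    )
```

Calling `strip_marks(text, vowels=False)` keeps the points and drops only the cantillation; note that NFD normalization may reorder the remaining marks on a letter.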

Markup

We need to be able to mark up textual features (encode information contained in the text):

  • Personal names

  • Geographical names

  • First words of comments

  • Authors of statements

  • Inference rule used in a fragment of Talmud

  • Index entries for a text fragment

We also need to be able to associate metadata with a text fragment:

  • What stage of proofreading is the fragment in?

  • Tags

Hierarchical Structure

Store the hierarchical structure of texts and use it for references and for retrieval of text fragments. Examples:

  • Tanach: book, chapter, verse

  • Chumash: weekly portion

  • Chumash: parsha (with type: open/closed)

  • Rashi on Tanach: book, chapter, verse, comment

  • Mishna: treatise, chapter, mishna

  • Talmud: treatise, folio, side

  • Talmud: treatise, chapter, statement

  • Rambam: [book,] laws of, law, statement

  • Shulchan Aruch: division, chapter, paragraph, small paragraph...

Store page (and line) breaks for multiple editions of the same text.

A text can have multiple hierarchical structures, some of which can be incompatible with one another: a parsha can end inside a verse, a weekly portion inside a chapter, and a page break can fall inside a sentence.

Some texts have the same structure although they are not commentaries on one another, e.g.:
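One way to hold incompatible structures side by side is to keep each of them outside the text, as spans over shared offsets (standoff markup), rather than as nested elements. The following is a minimal sketch under that assumption; the `Span` class, the token offsets and the labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    label: str
    start: int  # offset of the first token, inclusive
    end: int    # offset past the last token, exclusive


def overlaps_improperly(a, b):
    """True if a and b intersect but neither contains the other."""
    intersect = a.start < b.end and b.start < a.end
    a_in_b = b.start <= a.start and a.end <= b.end
    b_in_a = a.start <= b.start and b.end <= a.end
    return intersect and not (a_in_b or b_in_a)


def conflicts(structure_a, structure_b):
    """List the pairs of units that prevent nesting one structure in the other."""
    return [
        (a.label, b.label)
        for a in structure_a
        for b in structure_b
        if overlaps_improperly(a, b)
    ]


# Two structures over the same 20 tokens: the first parsha ends inside verse 2.
verses = [Span("verse 1", 0, 10), Span("verse 2", 10, 20)]
parshiyot = [Span("parsha A", 0, 14), Span("parsha B", 14, 20)]
print(conflicts(verses, parshiyot))  # [('verse 2', 'parsha A')]
```

An empty result means the two structures can be nested into a single tree; a non-empty one means they have to be stored independently.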

  • original and translation

  • different editions of the same text

  • Shulchan Aruch and Shulchan Aruch HaRav

We need to be able to combine texts with the same structure, e.g., for parallel translation. The specific edition is chosen based on the user's preferences: language, presence of vowel points, etc. We need to be able to show differences between different editions of the same text in the form of a text with the differences highlighted :).
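As a rough illustration of "a text with the differences highlighted", the sketch below compares two editions word by word with Python's standard `difflib`. The `[-...-]` and `{+...+}` markers and the function name are arbitrary choices for the example, not a proposed format.

```python
import difflib


def highlight_differences(edition_a, edition_b):
    """Merge two editions into one text, marking words that differ."""
    a_words = edition_a.split()
    b_words = edition_b.split()
    matcher = difflib.SequenceMatcher(a=a_words, b=b_words, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(a_words[i1:i2])
            continue
        if i1 < i2:  # words present only in edition A
            out.append("[-" + " ".join(a_words[i1:i2]) + "-]")
        if j1 < j2:  # words present only in edition B
            out.append("{+" + " ".join(b_words[j1:j2]) + "+}")
    return " ".join(out)


print(highlight_differences("one two three", "one 2 three"))
# one [-two-] {+2+} three
```

For pointed Hebrew, the comparison would probably need to run on a normalized form first, so that editions differing only in vowels are not reported as entirely different.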

References

Texts reference one another. A reference links a point or an interval in one text with a point or an interval in another (or the same) text.

References can be external to the texts they link, e.g., parallel statements in Talmud or sources in Shulchan Aruch.

References can have different semantics, which we should store:

  • one end comments on the other

  • one end proves or illustrates the other

  • one end transcribes or translates the other

References can have different "strengths".

References should be reversible: it should be possible to enumerate the references that end in a given interval.
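The requirements in this section can be sketched as a small data model: a reference that is stored outside the texts, carries a kind and a strength, and can be looked up from its target end. All class, field and identifier names below are assumptions, and the linear scan stands in for whatever index (for example, an interval tree) a real store would use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    text_id: str
    start: int
    end: int  # inclusive; a point is an interval with start == end

    def overlaps(self, other):
        return (
            self.text_id == other.text_id
            and self.start <= other.end
            and other.start <= self.end
        )


@dataclass(frozen=True)
class Reference:
    source: Interval
    target: Interval
    kind: str              # e.g. "comments-on", "proves", "translates"
    strength: float = 1.0  # hypothetical scale; the notes only say strengths differ


class ReferenceStore:
    def __init__(self):
        self._refs = []

    def add(self, ref):
        self._refs.append(ref)

    def ending_in(self, interval):
        """Reverse lookup: all references whose target touches the interval."""
        return [r for r in self._refs if r.target.overlaps(interval)]


store = ReferenceStore()
store.add(Reference(Interval("commentary", 0, 40), Interval("base", 10, 20), "comments-on"))
store.add(Reference(Interval("other", 5, 5), Interval("base", 300, 310), "proves", 0.5))
hits = store.ending_in(Interval("base", 15, 15))  # a single point in the base text
```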

Corrections

Correction of one text by another is a specially-handled type of reference.

Texts can correct other texts (Rashi - Talmud) or themselves (Talmud - quotes from early sources). A text can correct references (from Talmud to Tanach) and the structure of another text (the breakup of laws in Rambam).

Attribution

We need to store many versions ("editions") of the same text. This includes typing-in, proofreading and corrections to the text by a user: that's an "edition" too.

We need to develop a theory of attribution for Talmud etc.: "A says in the name of B in the name of C", "two students of B say in accordance with B's views". We should be able to retrieve a text "as seen through the eyes of A".

So: Chumash, Keter edition, according to Peter; ((Rambam through the eyes of Rosh) Romm edition) according to Paul.

A reference to a text that has different "editions" should be resolved in accordance with the user's preferences: language, presence of vowels, etc.
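Resolving a reference against user preferences could look roughly like the sketch below, which scores each available edition by how many preferences it satisfies. The `Edition` fields and the scoring rule are assumptions made for the example; the notes do not say how conflicting preferences should be weighed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edition:
    name: str
    language: str
    has_vowels: bool


def resolve_edition(editions, preferences):
    """Pick the edition matching the most preferences.

    preferences maps Edition field names to wanted values. On a tie the
    edition listed first wins, so the list order acts as a default ranking.
    """
    if not editions:
        raise ValueError("no editions to choose from")

    def score(edition):
        return sum(getattr(edition, key) == value for key, value in preferences.items())

    return max(editions, key=score)


editions = [
    Edition("plain Hebrew", "he", False),
    Edition("pointed Hebrew", "he", True),
    Edition("English translation", "en", False),
]
chosen = resolve_edition(editions, {"language": "he", "has_vowels": True})
```

Here `chosen` is the pointed Hebrew edition; with an empty preference set the first edition in the list is returned.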

Versioning

We need to store the history of changes.

Search

A query language provided by the API should allow selection of a subset of texts and support text search that takes the structure of texts, markup and grammar into account, e.g.:

  • by keywords

  • all mentions of a city

  • all statements by an author

  • by language

  • latest additions

  • by groups of users

  • close by the "crowd opinion"

  • by "crowd rating"

See "Information retrieval from annotated texts" by A.S. Fraenkel, S.T. Klein. J.
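Several of the criteria listed above (keywords, mentions of a city, statements by an author, language) can be combined if each is expressed as a predicate over marked-up fragments. The sketch below is only that: the `Fragment` shape, the markup keys and the sample data are invented for illustration, and it does not address grammar-aware search.

```python
from dataclasses import dataclass, field


@dataclass
class Fragment:
    ref: str
    language: str
    text: str
    # Markup as kind -> values, e.g. {"place": [...], "author": [...]}
    markup: dict = field(default_factory=dict)


def by_language(language):
    return lambda f: f.language == language


def by_keyword(word):
    return lambda f: word in f.text.split()


def mentions(kind, name):
    """Match on markup rather than on the raw text."""
    return lambda f: name in f.markup.get(kind, [])


def search(fragments, *predicates):
    """Return the fragments that satisfy every predicate."""
    return [f for f in fragments if all(p(f) for p in predicates)]


# Invented sample data.
fragments = [
    Fragment("t1", "he", "alpha beta", {"place": ["City X"], "author": ["Author A"]}),
    Fragment("t2", "en", "alpha gamma", {"place": ["City X"]}),
    Fragment("t3", "he", "delta", {"author": ["Author A"]}),
]
found = search(fragments, by_language("he"), mentions("author", "Author A"))
```

Searching on markup is what lets "all mentions of a city" find a fragment even when the city is named differently in the text itself.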

Individualization

  • Personal study program

  • Daily study schedule with a list of what you "owe"

  • Notebook - selections of text fragments via search of references. Compounding. Storage. Printing.

Crowdsourcing

  • Typing in of the texts

  • Proofreading: Wikipedia, Wikisource, Distributed Proofreaders

  • Marking the texts up

  • Adding references

  • Annotations

  • New presentation styles (XProc/XQuery/XSLT)

  • New printing styles

Typesetting

We need to be able to typeset a tree of interlinked texts.

Miscellaneous

  • Integration with blogs etc.

  • Discussion forums

  • Digital libraries

  • User levels: guest, registered, editor; "editor, make an editor"; reputation.

  • Protection from sabotage: Wikipedia

  • Domain name: Koritz suggested "OpenTorah" (.org: $550) and "ToratMoshe" (.org: taken).

Interface

Navigation through the texts: horizontal and via tags (semantic); search; choice of "focus": daf/sugya; notes: add/view mine; weekly portion, recent and upcoming shiurim, the user's past searches, latest additions, etc. From a text, move to the neighboring logical units of the text, to commentaries on it (on the fragment selected by the user), up to the text it comments on, and to translations and variants. A list of texts viewed today. "Desktop": selected texts and a large sheet for the user's notes, such as a lesson plan or a hiddush (a summary of the work done).

A father wants to prepare a Shabbat table talk. We remember his favorite commentators and offer them to him on the "desktop"; if he wishes, he finds additional materials on the "shelf", pulls the ones he likes onto the sheet, and possibly adds a list of questions for the children. The text and the additions go in a single stream.

Preparing a drasha for an event. The user picks the event from a list (bar mitzvah, bat mitzvah, brit, siyum ...), then picks suitable shiurim from another list (weekly portion, Tanya, Rambam, upcoming holidays), and receives a set of texts based on these choices.

Besides a selection of texts in "forum" format, something like an image of a page of Gemara may be needed.

For a lesson in a yeshiva tichonit, the teacher may want to add video and audio materials and various pictures. (When using external materials, we need to think through a censorship policy so as not to scare off the more strictly observant users.)

Presentations.

Web API

Everything doable using the interface should be doable using the Web API.

  • Retrieval and modification via various protocols, primarily HTTP (AtomPub, WebDAV, XML-RPC?)

  • Retrieval and modification in various formats, primarily TEI.

  • Add/change; add/change metadata.

  • It should be possible to work with the text in a text editor.
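To make the TEI point above concrete, here is a sketch of how one verse might be serialized for retrieval, using only the standard library. `div` and `ab` are real TEI elements, but the choice of `type` values and the function name are assumptions, and the HTTP transport is left out.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace


def fragment_to_tei(book, chapter, verse, text):
    """Wrap one verse in nested TEI divisions and return it as a string."""

    def q(tag):
        return "{%s}%s" % (TEI_NS, tag)

    book_div = ET.Element(q("div"), {"type": "book", "n": book})
    chapter_div = ET.SubElement(book_div, q("div"), {"type": "chapter", "n": str(chapter)})
    ab = ET.SubElement(chapter_div, q("ab"), {"type": "verse", "n": str(verse)})
    ab.text = text
    return ET.tostring(book_div, encoding="unicode")


xml_text = fragment_to_tei("Genesis", 1, 1, "sample text")
```

Because the result is plain XML, it can also be opened and changed in a text editor and sent back, which is what the last item in the list asks for.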

Attraction and Commercialization

Guilt

Our system must become a part of Jewish culture. A bochur who does not curate a folio of Talmud or a chapter of a rishon will be ostracized. Nobody will deal with a publisher that did not gift us 10 electronic texts. All sponsors will be ours: we are visible across the world. We will be the place to perform the commandment of writing the Scroll, to give haskomos, and to print hiddushim (like the physicists do on arXiv). And to leave a memory of yourself or other people.

Graduated paid services

Additional services for money: quality printing, access to the "super-proofread" texts.

Access based on the purchase of the print book.

Google

They can host and pay for this, but it looks like they already did Sefaria :)

Sources

[Fraenkel97] The Responsa storage and retrieval system - whither? Aviezri Fraenkel. 1997. http://www.wisdom.weizmann.ac.il/~fraenkel/Papers/trs.ps. http://www.wisdom.weizmann.ac.il/~fraenkel/Papers/pha.ps.

[Ontology] Ontology is overrated. Clay Shirky. 2005. http://www.shirky.com/writings/ontology_overrated.html.

Distributed Proofreaders. http://www.pgdp.net/c/default.php.

TEI. http://www.tei-c.org/release/doc/tei-p5-doc/html/.

eXist XML database. http://exist.sourceforge.net/.

Stylus Studio. http://www.stylusstudio.com/.

ALTOVA xmlspy. http://www.altova.com/.

oXygen. http://www.oxygenxml.com/.

editix. http://www.editix.com/.

Unicode. http://www.unicode.org/.

Theological Markup Language. http://www.ccel.org/ThML/ThML1.04.htm.

Tanakh ML. http://tanakhml2.alacartejava.net/cocoon/tanakhml/index.htm.

Open Scripture Information Standard. http://en.wikipedia.org/wiki/Open_Scripture_Information_Standard.

No new XML languages. http://www.tbray.org/ongoing/When/200x/2006/01/08/No-New-XML-Languages.

http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&item_id=XSEM.

http://books.chabadlibrary.org/default.aspx.