Jeff Ong

DOCX to JATS XML tool

I wrote a tool to help an academic journal with their workflow.

ROMChip.org struggled with an outdated publishing pipeline built around Texture, an open-source but abandoned JATS XML editor. The workflow required editors to manually copy, paste, and relink each citation, a mind-numbing process that took several hours per article when split among two or three staff members.

The Python tool processes Word documents, converts them to Markdown and JATS XML, and handles endnotes automatically, so articles can be uploaded to their OJS backend and published online without manual intervention.

The conversion process works in two steps:

Step 1: DOCX to Markdown

The tool uses Pandoc as its conversion engine, then applies custom post-processing to clean up the output. It merges bold text that Pandoc sometimes fragments across line breaks, consolidates adjacent italicized words into continuous spans, and converts all italic markers from asterisks to underscores for better compatibility. Images keep their references but lose their dimension attributes. Most critically, all academic footnotes are preserved in Pandoc's footnote syntax, eliminating the need for manual relinking.
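The cleanup passes described in Step 1 can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: the function names, the regular expressions, and the Pandoc flags (`--wrap=none`, `--extract-media`) are assumptions chosen for the example, and the patterns ignore edge cases such as combined bold-italic markers.

```python
import re
import subprocess
from pathlib import Path


def docx_to_markdown(docx: Path, media_dir: Path) -> str:
    """Run Pandoc on a .docx file and return Markdown (needs pandoc on PATH).

    The flag choice here is an assumption, not the tool's real invocation.
    """
    result = subprocess.run(
        ["pandoc", str(docx), "--from", "docx", "--to", "markdown",
         "--wrap=none", f"--extract-media={media_dir}"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def merge_split_bold(md: str) -> str:
    # "**foo** **bar**" or "**foo**\n**bar**" becomes one bold span.
    # At most one line break is bridged, so paragraphs are never merged.
    return re.sub(r"\*\*([ \t]+|[ \t]*\n[ \t]*)\*\*", r"\1", md)


def italics_to_underscores(md: str) -> str:
    # Rewrite *word* as _word_ while leaving **bold** markers untouched.
    pattern = r"(?<![*\\])\*(?![\s*])([^*\n]+?)(?<![\s*])\*(?!\*)"
    return re.sub(pattern, r"_\1_", md)


def merge_adjacent_italics(md: str) -> str:
    # "_foo_ _bar_" becomes "_foo bar_".
    return re.sub(r"(?<=[^\s_])_([ \t]+)_(?=[^\s_])", r"\1", md)


def strip_image_attributes(md: str) -> str:
    # Drop the trailing {...} attribute block after an image, e.g.
    # {width="5in" height="3in"}. Note: this removes every attribute
    # in the block, not only the dimensions.
    return re.sub(r"(!\[[^\]]*\]\([^)]*\))\{[^}]*\}", r"\1", md)


def clean_markdown(md: str) -> str:
    md = merge_split_bold(md)
    md = italics_to_underscores(md)
    md = merge_adjacent_italics(md)
    return strip_image_attributes(md)
```

None of these passes touch footnote references (`[^1]`) or footnote definitions, so Pandoc's footnote syntax passes through unchanged.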

Step 2: Markdown to JATS XML

The Markdown is then converted to JATS-compliant archiving XML. The tool parses figure patterns and transforms them into JATS <fig> elements with unique IDs, wraps all body content in <sec> elements with generated unique identifiers, and ensures the output validates against the JATS DTD. The resulting XML preserves the complete academic document structure: front matter, body sections, figures with captions, and back matter with footnotes.
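To make the figure and section handling concrete, here is a small sketch of how Markdown blocks might be mapped onto a JATS <body>. It is an assumption-laden illustration rather than the tool's implementation: the `sec-N` and `fig-N` ID scheme, the function name, and the flat (non-nested) treatment of headings are all invented for the example, inline markup and footnotes are left as plain text, and nothing here validates against the DTD.

```python
import re
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("xlink", XLINK)

# A paragraph consisting only of an image: ![caption](path)
FIGURE = re.compile(r"^!\[(?P<caption>[^\]]*)\]\((?P<href>[^)\s]+)\)$")
HEADING = re.compile(r"^#{1,6}\s+(?P<title>.+)$")


def markdown_to_body(md: str) -> ET.Element:
    """Build a <body> element: one <sec> per heading, <fig> per image block."""
    body = ET.Element("body")
    sec = None
    sec_count = 0
    fig_count = 0

    def new_sec(title=None):
        nonlocal sec, sec_count
        sec_count += 1
        sec = ET.SubElement(body, "sec", {"id": f"sec-{sec_count}"})
        if title:
            ET.SubElement(sec, "title").text = title

    # Blocks are separated by blank lines; wrapped lines are rejoined.
    for raw in re.split(r"\n\s*\n", md.strip()):
        block = " ".join(raw.split())
        if not block:
            continue
        heading = HEADING.match(block)
        if heading:
            new_sec(heading["title"])
            continue
        if sec is None:
            new_sec()  # content before the first heading still gets a <sec>
        figure = FIGURE.match(block)
        if figure:
            fig_count += 1
            fig = ET.SubElement(sec, "fig", {"id": f"fig-{fig_count}"})
            caption = ET.SubElement(fig, "caption")
            ET.SubElement(caption, "p").text = figure["caption"]
            ET.SubElement(fig, "graphic", {f"{{{XLINK}}}href": figure["href"]})
        else:
            ET.SubElement(sec, "p").text = block
    return body
```

Calling `ET.tostring(markdown_to_body(text), encoding="unicode")` serializes the result. A fuller converter would also emit the front matter and move footnotes into the back matter.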

A simple web interface allows editors to upload Word docs and caption files, configure article metadata, and receive a downloadable ZIP package containing the processed Markdown, JATS XML, and properly formatted images: production-ready output in seconds rather than hours.
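As a rough idea of the packaging step, the ZIP could be assembled in memory with Python's standard library. The file names (`article.md`, `article.xml`, the `media/` folder) and the function name are hypothetical; the actual package layout may differ.

```python
import io
import zipfile


def build_package(markdown: str, jats_xml: str, images: dict[str, bytes]) -> bytes:
    """Return the bytes of a ZIP holding the Markdown, the XML, and the images."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("article.md", markdown)
        archive.writestr("article.xml", jats_xml)
        for name, data in images.items():
            archive.writestr(f"media/{name}", data)
    return buffer.getvalue()
```

Building the archive in a `BytesIO` buffer means a web handler can return it directly as a download without writing temporary files.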

Tool: https://romchip.org/docx-converter/
Code: https://github.com/jffng/convert-docx-md-jatss