A workflow for reading papers and managing notes in Emacs

1. Motivation

As I was fortunate enough to begin a PhD position this fall, I wanted to find a workflow for keeping up with the literature in the fast-paced field of machine learning and for organizing my notes systematically. The second point struck me as a necessity: I had not done this for my master's thesis, and I felt that I lost a lot of valuable time by taking notes in different places and not having a clear overview of the resources I had already read. I was therefore keen to find a workflow that would prevent the same thing from happening again in my current position.

More precisely, I was looking for a workflow that would allow me to do the following:

  • have a reference management system that allows me to easily add new resources I find with proper citations
  • be able to access citations within Emacs and take notes
  • cross-reference citations in org files to make connections between papers or build documents that connect different ideas
  • be able to quickly search resources and notes and see connections
  • be able to write nicely formatted documents with easy access to citations and notes

2. Existing resources

There exist quite a few resources about different approaches within the Emacs community. A central piece is org-ref, which is an Org module used to manage citations and bibliographies within org-mode.

3. Pieces to the puzzle

I have used the following pieces to build my workflow. Before explaining how they connect and how to adapt them to your case, I will briefly list them:

  • Zotero, a free tool to collect and organize citations from various sources. You can install it from its website and add its browser plugin
  • org-ref to manage and access bibliographies in Emacs
  • helm-bibtex/ivy-bibtex to search and manage bibliographies
  • org-roam as an information management system
  • org-roam-bibtex to integrate bibliographies with org-roam
  • org-roam-ui, a graphical frontend to explore your org-roam system

4. Setup

Having installed Zotero and its browser plugin, you can now simply add resources to a collection in Zotero. In the majority of cases it will scrape everything needed to build a proper citation. However, do check that Zotero actually picked up all the details and edit them within Zotero as necessary. This lets you collect resources in one place and easily create a BibTeX bibliography file, which you will use in the next step. To do this, right-click on a collection in the Zotero GUI and click "Export collection" to save it in a bibliography format of your choice. I use BibTeX.

Next, we will configure ivy-bibtex to have easy access to one or more BibTeX files within Emacs. I opted for ivy-bibtex over helm-bibtex because I already use ivy with swiper for searching in Emacs, and because I had some installation issues with helm that I could not resolve. You can install ivy-bibtex with M-x package-install RET ivy-bibtex. In my Emacs config file, I have added the following, which comes from the configuration instructions in the GitHub repo:

(autoload 'ivy-bibtex "ivy-bibtex" "" t)
;; ivy-bibtex requires ivy's `ivy--regex-ignore-order` regex builder, which
;; ignores the order of regexp tokens when searching for matching candidates.
(setq ivy-re-builders-alist '((ivy-bibtex . ivy--regex-ignore-order)
                              (t . ivy--regex-plus)))

(setq bibtex-completion-bibliography '("~/your-system-name/references/citations/thesis.bib"
                                       "~/your-system-name/references/citations/probML.bib"
                                       "~/your-system-name/references/citations/time_series_ml.bib")
      bibtex-completion-library-path
      '("~/your-system-name/references/pdfs/")
      bibtex-completion-notes-path
      "~/your-system-name/references/notes/")


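;; bind-keys* comes from the bind-key package (shipped with use-package).
;; It takes key strings directly, so no kbd wrapper is needed.
(bind-keys* ("C-c b" . ivy-bibtex)
            ("C-c o" . ivy-dispatching-done))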

The important pieces are bibtex-completion-bibliography, which lets you define paths to one or more BibTeX files that should be available when searching for citations, and bibtex-completion-library-path, where you can specify a directory in which the corresponding resources are saved as PDFs, for example the latest arXiv paper you found. Another useful path is bibtex-completion-notes-path, which, as you might have guessed by now, specifies a directory where the notes for a particular citation are stored. Next are the key bindings, which you can configure to your liking. If you now call ivy-bibtex, you should see a list of all your BibTeX entries pop up. You can search and navigate through them and, upon finding an entry of interest, press C-c o to open an action menu of possible steps. They are quite self-explanatory and well documented. While there is no automatic built-in way to retrieve the corresponding PDF from the web, you can download it manually and then use the "Add PDF to library" option with the key l in the action menu. This names the PDF after the BibTeX citation key, thereby linking it to the entry, and places it in the bibtex-completion-library-path directory if it is not already there.
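
The naming convention behind that last step can be sketched in a few lines. This is only an illustration, assuming the default .pdf extension; my/expected-pdf-path is a hypothetical helper and not part of ivy-bibtex or bibtex-completion, and the citation key in the example call is made up:

(defun my/expected-pdf-path (citekey library-dir)
  "Return the file name under which a PDF for CITEKEY is expected in LIBRARY-DIR.
Hypothetical helper: it only mirrors the <citekey>.pdf naming convention."
  (expand-file-name (concat citekey ".pdf") library-dir))

;; Example call with a made-up citation key and the placeholder path from above.
(my/expected-pdf-path "lehmann-2022-example"
                      "~/your-system-name/references/pdfs/")

If a PDF does not show up for an entry, comparing its actual file name against this pattern is a quick first check.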

At this point you are technically able to write org-mode documents in which you can search for and insert citations with ease. To ensure that the citations are formatted the way you like, you can configure org-ref; here I have again just taken the default configuration from the GitHub repo:

(use-package org-ref
  :ensure nil
  :init
  (require 'bibtex)
  (setq bibtex-autokey-year-length 4
        bibtex-autokey-name-year-separator "-"
        bibtex-autokey-year-title-separator "-"
        bibtex-autokey-titleword-separator "-"
        bibtex-autokey-titlewords 2
        bibtex-autokey-titlewords-stretch 1
        bibtex-autokey-titleword-length 5)
  (define-key bibtex-mode-map (kbd "H-b") 'org-ref-bibtex-hydra/body)
  (define-key org-mode-map (kbd "C-c ]") 'org-ref-insert-link)
  (define-key org-mode-map (kbd "s-[") 'org-ref-insert-link-hydra/body)
  (require 'org-ref-ivy)
  (require 'org-ref-arxiv)
  (require 'org-ref-scopus)
  (require 'org-ref-wos))


(use-package org-ref-ivy
  :ensure nil
  :init
  (setq org-ref-insert-link-function 'org-ref-insert-link-hydra/body
        org-ref-insert-cite-function 'org-ref-cite-insert-ivy
        org-ref-insert-label-function 'org-ref-insert-label-link
        org-ref-insert-ref-function 'org-ref-insert-ref-link
        org-ref-cite-onclick-function (lambda (_) (org-ref-citation-hydra/body))))

(setq org-latex-pdf-process (list "latexmk -shell-escape -bibtex -f -pdf %f"))

The last setting, org-latex-pdf-process, is important for properly exporting the citations from an org-mode file to PDF. To write nicely formatted LaTeX PDF documents from org-mode with citations, add the following two lines to your document header:

#+LATEX_HEADER: \usepackage[citestyle=authoryear,bibstyle=authoryear,hyperref=true,backref=true,maxcitenames=3,url=true,backend=biber,natbib=true]{biblatex}
#+LATEX_HEADER: \addbibresource{~/path/to/your/bib/file.bib}

citestyle and bibstyle define how the citations are formatted, and there are many different options. To insert a citation from the linked bib file, press C-c ], which is the org-ref default; a list of BibTeX entries will pop up that lets you search entries via ivy and insert them into your document. For some additional cool and useful features, check this video by org-ref creator and Emacs wizard John Kitchin.

At this point one already has quite a powerful and convenient setup for writing nicely formatted documents in org-mode that accommodate citations. However, there was some additional functionality that I was looking for, namely organizing notes and citations in a structured way that allows you to make and discover connections between resources and your notes. This is where org-roam enters the picture. It is a form of "knowledge management" system, a kind of tool that has become quite popular over the last few years, probably most prominently through Tiago Forte and his platform for building a "Second Brain". Such systems are mostly associated with general productivity, and while org-roam might very well serve that purpose, I was looking to use it specifically for research. There are several platforms where you can sign up and use their tools to create such a system, but why do that when Emacs can do all that :)

5. Organizing your notes

You might ask why we should go further when it seems that ivy-bibtex can already link notes and PDFs to citations and make them available in .org files. Hopefully these further steps illustrate why org-roam can be a powerful addition, or a tool in its own right.

The power of org-roam lies in its ability to create connected graphs from the nodes you define. Nodes can be different things. To take an example from writing a research paper, consider the task of writing a literature review for a topic: the topic can be one node, and the notes on each paper you read can be others. When one node references another, the two become linked.

Org-roam also allows you to configure capture templates to your liking, meaning that when you create a new node, you can choose what kind of node it is and start from a predefined template for that node type.
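
As a minimal sketch of what such a template looks like, here is a single entry modeled on org-roam's stock default template. The key "d", the label, and the file name pattern are placeholders you would adapt:

(setq org-roam-capture-templates
      '(("d" "default" plain
         "%?"                                      ; point is placed here when capture starts
         :target (file+head "${slug}.org"          ; file created for the new node
                            "#+title: ${title}\n") ; header inserted at the top of that file
         :unnarrowed t)))

Each entry consists of a menu key, a description, the entry type, the body template, and a :target that says where a new node's file goes and what header it starts with.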

Personally, I use the following org-roam configuration with capture templates:

(use-package org-roam
  :ensure t
  :init
  (setq org-roam-v2-ack t)
  :custom
  (org-roam-directory "~/your-system-name")
  (org-roam-completion-everywhere t)
  (org-roam-capture-templates
   '(("m" "main" plain
      "%?"
      :if-new (file+head "main/${slug}.org" "#+title: ${title}\n")
      :immediate-finish t
      :unnarrowed t)
     ("r" "bibliography reference" plain
      "* First Pass\n** Category\n(type of paper)\n** Context\n(Related Research)\n** Correctness\n(Valid assumptions)\n** Contributions\n** Clarity\n* Second Pass\n** Notes\n** Concepts I don't get\n** Questions\n** Summary\n** Relevant Related Work\n* Third Pass\n** Strong Points\n** Weak Points\n"
      :target (file+head "references/notes/${citekey}.org" "#+title: ${title}\n")
      :unnarrowed t)
     ("t" "topic" plain
      "* Category\n\n%?\n\n"
      :if-new (file+head "topics/${slug}.org" "#+title: ${title}\n#+filetags: Topic")
      :immediate-finish t
      :unnarrowed t)
     ("a" "other resources" plain
      "%?"
      :if-new (file+head "articles/${title}.org" "#+title: ${title}\n#+filetags: :article:\n")
      :immediate-finish t
      :unnarrowed t)))
  :bind (("C-c n l" . org-roam-buffer-toggle)
         ("C-c n f" . org-roam-node-find)
         ("C-c n i" . org-roam-node-insert)
         :map org-mode-map
         ("C-M-i" . completion-at-point))
  :config
  (org-roam-setup)
  (org-roam-bibtex-mode +1))

Under the menu key r I have defined a template for papers I want to take notes on, based on this method. As always, you can change this to your liking; it is merely meant to show you one option. To have access in org-roam to the BibTeX bibliography files that we defined earlier with bibtex-completion-bibliography, you also need to configure org-roam-bibtex. These are again just default settings plus a custom key binding for inserting links from citations.

(use-package org-roam-bibtex :after org-roam)
(bind-keys* ("C-c z" . orb-insert-link))

(require 'citar-org-roam)
(citar-register-notes-source 'orb-citar-source
                             (list :name "Org-Roam Notes"
                                   :category 'org-roam-node
                                   :items #'citar-org-roam--get-candidates
                                   :hasitems #'citar-org-roam-has-notes
                                   :open #'citar-org-roam-open-note
                                   :create #'orb-citar-edit-note
                                   :annotate #'citar-org-roam--annotate))

(setq citar-notes-source 'orb-citar-source)

Defining nodes and linking them allows you to examine the connections between topics and creates a useful overview of what you have read. So instead of several different places where you might have taken notes, like a piece of paper, a tablet, or another note-taking application, you now have a centralized system to take notes, make connections, and keep an overview that is easily searchable.
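
To look at those connections programmatically, something like the following could list every note that links to the one you are currently visiting. It is a sketch assuming org-roam v2, whose org-roam-backlinks-get, org-roam-backlink-source-node, org-roam-node-title, and org-roam-node-at-point it relies on; the function name my/org-roam-backlink-titles is made up for this example:

(require 'org-roam)

(defun my/org-roam-backlink-titles ()
  "Return the titles of all nodes that link to the node at point.
Hypothetical helper for illustration; signals an error outside a node."
  (mapcar (lambda (backlink)
            (org-roam-node-title
             (org-roam-backlink-source-node backlink)))
          ;; The non-nil argument makes org-roam complain if point is not in a node.
          (org-roam-backlinks-get (org-roam-node-at-point 'assert))))

For everyday use, the org-roam buffer bound to C-c n l in the configuration above shows the same backlinks interactively.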

The last mind-boggling addition is org-roam-ui, which builds a graphical representation of the connections you have made, based on the local org-roam database. Once again, I am just using the default configuration:

(use-package websocket
    :after org-roam)

(use-package org-roam-ui
    :after org-roam ;; or :after org
;;         normally we'd recommend hooking orui after org-roam, but since org-roam does not have
;;         a hookable mode anymore, you're advised to pick something yourself
;;         if you don't care about startup time, use
;;  :hook (after-init . org-roam-ui-mode)
    :config
    (setq org-roam-ui-sync-theme t
          org-roam-ui-follow t
          org-roam-ui-update-on-save t
          org-roam-ui-open-on-start t))

Calling org-roam-ui-open will spin up a web server showing you a graphical interface of your nodes and their connections. You can also click on nodes to see the corresponding notes you have taken! One video showing many powerful features of this web interface can be found here.

6. How I use this setup

As a last step, I would like to briefly explain how I make use of these different components, as it might inspire you to pick up such an Emacs-based system or, even better, improve upon it. The workflow can roughly be described as follows, although the order can vary with the circumstances.

  1. Find papers and add them to a Zotero collection with the web plugin
  2. Export the collection to an existing or new bibtex file and ensure visibility for ivy-bibtex by including the path in bibtex-completion-bibliography
  3. Take notes on the paper in org-roam by inserting a node with orb-insert-link and choosing a fitting template
  4. Explore connections between papers and link nodes
  5. Write summaries in .org files with org-ref, using the above org-roam notes to summarize findings or write articles that you share as nicely formatted documents with supervisors, collaborators, etc.
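
Step 2 can be made less manual. As a sketch, assuming all exported .bib files live in one existing directory (the path below is the placeholder used earlier), the list can be built when the configuration is loaded instead of being maintained by hand:

;; Collect every .bib file in the citations directory.
;; `directory-files' returns absolute file names when its second argument
;; is non-nil, and signals an error if the directory does not exist.
(setq bibtex-completion-bibliography
      (directory-files (expand-file-name "~/your-system-name/references/citations/")
                       t "\\.bib\\'"))

A newly exported file is then picked up the next time this form is evaluated, for example after restarting Emacs.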

7. Conclusion

This post hopefully gave you an idea of the great tools that exist within Emacs for systematic information organization and access. This is by no means the one system, but merely one of many approaches, even within Emacs. It is also quite possible that I misuse or inefficiently use the described tools and that my setup could be improved in several ways. If you have feedback, questions, or suggestions, feel free to get in touch.

Date: November 30, 2022

Author: Nils Lehmann

Created: 2024-03-13 Wed 15:58