How someone organises, or chooses not to organise a bookshelf can say a lot about them. I personally don’t think there is any sense in enforcing a specific order. My strategy can be summarised as ‘vibes based’. There is joy in being able to know roughly where a book is and pick it out mid conversation. It makes me feel like I’m some sort of magician or an old librarian, holding the secret mental key that will unscramble order from chaos.
But, if you don’t believe in this serendipity, you might be inclined to choose something to base the order on. Given we are pretty good at remembering the title of a book, maybe alphabetical order is good. Actually, authors should be grouped together, so let’s make them alphabetical. What about we do that and then also group some categories of similar books together (S/O Dewey). The most intelligent of us out there may choose to forego any sense of useful meaning, and satirically choose to base the order on the colour of the cover, subverting the old book judging adage.
These approaches have been discussed at length here here here here and here, you get the idea. The reason they’re up for debate is because none of them are truly optimal. None take advantage of all the available information... Until now. Later on in this post I will describe an optimal strategy to sorting a book collection that I’ve also made open source, but first we must collect some data.
Digitising all the books I own is a task low down on my todo list (compared to finishing my PhD). So to aid in this, I created a Command Line Interface (CLI) tool to help me. It’s available here and I’ve creatively named it optimal-bookshelf
. It allows you to add books to a persistent locally-stored virtual library by searching for a title using open book APIs (i.e. Google Books), after which a more thorough search is performed and the specific edition can be selected. Ideally most of the information is correct at this point, but you then have the option to edit specific data such as the ISBN. I tailored this tool as I was digitising my collection, so it should be as efficient as possible on the Pareto curve of information vs time.
This enabled me to create an accurate digital representation of all the books I own, down to the specific edition, in a relatively short amount of time.
But what to do with this wealth of information? Well, for now it is largely text-based. To do anything that will concern a notion optimality, we need numbers. It follows that we can summarise all the information that we have generated, and result in a numerical representation by creating embeddings of our virtualised book-twins. By tokenising the information, embeddings represent text as dense vectors in a high-dimensional space using models trained specifically for this purpose. These embedding models are designed to capture semantic relationships between words and phrases, allowing us to perform mathematical operations that measure how similar different pieces of text are to each other. In the optimal-bookshelf
CLI, embeddings are created using OpenAI’s text-embedding-3-large
model and can easily be generated using the $ bookshelf embed
command.
If I make the assumption that I want books near each other to be semantically related (i.e. have a low Euclidian distance in embedding space), and that every book must appear on my bookshelf exactly once, I can equivalently formulate this task as an optimisation problem, minimising the total distance travelled by traversing from embedding to embedding. In this case I don’t mind that this trip isn’t a loop (because my bookshelf is a ‘line’ and not a circle, but this can be revisited later), resulting in a formulation of the classic integer programming no-return Travelling Salesmen Problem. Solving this will result in a sequence of books that embodies the lowest total semantic difference between neighbouring books. It makes the most sense, in ways that are not just affiliated with the title or the author, or even the genre, but the fundamental concept of that book. Having generated embeddings, solving this problem is easy in optimal-bookshelf
. Just run the command bookshelf tsp.
For visualisation purposes, you can also solve a TSP over embeddings that have first been reduced to two dimensions. The easiest and generally most effective way to do this in situations where the meaning isn’t too important (alert they can be misleading) is to use t-SNE. Running bookshelf tsp -v
(for visual) first uses t-SNE to reduce the dimension of the embeddings, and then solves the reduced dimension TSP (as the TSP relies on a distance matrix it scales with number of points and not dimensionality) resulting in the following:
It’s really interesting to follow the path and see themes emerging based on author or publication date, or even if the book is read, unread, or in progress, though I’m choosing to keep secret and just label the titles. It all acts like a constellation of my reading interests and habits, similar to the manually created constellation of artists at the Tate Liverpool.
I think an interesting extension would be to specify a certain number of shelves, then solve the TSP for a corresponding number of lines equal to the available shelves. In theory I have the number of pages in each book available, so I could add a physical packing constraint here to ensure all the books fit. For full transparency, here’s my current optimal shelf based on this semantic TSP. Spot the various threads that emerge…
1. GRAVITY'S RAINBOW
2. THE MASTERS
3. THE BOOKS OF JACOB
4. FLIGHTS
5. DRIVE YOUR PLOW OVER THE BONES OF THE DEAD.
6. THE PLAGUE
7. PANDAEMONIUM 1660–1886
8. UTOPIA
9. POLITICS ON THE EDGE
10. THE RESTLESS REPUBLIC: BRITAIN WITHOUT A CROWN
11. 1599: A YEAR IN THE LIFE OF WILLIAM SHAKESPEARE
12. WILLIAM SHAKESPEARE POETRY
13. OEDIPUS AT KOLONOS
14. THE ODYSSEY: TRANSLATED BY EMILY WILSON
15. THE ILIAD: TRANSLATED BY EMILY WILSON
16. THE DIVINE COMEDY
17. DON QUIXOTE
18. THE THREE MUSKETEERS
19. THE COUNT OF MONTE CRISTO
20. CRIME AND PUNISHMENT
21. THE KARAMAZOV BROTHERS
22. THE IDIOT
23. HUMAN, ALL TOO HUMAN & BEYOND GOOD AND EVIL
24. THE EMANCIPATION PROCLAMATION
25. THINK AND GROW RICH
26. TALKING TO STRANGERS
27. ESCAPE FROM MODEL LAND
28. VALUES
29. DOMINION
30. JERUSALEM
31. THE WORLD OF STONEHENGE
32. GOING TO CHURCH IN MEDIEVAL ENGLAND
33. THE RUIN OF ALL WITCHES
34. MIDNIGHT IN CHERNOBYL
35. ABYSS: THE CUBAN MISSILE CRISIS 1962
36. SECRET WARS
37. DIVIDED HOUSES: THE HUNDRED YEARS WAR III
38. TRIAL BY BATTLE: THE HUNDRED YEARS WAR I
39. TRIAL BY FIRE: THE HUNDRED YEARS WAR II
40. HIGH PERFORMANCE ROWING
41. WRITING ABOUT SPORT
42. GAZZA AGONISTES
43. THE LAST LEONARDO
44. THE CREATIVE ACT
45. ON ART AND LIFE
46. STORY OF ART
47. THE PENGUIN BOOK OF CLASSICAL MYTHS
48. VENI, VIDI, VICI
49. THE TWELVE CAESARS
50. THE HISTORY OF THE DECLINE AND FALL OF THE ROMAN EMPIRE: ABRIDGED EDITION
51. PAX
52. ASTERIX: ASTERIX AND THE WHITE IRIS
53. SUPER-INFINITE
54. SYSTEMS FOR...
55. MOBILE MANIA
56. THE ICON CATALOGUE UK GARAGE VOL. 1
57. LONDON FEEDS ITSELF
58. LONDON FIELDS
59. DAMASCUS STATION
60. MOSCOW X
61. A PERFECT SPY
62. THE SPY AND THE TRAITOR
63. BEHIND THE ENIGMA
64. THE MERCENARY RIVER
65. THE LADYBIRD BOOK OF THE HANGOVER
66. STIG OF THE DUMP
67. THE TALES OF BEEDLE THE BARD
68. HARRY POTTER AND THE HALF-BLOOD PRINCE
69. JOURNEY TO THE CENTRE OF THE EARTH
70. DR JEKYLL AND MR HYDE
71. TREASURE ISLAND
72. TOM SAWYER & HUCKLEBERRY FINN
73. THE ADVENTURES AND MEMOIRS OF SHERLOCK HOLMES
74. THE RETURN OF SHERLOCK HOLMES
75. MACHINES OF LOVING GRACE
76. PROCESS DYNAMICS AND CONTROL
77. GAUSSIAN PROCESSES FOR MACHINE LEARNING
78. BAYESIAN OPTIMIZATION
79. NUMERICAL OPTIMIZATION
80. ROBUST OPTIMIZATION
81. FOUNDATIONS OF APPLIED MATHEMATICS, VOLUME 2
82. FOUNDATIONS OF APPLIED MATHEMATICS, VOLUME I
83. HOW TO PROVE IT
84. INTRODUCING LOGIC
85. INTRODUCING QUANTUM THEORY
86. INTRODUCING CHAOS
87. INTRODUCING FRACTALS
88. INTRODUCING INFINITY
89. OUR MATHEMATICAL UNIVERSE
90. COLLINS DICTIONARY OF MATHEMATICS
91. FOUR COLORS SUFFICE
92. RINGWORLD
93. THE LORD OF THE RINGS
94. THE HOBBIT
95. CLOUD ATLAS
96. A SUPPOSEDLY FUN THING I'LL NEVER DO AGAIN
97. INFINITE JEST
98. AMERICAN PSYCHO
99. BRING UP THE BODIES
100. THE NATION KILLERS
101. BURY MY HEART AT WOUNDED KNEE
102. STALINGRAD
103. WAR AND PEACE
104. NAPOLEON IN EGYPT
105. GREECE
106. ATHENS
107. DEMOCRACY'S BEGINNING
108. THE BATTLE FOR THE ARAB SPRING
109. MIKE CONTRE-ATTAQUE!
110. KISSINGER
111. THE ESCAPE ARTIST
112. 1000 YEARS OF JOYS AND SORROWS
113. LUCKY KUNST
114. MURDER ON THE DARTS BOARD
115. GONE FISHING
116. THE SATSUMA COMPLEX
117. THAT’S YOUR LOT
118. MARCH OF THE LEMMINGS
119. HOUSE ARREST
120. COMING HOME
121. NO TURNING BACK
122. KILLING THATCHER: THE IRA, THE MANHUNT AND THE LONG WAR ON THE CROWN
123. SAY NOTHING: A TRUE STORY OF MURDER AND MEMORY IN NORTHERN IRELAND
124. FALL
125. ONE TWO THREE FOUR: THE BEATLES IN TIME
126. K-PUNK
127. DEATH AND THE PENGUIN
128. MURDLE
129. THE HARD-BOILED WONDERLAND AND THE END OF THE WORLD
130. ON THE ROAD
131. A MOVEABLE FEAST
132. FLAUBERT'S PARROT
133. NINETEEN EIGHTY-FOUR
134. TENDER IS THE NIGHT
135. THE UNBEARABLE LIGHTNESS OF BEING
136. SLAUGHTERHOUSE-FIVE
137. CATCH-22
138. WE ALWAYS TREAT WOMEN TOO WELL
I think another next logical step would be to build in a recommendation tool based on the vector embeddings and maybe even the resulting TSP solutions. This would potentially result in a bi-level integer program (hard) depending on what you deem most important (add a book that would fit most optimally within a current collection of books, or add a book that would result in the most added distance to the resulting optimal semantic tour?), but with a small enough number candidates this could be solved without approximation. We will see.
Was it worth it? Perhaps the lesson in all of this is that fulfilment lies not in choosing between chaos and order, but in finding systems that embrace both—straddling each paradigm, allowing us to be both the organised librarian and to take a book for a walk.
As I’ve quoted before, Donella Meadows states in her book Thinking in Systems: A Primer:
There is yet one leverage point that is even higher than changing a paradigm. That is to keep oneself unattached in the arena of paradigms, to stay flexible, to realize that no paradigm is "true", that every one, including the one that sweetly shapes your own worldview, is a tremendously limited understanding of an immense and amazing universe that is far beyond human comprehension.
All code for this post can be found here: https://github.com/trsav/bookshelf