Here’s a link to OpenRefine if you want to check it out.
Working in OpenRefine reminded me of working in Excel, only bigger. And more complex. And scarier. I was initially overwhelmed because I could not get the download to work on my computer. That might have been because I really don’t know what I am doing when it comes to the software we are looking at. Every week I come to class and realize how far behind I am in learning how computers actually work and all the things they let you do beyond Word, PowerPoint, and surfing the internet. But I am trying to keep up with the class and process the information we are learning.
The focus of OpenRefine, and of this week’s class, was data. Working through the tool taught me how literal computers are when it comes to data. The difference between doing something right and doing it wrong can come down to hitting the spacebar one too many times, or putting a name in quotes once and then not doing it later. The dataset itself was expansive, and every value had to be entered very precisely. I liked seeing how many different ways you can display data and how you can simplify it to make it easier to understand.
In the online article “What Data Can’t Convey,” author Marc J. Dunkelman explains that data cannot show movement or social change. It cannot show what observation can. For example, data cannot show how social doctrine has shifted over time. Miriam Posner expands on this in her article “Humanities Data: A Necessary Contradiction,” asserting that there needs to be a way to convey power-based issues regarding race, class, and gender through data. I believe that the disconnect between data and the humanities lets the facts and numbers obscure the deeper issues they describe. Posner states that Digital Humanities can show movement over time, yet if the data fails to convey the point it is trying to make, is it really worth the 17,000 entries in OpenRefine? Data and the explanation of it have to go hand in hand for Digital Humanities to work.
One of the main issues we addressed in class regarding OpenRefine was how specific one has to be to correctly classify and group entries for the same publisher (publisher being just one example of a classifying column we had). There were lots of different versions of the same publisher. Some were misspelled, some had an extra space after the word, some were enclosed in quotation marks, and some spelled out the entire name of the company while others didn’t. Nothing was uniform. On top of that, historical data often has things missing, is incorrect, or has many variables that affect how it gets entered. For example, we looked at a census from Ancestry.com. The census is very difficult to read, and if you were recording the data from it, you would have to make many assumptions that are probably incorrect. That incorrect data will cause problems when it is entered into a tool like OpenRefine. To balance the specificity computers need with the instability of historical data, digital humanists need to be careful when entering data and interpreting it. Maybe computers can make small assumptions that are controlled by the human operator, like a window popping up to ask whether they want it to make an assumption. By bringing computer data and history closer together, Digital Humanities will become less confusing and more helpful for its users.
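To make the publisher problem a little more concrete, here is a rough Python sketch of the idea. It is not how OpenRefine works internally, only loosely similar in spirit to grouping values by a simplified key. The function names (`publisher_key`, `group_variants`, `merge_with_confirmation`) and the publisher names in the sample are made up for illustration and do not come from our class dataset. The `confirm` argument stands in for the pop-up window I imagined: nothing gets merged unless the human says yes.

```python
import re
from collections import Counter, defaultdict


def publisher_key(name):
    """Build a comparison key that ignores case, quotes, punctuation, and extra spaces."""
    # Drop everything that is not a letter, digit, underscore, or whitespace.
    cleaned = re.sub(r"[^\w\s]", "", name.casefold())
    # split() with no arguments also discards leading, trailing, and doubled spaces.
    return " ".join(cleaned.split())


def group_variants(names):
    """Group the original spellings under their shared key."""
    groups = defaultdict(list)
    for name in names:
        groups[publisher_key(name)].append(name)
    return dict(groups)


def merge_with_confirmation(names, confirm):
    """Replace variants with the most common spelling, but only when confirm() agrees."""
    replacement = {}
    for variants in group_variants(names).values():
        distinct = sorted(set(variants))
        if len(distinct) < 2:
            continue  # only one spelling, nothing to decide
        preferred = Counter(variants).most_common(1)[0][0]
        if confirm(preferred, distinct):  # the human operator makes the call
            for variant in distinct:
                replacement[variant] = preferred
    return [replacement.get(name, name) for name in names]


# Hypothetical sample values, invented for this example.
sample = [
    "Northfield & Sons",
    "Northfield & Sons ",      # trailing space
    '"Northfield & Sons"',     # wrapped in quotes
    "northfield & sons",       # different capitalization
    "Northfield & Sons",
    "Northfield and Sons",     # spelled out, NOT caught by this simple key
    "Lakeside Press",
]

if __name__ == "__main__":
    # Approve every suggested merge automatically, just for the demonstration.
    for value in merge_with_confirmation(sample, lambda preferred, variants: True):
        print(repr(value))
```

Notice that the spelled-out "Northfield and Sons" is left alone. A simple key like this catches spaces, quotes, and capitalization, but misspellings and abbreviations still need either a fuzzier method or a person looking at the data, which is really the point of the whole paragraph above.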