I’m planning to write up some thoughts and my results from DAHSS17 in Malaga but for now here’s a storify I made.
We are working at the Malaga Digital Art History Summer School to think about ways that we can visualize or analyze “Big Data” for Art History. I’ve been doing little meaningless tests on the 6000+ objects from the Metropolitan Museum of Art Department of Ancient Near Eastern Art. I thought this could be a fun place to start. Our workflow was to import the data from the Met GitHub repository, refine/reduce it in RStudio, then export it to Gephi to do network visualizations. I made a bunch of meaningless visualizations that I’m embedding here to discuss with my group on Monday.
This is a network visualization of the object names by material – since some objects share names “cylinder seal” etc they appear as larger dots, but that’s not actually meaningful. At least it looks a little interesting.
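For anyone curious what the refine-and-export step looks like, here is a rough Python sketch (we worked in R, so this is a stand-in and not the code we ran). It assumes the Met CSV has columns called `Object Name`, `Medium` and `Department`; the sample rows are invented for illustration. It also shows why “cylinder seal” ends up as a big dot: the weight only counts how many records share a name and a material.

```python
import csv
import io
from collections import Counter

# Invented sample rows standing in for the Met open-access CSV.
# The column names are an assumption about that file.
sample = io.StringIO(
    "Object Name,Medium,Department\n"
    "Cylinder seal,Hematite,Ancient Near Eastern Art\n"
    "Cylinder seal,Hematite,Ancient Near Eastern Art\n"
    "Cylinder seal,Chalcedony,Ancient Near Eastern Art\n"
    "Bowl,Ceramic,Ancient Near Eastern Art\n"
    "Painting,Oil on canvas,European Paintings\n"
)


def build_edges(rows, department):
    """Count (object name, material) pairs for one department."""
    counts = Counter()
    for row in rows:
        if row["Department"] != department:
            continue
        name = row["Object Name"].strip()
        medium = row["Medium"].strip()
        if name and medium:
            counts[(name, medium)] += 1
    return counts


edges = build_edges(csv.DictReader(sample), "Ancient Near Eastern Art")

# Write an edge list with Source/Target/Weight headers, which Gephi's
# spreadsheet import recognizes.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["Source", "Target", "Weight"])
for (name, medium), n in sorted(edges.items()):
    writer.writerow([name, medium, n])

print(out.getvalue())
```

In a real run the `io.StringIO` objects would be replaced with file handles for the downloaded CSV and the output file.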
I then looked at “classification” and “culture”, in two different visualizations, to see whether different materials or types of object would show up by culture.
I also made a visualization of objects by culture, but somehow forgot that this would lead to totally separate groups, because in the database each object can only be attached to one culture at a time.
So that’s really ugly, but when you zoom in in Gephi you can see each object number, if somehow that was useful. I don’t know. Anyway, it’s been fun looking at these visualizations, but in the end I think I need to use multimodal networks and think more carefully about questions that could be answered. I actually think that maybe network visualization is not the right method for this dataset, but since we are a group and all working together at the summer school it’s fun to see what I can make together with my colleagues.
Standby for more tests and weird experiments!
Since September I’ve been working at the Metropolitan Museum of Art and you will find my research incorporated into their website, rather than here. I do, however, have an upcoming public talk that is part of the annual Fellows Colloquia, specifically the session,
“An Admirable Scheme”: The Symbiotic Relationship of Archaeology and Art at The Metropolitan Museum of Art in the Early Twentieth Century
Caitlin Chaves Yates, Andrew W. Mellon Curatorial Research Fellow, Department of Ancient Near Eastern Art
Full information is available at: http://www.metmuseum.org/events/programs/met-speaks/symposia/ancient-near-east-site-to-museum
Before embarking on a full-scale network analysis of Northern Mesopotamian EBA cities, it seemed prudent to do a small test case to see 1. whether it would work at all, 2. what kind of results we might expect, and 3. what kind of information we should be collecting.
Today I want to talk about number 3. What kind of information should we be collecting?
We decided to start with lithics, since Tobias’ recent dissertation focused on lithic use. We thought it might be an easy entry point to try a preliminary analysis since we could easily create the table of data necessary from his background knowledge. Since it was only a feasibility test we were not too concerned with the specifics of the data, knowing we could refine and update the data later if it turned out to be a useful avenue for our test. We supplemented our chart using data from the ARCANE database.
We used four main categories in our chart: obsidian use, finished Canaanean blades, finished non-Canaanean blades, and Canaanean blade production by-products, each recorded on a presence/absence scale. This is technically a bi-modal network (site->lithic<-site), but we converted it to a uni-modal network, so the sites are connected site<->site, with the edge being shared lithic use. The maximum weight that any edge can have is 4, indicating that the two linked sites both have evidence of all 4 types of lithic remains.
The result was the following graph:
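To make the conversion concrete, here is a minimal Python sketch of the bi-modal to uni-modal step (we did this in Gephi, so this is only an illustration of the idea). The site names and the presence/absence values are invented placeholders, not our recorded data, and the category labels are hypothetical shorthand.

```python
from itertools import combinations

# Hypothetical presence/absence table: site -> lithic categories present.
# These values are invented for illustration only.
presence = {
    "Site A": {"obsidian", "canaanean_finished",
               "non_canaanean_finished", "canaanean_byproducts"},
    "Site B": {"obsidian", "canaanean_finished",
               "non_canaanean_finished", "canaanean_byproducts"},
    "Site C": {"obsidian", "canaanean_finished"},
    "Site D": {"non_canaanean_finished"},
}


def project_sites(presence):
    """Return {(site1, site2): weight}, where weight is the number of
    lithic categories the two sites share. Pairs sharing nothing get no edge."""
    edges = {}
    for a, b in combinations(sorted(presence), 2):
        shared = presence[a] & presence[b]
        if shared:
            edges[(a, b)] = len(shared)
    return edges


edges = project_sites(presence)
for (a, b), weight in sorted(edges.items()):
    print(f"{a} <-> {b}: weight {weight}")
```

With four categories the weight can never exceed 4, and a site that has every category is automatically tied to every other site that has anything at all.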
While the graph is interesting and appears to show many connections, it has little to say if we want to explain anything about EBA connections between cities. It shows that some sites have more than one type of lithic in use, and that those sites are more central to the network. But if we think about what the graph means, we are saying that sites with a more diverse lithic assemblage are somehow more central to the EBA system of cities, and that may not be true. Even if we restrict the interpretation to lithics alone, the graph does not tell us much, because the data was selected poorly: obviously sites with Canaanean blade production will have evidence of finished Canaanean blades as well, which sets these sites up to be the most central to the network (e.g. Titris and Chuera). We do not actually know that Canaanean blade production makes a site more central to the system; it may be that Titris and Chuera are peripheral to the network and that finished blades are distributed through sites that are central in other ways. In short, by not thinking carefully about the categories we chose for this analysis, we set the analysis up to fail.
After realizing that the graph was not particularly illustrative, I still decided to play around with it a little, to see how I could manipulate the data and to use it as an experiment for thinking about what kinds of categories I will need going forward and how I can rectify some of the problems I created in this test case.
Also, this graph suffers from the dreaded “spaghetti monster” problem where every node is connected to almost every other node. This of course is unsurprising when you think about the data we input to make this graph – all sites have some use of lithics, and we selected common types to graph, therefore, there are many connections. Once you start to establish cut-off weights for the edges or connections, however, you can start to see the peripheral sites dropping out of the network.
In this graphic representation, the edges are restricted to those with a weight of 2 or more (i.e. pairs of sites that share 2, 3, or 4 lithic categories). You can see that the sites on the bottom left and top right have become disconnected from the graph.
Raising the cutoff for the edge weight to 3 or 4 shows most sites no longer as part of the network, with only the core of the graph remaining.
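The cutoff step itself is simple enough to sketch in a few lines of Python. This is an illustration under invented data (hypothetical sites and weights), not the filter we applied in Gephi: keep only the edges at or above a minimum weight, then list the sites left with no edges.

```python
# Hypothetical weighted site-to-site edges (weight = shared lithic categories).
edges = {
    ("Site A", "Site B"): 4,
    ("Site A", "Site C"): 2,
    ("Site B", "Site C"): 2,
    ("Site A", "Site D"): 1,
    ("Site B", "Site D"): 1,
}


def filter_edges(edges, min_weight):
    """Keep only edges whose weight is at least min_weight."""
    return {pair: w for pair, w in edges.items() if w >= min_weight}


def connected_sites(edges):
    """Return the set of sites that still have at least one edge."""
    return {site for pair in edges for site in pair}


all_sites = connected_sites(edges)
for cutoff in (1, 2, 3):
    kept = filter_edges(edges, cutoff)
    dropped = sorted(all_sites - connected_sites(kept))
    print(f"cutoff {cutoff}: {len(kept)} edges, dropped sites: {dropped}")
```

Running the same loop over a range of cutoffs is a quick way to see how much of the network depends on weak ties before committing to one threshold.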
So what does this series of graphs really show us?
Well first, it does show that there are meaningful differences in the distribution of lithic types (i.e. we don’t find all types everywhere), but as you can probably imagine, this is not a revolutionary conclusion and you do not need network analysis to tell that lithics are distributed differently across the region (any archaeologist working in the region knows that already).
Second, it shows that lithics alone cannot be used to answer the kind of question we set out to answer about the connections between sites. That is not to say that a network analysis of the distribution of lithic types, particularly if sourcing could be included, would not be valuable, only that our network is insufficient for answering these types of questions. This test case showed us that using one type of material culture was not going to be enough to reveal meaningful patterns, or at least not at the level of the data we had available to us. There is some amazing work on obsidian sourcing and obsidian use (see Golitko, Mark, and Gary M. Feinman, 2015 Procurement and Distribution of Pre-Hispanic Mesoamerican Obsidian 900 BC-AD 1520: A Social Network Analysis. Journal of Archaeological Method and Theory 22: 206–247.) but our data was not nearly as refined and not designed to look at sourcing.
Third, this case clearly demonstrated that determining the cutoff weights for connections will be very important in deciding which sites are connected. As you can see from the first graph, at a low edge weight the network appears tightly connected; using the higher cut-off weights, however, reduces the network to almost nothing, suggesting that there is in fact not really a strong network here at all, but rather a set of very loose connections.
This small test confirms the common saying among network analysts that just because you CAN make something into a network, doesn’t mean you should.
Overall, this short test case demonstrated that we will need a broader approach if we are going to produce any kind of pilot project. I will discuss our selection of categories in the next post.
tl;dr: Using only one type of find did not reveal the kinds of connections necessary for our investigation of EBA cities.
Last February, before setting out for my research fellowship in Berlin, I submitted an abstract for the American Schools of Oriental Research (ASOR) Conference in Atlanta. That abstract is now coming home to roost. The paper was accepted for the Cyber-Archaeology in the Middle East session and I’m going to use this space over the next 3 months to talk about how I am developing both the project and my paper/presentation. After the conference I hope to be able to post all the slides here.
So, here is the abstract I wrote:
This paper presents the preliminary results of a network analysis of Early Bronze Age (EBA) city-states in Northern Mesopotamia. Using published excavation reports, the EBA Network Project attempts to investigate the question of interconnectedness between EBA city-states, a dense web of relationships that, until now, has only been examined on a cursory level with an assumed connection based roughly on geographic boundaries. Previous analysis of inter-site connections has generally focused on specific cultural markers such as written records, ceramics or urban layout (i.e. circular cities) and consists of dividing Northern Mesopotamia into vague ‘zones’ or ‘spheres’ which are generally indicated by the presence and/or absence of a particular trait (e.g. Ninevite 5, Metallic Ware, Reserved Slip areas). The EBA Network Project examines these categories alongside all of the other potential connections to avoid giving precedence to any one data source or type. Network analysis, using cities as nodes, can be used to show patterns of connection that are not strictly limited to geographic areas. Furthermore, the network patterns vary depending on the types of finds investigated, revealing a complex set of interactions between sites. This paper demonstrates the potential for application of network methods on understanding the relationships between EBA sites and discusses the difficulties and challenges of setting up this type of digital-based project. Further avenues for exploration and potential for the data are also discussed including possibilities for expanding the database and its categories and data reuse for different research questions.
I do think I will be able to show some possibilities for the use of networks despite the currently fledgling nature of the project. I think where I can really succeed is in the “demonstrating potential for application of network methods” and I will certainly have lots of examples for the “difficulties and challenges” of this kind of project. I hope to use the paper to inspire conversation, and perhaps entice other NE archaeologists to collaborate and work with us on aspects of the project and think about using network methods. I am planning to start my look into our project with Tom Brughmans’ conversation on “Best Practice Guidelines for Network Science in Archaeology.” My first step in outlining my talk is to really think about how to get the most not just out of the data we collect, but out of the methodology of networks as well.
Stay tuned to this space for more musings on putting together my ASOR15 paper.
So while there are some great data sets out there for learning about network visualizations (see previous post for some tutorials with data that I tried), I thought it might be useful to create my own to help me learn about how the data should be structured. I started with a published article on the ceramics of Northern Mesopotamia during the third millennium:
Rova’s chapter titled ‘Pottery’ in Associated Regional Chronologies for the Ancient Near East and the Eastern Mediterranean, Vol. I: Jezirah.
I used the chart of the different types to create a spreadsheet of sites and type numbers of ceramics for the EBA III types.
Target = site
Source = ceramic type number
I used Shawn Graham’s instructions for using Gephi to convert a multimodal network to a unimodal network.
When I was finished I ended up with this:
I think this visualization is a perfect example of a picture that does not show you what you think it does. By that I mean, Beydar comes out highly centralized and has the most types in common with the other sites (i.e. highest node degree), but this is due to how the data was chosen. Basically, what this visualization shows is that the author used the most examples from Beydar, and not coincidentally, I think, that is because she worked at Beydar. Obviously I only created this as a test, to see how the transformation would work, not because I thought that a list from a table in one article would provide me with any meaningful correlations between sites. In that sense I found this exercise helpful, as it gave me a chance to think about what data I will really need to do this kind of analysis, how it might be standardized (sharp-eyed readers might see that Raqa’i accidentally appears twice due to a misspelling; I should probably use unique identifier numbers for sites to avoid this kind of problem), and what I might want to measure. Overall, though, I found the process of creating my own csv file and then converting it in Gephi to be a useful way to spend a morning as far as my own education goes.
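On the unique-identifier point, one possible approach (a sketch, not something I have built yet) is a small lookup table that maps every spelling variant to a single site ID before the edge list is written. The IDs, the `type_01`/`type_02` labels and the “Raqai” variant below are all hypothetical placeholders.

```python
# Hypothetical gazetteer: every known spelling maps to one stable site ID.
SITE_IDS = {
    "beydar": "S001",
    "raqa'i": "S002",
    "raqai": "S002",  # hypothetical misspelling of the same site
}


def site_id(name):
    """Normalize a site name and look up its ID; fail loudly on unknowns."""
    key = name.strip().lower().replace("\u2019", "'")
    if key not in SITE_IDS:
        raise KeyError(f"Unknown site spelling: {name!r}; add it to SITE_IDS")
    return SITE_IDS[key]


# Invented (ceramic type, site) rows as they might be typed into a spreadsheet.
rows = [
    ("type_01", "Beydar"),
    ("type_01", "Raqa'i"),
    ("type_02", "Raqai"),
    ("type_02", "Beydar "),
]

cleaned = [(type_no, site_id(site)) for type_no, site in rows]
raw_sites = {site for _, site in rows}
unique_sites = {site for _, site in cleaned}

print(len(raw_sites), "raw spellings ->", len(unique_sites), "sites")
```

Raising an error on an unknown spelling is deliberate: a typo then stops the import instead of quietly becoming a new node.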
So your TL;DR for today is: be careful how you construct your data, because maybe your visualization will just show you the site where the author worked.
Since the EBA Networks project is, obviously, about networks, I’ve been slowly working my way through some great internet resources in an attempt to learn more about how networks work. So far I’ve found Scott Weingart’s Networks Demystified series to be super helpful as an introduction. I’ve also made good use of Shawn Graham’s excellent tutorials (like this one) to start learning about software and its capabilities.
I think the next step is to force some data into a format I think I can use and then just try (presumably fail) and try again until I start to have a handle on the potential for these methodologies and approaches within my own work.