This is a bit incongruous vis-à-vis the direction of the most recent blog posts/assignments, but Prof. Manovich’s visit yesterday helped me articulate a certain point. The question I raised at the end of class dealt with Exploratory Data Analysis – more specifically, with the ethics of how one might extrapolate from the data one exploratorily analyzes. I could very well just not get it: my question feels basic enough that it could easily hinge on a misunderstanding. But if the model is one in which each given set of data is analyzed ‘organically’ – that is, according to a set of constraints arising from that set of data (rather than from some a priori set of rules, axioms, etc.) – to what extent can one say that the patterns/conclusions/results of that analysis say anything beyond the constraints themselves? This is a kind of founding question of structuralism: how do the arguments made by a closed set not become circular?
An available example is experimental psychology: given the artificiality of any psychological experiment, how can one say that one has attained results applicable outside of that environment? (In a snake-eating-its-own-tail way, one could even imagine a psychological experiment that comes to this very conclusion…and it being rendered in black ink as an Escher-like trompe l’oeil.)
The misunderstanding I could see myself having made is that, no, Exploratory Data Analysis doesn’t claim to produce results that are applicable beyond their native data environment. Given e.g. a gigantic set of syllabi, the results of any such analysis will describe features of that specific set of syllabi; there is no sense in which it would explain what a syllabus is (as, say, an Ideal form).
Then again, if the goal of Exploratory Data Analysis is something like axiomatic set theory, then I have also misunderstood it. The interest in that case would be not in the results of the data per se but in the refinement of the axioms dictating how one is supposed to approach it…which would then make it a branch of logic.
I think there are really three issues to which you’re calling attention.
First is that what Manovich described in response to Amanda’s syllabus question was a bit of a dodge. Amanda seemed to want specific points that could be tested via the data set, and Manovich gave a fairly routine description of descriptive-statistics methods whereby you collect the data, code it, and then produce various visualizations to see what patterns emerge.
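To make that collect, code, visualize loop concrete, here is a minimal sketch of it; the syllabus snippets and the keyword coding scheme are invented for illustration and do not come from any actual data set:

```python
# A minimal sketch of the collect -> code -> visualize loop described above.
# The syllabus snippets and keyword categories are hypothetical stand-ins
# for a real data set and a real coding scheme.
from collections import Counter

syllabi = [
    "Weekly readings, two essays, final exam",
    "Group project, weekly readings, presentations",
    "Final exam, problem sets, weekly readings",
]

# "Coding" step: tag each syllabus with the assessment types it mentions.
keywords = ["exam", "essay", "project", "readings", "problem sets",
            "presentations"]

counts = Counter()
for text in syllabi:
    for keyword in keywords:
        if keyword in text.lower():
            counts[keyword] += 1

# Crude text "visualization": one bar per coded category.
for label, n in counts.most_common():
    print(f"{label:14s} {'#' * n}")
```

Even in this toy version, the pattern that "emerges" (readings dominate) is only visible through the keyword list chosen in advance, which anticipates the coding-constraints point below.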
Coming from my background in Education, I can say this method is extremely well established, and significant time is spent on the ethics of various coding methodologies because these will naturally influence the interpretations that you ultimately make. Insofar as this is concerned, I think you’re right: no one is making claims beyond their data set. But the size of the data set changes the significance of your findings, since a larger sample size naturally increases your chances of being representative of the general population. So conclusions about the population are always conjectural, and they require people to pull the methodologies apart, or to try them with new data sets, to find out whether your results are repeatable.
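The sample-size point can be illustrated with a toy simulation; the 30% “true” population rate is an invented parameter, and the point is simply that larger samples produce estimates that cluster more tightly around it:

```python
# Toy simulation: estimates from larger samples vary less around the
# (here, known) population value. TRUE_RATE is an invented parameter.
import random
import statistics

random.seed(0)
TRUE_RATE = 0.30  # hypothetical population proportion

def sample_estimate(n):
    """Estimate the rate from one random sample of size n."""
    return sum(random.random() < TRUE_RATE for _ in range(n)) / n

spreads = {}
for n in (10, 100, 10_000):
    estimates = [sample_estimate(n) for _ in range(200)]
    spreads[n] = statistics.stdev(estimates)
    print(f"n={n:6d}  spread of estimates ~ {spreads[n]:.3f}")
```

The spread shrinks roughly with the square root of the sample size, which is why bigger data sets license somewhat stronger (though still conjectural) claims about the population.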
When people talk about descriptions of data arising “organically,” they’re hiding the fact that the coding process (and here I mean something like coding interviews, designing tests to measure certain phenomena, creating taxonomies, or measuring something like average hue) is always putting constraints on a data set. There is nothing that comes before our methods of interpretation, and those methods constrain how the data can be read. I think what people mean when they say “organically” is that they’re not presupposing all the tests they’re going to apply to the data, that they are open to happy accidents in terms of asking new questions, and that they always foreground that their methods are only one of many possible modes of interpretation. It’s a misnomer for sure, but it has its roots in more recent ethnographic/sociological work that has been done to counter some of the terrible stuff of the past (although still being done) that often imposes contemporary racist/sexist modes of thinking.
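One way to see how the coding step constrains what a data set can show: the same items, coded under two different (entirely invented) measurement schemes, foreground different “patterns”:

```python
# The same data, coded two different ways, supports two different
# "patterns." Both the snippets and both coding schemes are invented.
texts = [
    "A very long and winding sentence about methods.",
    "Short note.",
    "Data!",
]

# Coding scheme 1: measure length in characters.
by_length = sorted(texts, key=len, reverse=True)

# Coding scheme 2: measure exclamatory emphasis (count of '!').
by_emphasis = sorted(texts, key=lambda t: t.count("!"), reverse=True)

print("Most prominent under scheme 1:", by_length[0])
print("Most prominent under scheme 2:", by_emphasis[0])
```

Neither ranking is more “organic” than the other; each is an artifact of the measurement chosen before the data was ever read.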
I think when you come out of a data set you come out with more questions, which need more data sets and different methodologies. Only through a network of data/interpretations/versions can you start to become really confident that you might be describing something “real.” But the very questions you’re asking sometimes help shift paradigms, and that has its role as well.