So I have spent most of my time at school working on data sets with either a small sample size (<25) or a sample size adequate for most purposes (>200 but <500). The first calls for a rather specialized quantitative skill set; the second for the general-purpose toolkit of most people in psychology. One of my friends, Tyler, is a sociologist who specializes in social media. We talk about quantitative methods often. I cannot remember whether it was to prove a point, out of mutual curiosity, or merely to see what I would do with it, but he generated two samples for me. One was a random sample of Twitter users who posted over a two-week period. The other was of individuals who posted the word "bored" during the same period. Before cleaning, the dataset contained about 1.6 million unique tweets. A random subsample of the "bored" sample was inspected to determine whether people posting the word were actually complaining of boredom on Twitter. They were.

At this point, I wanted to know whether I could differentiate between the two groups based on their user statistics, which served as the independent variables: the number of followers, the number of accounts followed, and the number of tweets. I could have used logistic regression or discriminant function analysis. I opted for discriminant function analysis because I use it less often and it is specifically designed for continuous independent variables; I was also less familiar with it, so it was good practice. I was able to discriminate between the individuals whose tweets contained the word "bored" and the random sample. The function that differentiated the two groups was statistically significant, with p < .0001, and its power was essentially perfect (power = 1). Power and significance both concern the decision to reject the null hypothesis that there are no differences. Statistical power is the probability of rejecting the null hypothesis when a difference really does exist. Statistical significance guards against the opposite error: the p-value is the probability of observing a result at least this extreme when there is in fact no difference.
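For readers who want to see the mechanics, here is a minimal sketch of a two-group discriminant function on simulated user statistics. The numbers below are invented stand-ins (log-scale follower, following, and tweet counts); the real Twitter samples are not reproduced here, and this hand-rolled Fisher discriminant is just one common way to compute such a function.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated log-scale user stats: [followers, following, tweets].
# Group means are hypothetical, chosen only to give the groups a small separation.
bored  = rng.normal(loc=[4.0, 4.2, 6.0], scale=1.0, size=(500, 3))
random_users = rng.normal(loc=[4.3, 4.3, 6.2], scale=1.0, size=(500, 3))

def fisher_discriminant(a, b):
    """Fisher's linear discriminant: w = S_pooled^-1 (mean_a - mean_b)."""
    mean_diff = a.mean(axis=0) - b.mean(axis=0)
    # Pooled within-group covariance matrix
    s_pooled = (np.cov(a, rowvar=False) * (len(a) - 1) +
                np.cov(b, rowvar=False) * (len(b) - 1)) / (len(a) + len(b) - 2)
    return np.linalg.solve(s_pooled, mean_diff)

w = fisher_discriminant(bored, random_users)

# Project each user onto the discriminant axis and classify at the midpoint
# between the two projected group means.
scores_a = bored @ w
scores_b = random_users @ w
cut = (scores_a.mean() + scores_b.mean()) / 2
accuracy = (np.mean(scores_a > cut) + np.mean(scores_b < cut)) / 2
```

With weakly separated groups like these, the in-sample accuracy lands only modestly above chance, which foreshadows the effect-size problem discussed below.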

I did some further analyses, such as jackknife classification to determine the percentage of users correctly classified by the function. I found that only roughly half of the participants were classified correctly. This was consistent with the fact that the discriminant function explained only 4% of the variance: despite the massive sample size, the effect size was very small. Effect size deals with the strength of the relationship between the variables in a finding and is a rough measure of the likelihood of being able to replicate the finding in a different sample. The astonishingly low effect size led me to give up on the project after I played with it a little more. Given some of the directions in the relationships among the independent variables in the function that discriminated the two groups, I got to play with structural equation modeling to test mediated moderation. Good times, but the same issue with significance, power, and effect size remained.
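To make the jackknife and the variance-explained figure concrete, here is a sketch on simulated data: each user is classified by a discriminant fit on everyone else (leave-one-out), and an eta-squared-style effect size is computed as the proportion of discriminant-score variance attributable to group membership. The group means and labels are hypothetical, not the real sample.

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.vstack([rng.normal([4.0, 4.2, 6.0], 1.0, (300, 3)),   # "bored" users
               rng.normal([4.3, 4.3, 6.2], 1.0, (300, 3))])  # random users
y = np.repeat([0, 1], 300)

def fit_discriminant(a, b):
    """Fisher discriminant weights from pooled within-group covariance."""
    s = (np.cov(a, rowvar=False) * (len(a) - 1) +
         np.cov(b, rowvar=False) * (len(b) - 1)) / (len(a) + len(b) - 2)
    return np.linalg.solve(s, a.mean(axis=0) - b.mean(axis=0))

def classify(train_X, train_y, x):
    """Fit on the training rows only, then classify a single held-out case."""
    a, b = train_X[train_y == 0], train_X[train_y == 1]
    w = fit_discriminant(a, b)
    cut = (a.mean(axis=0) @ w + b.mean(axis=0) @ w) / 2
    return 0 if x @ w > cut else 1

# Jackknife: each user is scored by a function fit on everyone else.
mask = np.ones(len(y), dtype=bool)
hits = 0
for i in range(len(y)):
    mask[i] = False
    hits += classify(X[mask], y[mask], X[i]) == y[i]
    mask[i] = True
jackknife_accuracy = hits / len(y)

# Effect size: proportion of discriminant-score variance explained by group.
a, b = X[y == 0], X[y == 1]
scores = X @ fit_discriminant(a, b)
ss_between = sum(len(g) * (g.mean() - scores.mean()) ** 2
                 for g in (scores[y == 0], scores[y == 1]))
eta_sq = ss_between / (len(scores) * scores.var())
```

On data this weakly separated, the jackknife accuracy sits near chance and eta-squared comes out at a few percent, mirroring the pattern described in the post.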

This brings me to my point, and to why I do not remember why the data was initially handed to me. I have been highly critical of how individuals analyze social media data. Psychology, rightly, started to expect power and effect sizes to be reported because these things bring so much to light. Significant results may have high power and low effect sizes; this is possible because of the large sample size. It means the findings may be right, but we must limit how much faith is put in them. Social media, which draws on cutting-edge technology, is typically analyzed through means that are years behind. The only way to analyze large samples responsibly is through these added points of reference. The only statistic that carried real information in this sample was effect size; given the relationship between significance and power at this scale, that makes sense. Analyses of social media need to take effect size into account, because this is where the strength of your findings can be determined. As an added aside, when analyzing social media you must also factor into any model how long an individual user has been a member of the site. An individual who has been a member for years but posts once a week is very different in behavior from someone who posts 30 times a day but has only been a member for a few months. This, I am sure, will help to explain a lot of the random variance observed in any statistically significant effect.
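The central claim above — that at very large n almost anything is "significant" even when the effect is negligible — can be demonstrated in a few lines. The data below are simulated: two groups whose true means differ by a trivial 0.02 standard deviations, with 200,000 cases per group. The two-sided p-value uses the normal approximation to the two-sample test, which is essentially exact at this n.

```python
import math
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
a = rng.normal(0.00, 1.0, n)
b = rng.normal(0.02, 1.0, n)   # a trivially small true difference

# Two-sample z statistic and two-sided p-value (normal approximation).
z = (b.mean() - a.mean()) / np.sqrt(a.var(ddof=1) / n + b.var(ddof=1) / n)
p = math.erfc(abs(z) / math.sqrt(2))

# Cohen's d: the standardized effect size stays tiny regardless of n.
pooled_sd = math.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
cohens_d = abs(b.mean() - a.mean()) / pooled_sd
```

The p-value comes out far below conventional thresholds while Cohen's d stays near 0.02 — a "highly significant" result that explains almost nothing, which is exactly the trap with 1.6 million tweets.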

I actually think effect sizes are (usually) pretty unimportant in cognitive science (although much, much more so in clinical psych). Generally, we (meaning my cog peeps) are trying to do theory-driven work and falsify one or more competing hypotheses in an experiment. Effect size and theoretical importance are essentially unrelated; something can be hugely theoretically important and have a tiny associated effect size. Hypothesis testing is a game of putting a yes-or-no question about how the world works in the form of a statement, making the question of 'and how much?' tertiary and theoretically unrelated. It might have extreme clinical/practical importance, of course, but I'm not at all convinced that our cultural shift of expecting and requiring CI95s and effect sizes to be reported is a good one.

Sorry for the delay. You know this topic is something I constantly think about. The research in this article was prototypical of psychology at this point: we have built ourselves into a corner where we need to worry about this. At the same time, there is a distinction between cognitive measures, which use a task administered repeatedly to determine a single subject's response, and a simple one-shot response. The distinction is the same as that between a single item asking how you feel and a full state measure with multiple items assessing one thing.

Given what I have just said, I do agree with you that effect size is unrelated to the importance of a finding, and that careful controls and experimental design can explain more than effect sizes can. With the papers I submit, the common critique is that there are no effect sizes. To which I reply that there is no well-established method of determining the effect size; however, the type of modeling used is generally viewed as stronger for this kind of analysis than the commonly, and wrongly, used alternatives.

What we really lack, which is a more recent focus of my thinking, is a way to translate the significance of cognitive measures into effect sizes. To the best of my knowledge, it just does not exist. Experimental design helps by limiting the possible explanations. Being able to better model effect sizes of cognitive measures — things that have been reduced to a single variable, or a handful of variables, from a lot of questions — is something that needs to be developed. As it stands now, effect sizes for cognitive measures are pretty useless.