This is another post that’s probably going to upset a few people. It’s about the predictiveness of puppy temperament testing.
Temperament testing has been an important fixture in determining suitability of dogs for service work, therapy work, police and military service, and even for companion dogs.
I started temperament testing in the 1990s in the belief that it was the only responsible way to place a puppy. All of the other good breeders did it, my mentors did it (including those involved with major service dog organizations and the US Department of Defense puppy raising program), so I did it.
But like many things involved in breeding, it’s not necessarily a sound practice simply because everyone is doing it. We also need to look at the science.
Back then, we didn’t have much, if any, science. We just had the best efforts of the best in the business. Now we have some studies to look at, so let’s take a look at what they say.
Do puppy temperament tests actually predict adult behavior?
Service dog organizations have a great deal invested in the performance of adult dogs, have regimented puppy rearing programs, and are able to collect data and do follow-up, so many of the results and studies we have are from service dog organizations.
There are more than a few studies available. Here are some highlights, and you can pore through the references if you really want to get lost in the weeds.
In 1997, a study was performed on 630 eight-week-old German Shepherd puppies born into a service dog program with a follow-up evaluation at 14-19 months. The ability of the testers to predict adult behavior from puppy temperament tests was “negligible and the puppy test was therefore not found useful in predicting adult suitability for service dog work.” In fact, the correlation of behavior from puppyhood to adulthood was “exactly what would be expected by pure chance.” The authors conclude “… adult behaviour cannot be predicted as early as at eight weeks of age. Breeding programs aimed to improve behaviour in dogs may not be based on information collected on tests performed as early as at eight weeks of age.” This study also found that maternal effects are present in puppies, but that effect wanes once the puppies reach full adulthood.
In 2013, a study of 465 puppies in a guide dog program found low predictability between puppy temperament and certification as guide dogs as adults. What the test predicted best was not success, but failure.
A study in 2014 evaluated 134 Border Collie puppies at days 2-10, days 40-50, and then again at 1.5-2 years. There was little correlation between puppy evaluation results and behavior at 1.5-2 years. Only exploratory behavior was found to be correlated into adulthood. The study concluded “the predictive validity of early tests for predicting specific behavioural traits in adult pet dogs is limited.” The really interesting thing about this study is that fear in puppies was NOT correlated with fear in adulthood. In fact, the inverse was shown and some of the most fearful puppies ended up being the most friendly adults.
In another study, fearfulness was somewhat predictive at 3 months of age, and prediction accuracy improved with age. The same researchers conducted another study two years later and concluded that none of the tests they performed were predictive of ability to learn specific tasks.
Another guide dog program study concluded that “when applied at 7 weeks of age without an additional criterion, the test has no predictive value regarding future social tendencies.”
In a study of specific AKC breeds, tests were, interestingly, predictive of breed. They were not, however, predictive of adult temperament: “the puppy temperament scores were unreliable in predicting adult temperament.”
A few characteristics, such as playfulness, have some correlation.
A couple of studies had results that conflicted with those I list above.
This is normal in science, and the responsible way to handle these conflicts is to look at the studies individually for quality and also to look at the evidence as a whole.
In other words, you need to ask yourself if there is more evidence supporting a particular conclusion. The truth is found in a preponderance of information, not in a specific single study.
Something else to consider is that a closer look at these few outlying studies shows they tend to have smaller sample sizes than the other studies, and studies with larger sample sizes are more reliable as a whole.
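To put a rough number on why sample size matters, here is a minimal Python sketch. It assumes a study reports a correlation and uses the standard Fisher z approximation, where the standard error is roughly 1 / sqrt(n - 3). The sample sizes are the ones quoted in this post. The function name and study labels are mine, and not every study reported its results as a correlation, so treat this as an illustration only.

```python
import math

def fisher_z_standard_error(n):
    """Approximate standard error of a Fisher z-transformed correlation for n dogs."""
    return 1.0 / math.sqrt(n - 3)

# Sample sizes quoted in this post (labels are informal descriptions)
sample_sizes = {
    "German Shepherd service dog study": 630,
    "Guide dog study": 465,
    "Police dog German Shepherd study": 206,
    "Border Collie study": 134,
}

for name, n in sample_sizes.items():
    se = fisher_z_standard_error(n)
    print(f"{name}: n = {n}, standard error ~ {se:.3f}")
```

The smaller the standard error, the less likely a result is a fluke of which dogs happened to be in the study. By this yardstick, the 630-dog study carries less than half the uncertainty of the 134-dog study.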
A 2013 meta-analysis of personality consistency in dogs found some correlation between puppy testing and later behavior for aggression and submissiveness, but lower correlation for responsiveness to training, fearfulness, and sociability. “Overall, we found evidence to suggest substantial consistency (r = 0.43). Furthermore, personality consistency was higher in older dogs, when behavioral assessment intervals were shorter, and when the measurement tool was exactly the same in both assessments. In puppies, aggression and submissiveness were the most consistent dimensions, while responsiveness to training, fearfulness, and sociability were the least consistent dimensions.”
In a study from a South African police dog program, retrieval was highly correlated from puppyhood to adulthood. Other traits did not correlate from puppyhood to adulthood, but did correlate from the juvenile stage to adulthood.
A study of 206 German Shepherd dogs in a police dog program showed correlation between certain behaviors at 7 weeks (catch, chase, fetch, and follow a dragged rag) and certification as adults. These criteria, however, are more indicative of specific drives those dogs possess and not necessarily of personality or temperament traits.
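For a sense of scale on the r = 0.43 quoted above: a common rule of thumb is that squaring a correlation gives the share of variation in one measurement that is accounted for by the other. A quick Python sketch (the variable names are mine; only the 0.43 comes from the quoted study):

```python
r = 0.43  # overall consistency estimate quoted above

shared = r ** 2            # share of variation accounted for
unexplained = 1 - shared   # share left unaccounted for

print(f"Accounted for: {shared:.0%}")
print(f"Not accounted for: {unexplained:.0%}")
```

So even "substantial" consistency leaves roughly four-fifths of the variation in later behavior unaccounted for by the earlier assessment.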
If testing of temperament in puppyhood is not predictive, when is temperament evaluation reliable?
A 2012 study of a whopping 8,000 guide dog candidates using C-BARQ (a standardized behavioral assessment) determined that while the test was not predictive of success, it did allow them to rule out dogs likely to fail when those dogs were tested at 6 and 12 months.
So while evaluations at those ages were still not predictive of success, they were predictive of failure. This is consistent with what we have seen in other studies discussed above.
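How can a test predict failure without predicting success? Here is a small Python sketch with invented numbers (they are NOT from any study cited here). It shows what happens when a test flags a small group of dogs that mostly wash out, while the unflagged dogs succeed at close to the overall rate.

```python
# Hypothetical counts for illustration only; not data from any cited study.
flagged_failed, flagged_passed = 18, 2        # dogs the test flagged as risky
unflagged_failed, unflagged_passed = 90, 90   # dogs the test did not flag

total = flagged_failed + flagged_passed + unflagged_failed + unflagged_passed
overall_pass_rate = (flagged_passed + unflagged_passed) / total

# Of the flagged dogs, how many really failed?
failure_prediction_accuracy = flagged_failed / (flagged_failed + flagged_passed)

# Of the unflagged dogs, how many really passed?
success_prediction_accuracy = unflagged_passed / (unflagged_failed + unflagged_passed)

print(f"Overall pass rate: {overall_pass_rate:.0%}")
print(f"Flagged dogs that failed: {failure_prediction_accuracy:.0%}")
print(f"Unflagged dogs that passed: {success_prediction_accuracy:.0%}")
```

In this made-up example a flag is right 90% of the time, but passing the screen only nudges a dog's odds of success from 46% to 50%. That is the pattern of a test that is useful for ruling dogs out and not much use for picking winners.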
So if temperament testing isn’t predictive, what is?
A recent study identified things that we as breeders can do, and can educate and encourage our puppy homes to do, to stack the deck in favor of better outcomes for our puppies. (This study used C-BARQ evaluation.) The factors, with the behaviors they affected in parentheses, were:
Experience or ability of families raising the puppy (aggression toward humans and dogs, fear, and touch sensitivity)
Another dog in the household (lowered aggression toward household members)
Avoidance of traumatic events (fear and aggression)
So we need to work with our puppies and help our families better socialize, handle, and train when they get their puppies home.
My Personal Choice
I’ve been reading the research on this for a couple of years. I’ve talked about it a little in some online discussions, but I’ve been hesitant to make any boldly public statements. But here it is now.
Two years ago I took the plunge and followed the science.
I stopped temperament testing.
And the sky hasn’t fallen.
People are happy with their companion dogs and just as successful with their service dog candidates or therapy dog candidates.
Granted, this is anecdotal. My program is a small sample size, I don't have a robust (or even close) data collection method, and I make no claim of my experience being a scientific representation.
Now if you temperament test and are happy with it, that’s great. I know from experience that some customers like it because it makes them feel better.
But if we are honest with ourselves about what the science is saying, for the most part temperament testing doesn’t really mean a whole lot, if anything.
And I decided that instead of taking all of the time and energy to evaluate puppies, write it all up, and communicate all of that to families, I would rather take that time and energy and put it back into working with the puppies.
In the end, without a high degree of predictability, I determined temperament testing to be a waste of time. I would have the puppies temperament tested, shoot, edit, and upload videos, write up the evaluations, and then discuss them with families.
That's a good 2-3 days of work I could have used for working on things I knew the puppies needed because I was already spending all day with them.
I didn't need 3 days of work, plus paying an evaluator fee, to know if any of my puppies had noise sensitivity, liked to retrieve, were high energy, etc. I was already clear on what my puppies needed help with, and I found it more productive and better for the puppies to spend the time I used to put into temperament testing on actually helping them instead.
It’s paying off for me and I hope you find what works best for you.
If you find it helpful for your program, that's great. I just wanted to review what studies are showing so that you can decide for yourself with open eyes.
Relying on tools like temperament testing is important. However, we need to be realistic about what the tools can and can't do, and we want to make sure that what we are doing is effective.
Update 6 June 2020: Can Temperament Testing Actually Harm Puppies?
This update is inspired by an insightful comment from Susanne Shelton (BOM CPDT KA) in an online discussion of this post.
Susanne points out the potential problems of labeling puppies, as well as the risk of unskilled evaluators actually traumatizing puppies during the course of the temperament test. The labeling problem is known as the "expectancy effect." It is fairly well studied in a variety of species and is covered in this post. While labels can sometimes help, they have more of a tendency to lower expectations and inadvertently influence interactions with that individual. As for evaluators, with little exception anyone can be a temperament evaluator. There are no standards or oversight. She also points out that GOOD evaluation can be helpful for breeders who struggle with understanding puppy behavior, and can help identify behaviors that need work prior to placement.
References and Footnotes
 Wilsson E, PE Sundgren. “Behaviour test for eight-week old puppies—heritabilities of tested behaviour traits and its correspondence to later behavior.” Applied Animal Behaviour Science 58 1998 151–162 https://www.sciencedirect.com/science/article/abs/pii/S0168159197000932
 Asher L, Blythe S, Roberts R, Toothill L, Craigon PJ, et al. (2013) A standardized behavior test for potential guide dog puppies: Methods and association with subsequent success in guide dog training. J Vet Behav Clin Appl Res 8: 431–438. https://www.sciencedirect.com/science/article/pii/S1558787813001925
 Riemer S, Müller C, Virányi Z, Huber L, Range F. The predictive value of early behavioural assessments in pet dogs--a longitudinal study from neonates to adults. PLoS One. 2014;9(7):e101237. Published 2014 Jul 8. doi:10.1371/journal.pone.0101237 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4086890/
 Goddard ME, Beilharz RG (1984) A factor analysis of fearfulness in potential guide dogs. Appl Anim Behav Sci 12: 253–265. https://www.sciencedirect.com/science/article/abs/pii/0168159184901187
 Goddard ME, Beilharz RG (1986) Early prediction of adult behaviour in potential guide dogs. Appl Anim Behav Sci 15: 247–260. https://www.sciencedirect.com/science/article/abs/pii/016815918690095X
 Beaudet R, Chalifoux A, Dallaire A (1994) Predictive value of activity level and behavioral evaluation on future dominance in puppies. Appl Anim Behav Sci 40: 273–284. https://www.sciencedirect.com/science/article/abs/pii/016815919490068X
 Robinson, LM, RS Thompson, JC Ha. Puppy Temperament Assessments Predict Breed and American Kennel Club Group but Not Adult Temperament. Journal of Applied Animal Welfare Science. 19:2, 2016.
 Scott JP, Bielfelt SW (1976) Analysis of the puppy testing program. In: Pfaffenberger CJ, Scott JP, Fuller JL, Ginsburg BE, Bielfelt SW, editors. Guide Dogs for the Blind: Their Selection, Development and Training. pp. 39–75
 Fratkin JL, Sinn DL, Patall EA, Gosling SD. Personality consistency in dogs: a meta-analysis. PLoS One. 2013; 8(1):e54907. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0054907
 Slabbert JM, Odendaal JSJ (1999) Early prediction of adult police dog efficiency - a longitudinal study. Appl Anim Behav Sci 64: 269–288. https://www.sciencedirect.com/science/article/pii/S0168159199000386
 Svobodova I, Vapenik P, Pinc L, Bartos L (2008) Testing German shepherd puppies to assess their chances of certification. Appl Anim Behav Sci 113: 139–149. https://www.sciencedirect.com/science/article/abs/pii/S0168159107003000
 Duffy DL & JA Serpell 2012 Predictive validity of a method for evaluating temperament in young guide and service dogs. App. Anim. Behav. Sci. 138: 99-109. https://linkinghub.elsevier.com/retrieve/pii/S0168159112000433
 Serpell JA, Duffy DL. Aspects of Juvenile and Adolescent Environment Predict Aggression and Fear in 12-Month-Old Guide Dogs. Front Vet Sci. 2016;3:49. Published 2016 Jun 22. doi:10.3389/fvets.2016.00049 https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4916180/
 In 2003, the University of Pennsylvania developed a behavioral test called C-BARQ, which measures aggression, fearfulness, and a few other behavioral problems in dogs. C-BARQ has become a standard for certain behavioral studies, and U Penn has a database with over 50,000 test results. The use of C-BARQ makes it easier to compare results among studies that use it. However, it is limited in its scope and doesn’t cover a number of qualities a breeder may want to be able to evaluate in puppies or adults. http://vetapps.vet.upenn.edu/cbarq/