Money Talks: The Power of Voice — Comment on William J. Mayew and Mohan Venkatachalam’s reply

Nemesysco’s claims are the real problem
Nemesysco’s LVA technology has no plausible scientific basis. This is not new, but it is the real problem, and it cannot be settled by arguing from correlations. It has to be addressed in principle, starting from the fundamental scientific basis of the technology. Regrettably, your work is being used in Nemesysco’s advertising, even though it is inconclusive and you did not mean to endorse Nemesysco’s software.

Clarifying an old issue
I have no intention of casting doubt on your integrity as academic researchers. Your effort in reviewing the literature supports the seriousness of your work, but I am convinced that you have spoiled the validity of your conclusions by using a technology that is simply irrelevant. Unfortunately, there is no way to remedy the data provided by Nemesysco’s technology. You must start from scratch using properly defined measures or persuade Nemesysco to demonstrate, once and for all, that their technology is valid (at least in principle). Until then, you and your research are just among the recent victims of Nemesysco’s propaganda to sell its pseudo-scientific technology, because you bypassed fundamental questions about working principles and focused instead on “results”, assuming that the company’s programs are legitimate “black-boxes”. Fortunetellers are also “successful” and produce plausible results, but there is no reason to believe that their crystal balls have predictive powers. In Nemesysco’s case, the LVA-technology performs as a “voice-controlled random generator” that is vaguely sensitive to “flat” portions of the waveform – which can occur for a number of reasons independently of the speaker’s emotional status – giving the naïve impression of responding to some subtle aspect of the speaker’s voice. A successful “sham” produces results that are not too “abnormal” to begin with. It exploits the customer’s beliefs and lets the victim do the actual interpretation job. Unfortunately, your open-minded attitude and lack of knowledge about the speech signal have dragged you into Nemesysco’s murky waters of confusing, half-true arguments, and both you and others are now posted on the company’s homepage as valuable bait that can potentially attract other victims.
That implicit endorsement is actually at the core of the problem because your academic authority will be enough to lend credibility even to the aspects that you do not have the basic information or the competence to judge.
These programs are not cheap (and maybe they would be less convincing if they cost less…), so this is big business and academics cannot stop the tide. However, since we are trained in scientific methodology, we have the responsibility both of helping the public demand scientifically grounded proof of sellers’ claims and of adopting a careful and skeptical attitude. Science can, of course, be wrong, but the burden of proof is on the sellers. Until Nemesysco produces scientifically solid arguments demonstrating the validity of their technology, it does not make sense to keep on discussing results obtained with their software, and even less so when those results come from poorly designed, inconclusive studies.
It is relatively easy to come up with programs that are supposed to detect emotions in the voice of a speaker, but showing that the results are valid is a totally different matter. It is necessary to explain the working principles and to document, on a proper empirical basis, the expected significance, power and effect size. This demands falsifiable models, not just correlational observations and after-the-fact interpretations in the fortuneteller’s style. None of these components is addressed by the documentation posted in Nemesysco’s “research”; the company’s arguments sound more like a series of evasive shifts between “scientific research” and “private knowledge”, depending on convenience. Let that be the company’s problem. Until positive proof of viability is produced, it does not make sense to continue discussing the outcomes of the technology. For all that is known thus far, this LVA-technology is based on absurd principles right from the beginning.
As investors and engineers know, it is costly to test silly solutions to a problem, and in fact it is not even practically possible to test all the possible silly alternatives. There is a necessary initial selection of plausible and interesting solutions that do not go against general physical principles, and it is not necessary to give all the absurd alternatives the benefit of the doubt. Although occasionally someone may come up with a novel solution that is not yet incorporated in the established body of scientific knowledge, most ad hoc solutions are just silly or obviously impossible on the basis of fundamental principles. In the case of a breakthrough, it is always the logic of the proponents’ arguments that wins in a public and open debate. Charlatans, on the other hand, have reasons to conceal their secrets. They know they have no grounds for their claims, and questioning their operating principles poses a potential threat to the business. This may or may not be the case with Nemesysco’s claims, but their published documentation strongly suggests that their solutions are likely not in the category of intelligent, emerging technologies for automatic detection of emotions in voice. There appears to be no independent motivation for their claims. Thus, so far those claims seem to belong to a category of “smart” commercial solutions that are scientifically implausible. Indeed, it is astonishing how an uncritical public prefers to believe in magic gadgets that are claimed to address extremely difficult problems rather than demand proper answers about how that can be achieved without involving well-established principles of basic phonetics and signal processing theory.
To be sure, Nemesysco is not the only company using unsupported claims to market its products. Perhaps it is true ignorance, bad research or just a lack of ethics that drives this sort of market, along with the innocent attitude of an uneducated public. Maybe both the public and the companies truly believe in what they are claiming, but academics have the responsibility of using their training in scientific methodology to stop such false beliefs before they get too far. In the case of Nemesysco, demanding solid arguments or, otherwise, exposing their irrelevant LVA-technology is an important task because the company aims at selling its products to institutions and authorities who will be using public funds to buy them. This is indeed a serious public problem, since taxpayers’ money can actually be diverted to endless testing and evaluation of devices that should have been dismissed from the very beginning. Unless there is a sound fundamental principle to motivate the testing, it is simply a waste of resources to keep on evaluating Nemesysco’s products and discussing meaningless and mediocre results from ill-designed tests.

Nothing new, except vague allusions to additional secret knowledge…
What I am saying is indeed well known. My comments are of a principled nature, and they might be addressed and refuted by other logical and signal-processing based arguments. But I am afraid that such arguments are more and more unlikely to exist. It appears that we are dealing with a bluff with a life of its own, and unfortunately there is not much that we can do about it. Meanwhile, just for the sake of clarification, let me address generically some of the comments in your reply to my previous blog post, also posted at http://faculty.fuqua.duke.edu/~vmohan/bio/files/Lacerdaresponse.pdf.

Yes, of course I am still referring to the LVA-patent from 2003. My arguments are essentially a repetition of what I wrote earlier because there is nothing new and I am just reacting to Nemesysco’s own statements. Nemesysco claims that “All Nemesysco’s products and services are based on Layered Voice Analysis (LVA), our proprietary and patent protected voice analysis technology” (http://www.nemesysco.com/index.html, as posted on April 5th, 2012). If we are to believe the company’s information, it is necessary to address the basis of the technology that is in fact documented in Nemesysco’s 2003 patent (Liberman, 2003).
Maybe it is true, as you wrote in your reply, that the LVA-patent only accounts for 5% of Nemesysco’s current technology, but it is nevertheless the core of it, as the company writes. Incidentally, the names and the content of the variables that you report in your Journal of Finance paper (Mayew & Venkatachalam, 2011) are fully compatible with what is listed in the 2003 patent. This does not prove my point, of course, but it gives plausibility to the assumption that the original LVA idea is still the basis of the technology. Thus, from what we know, this initial 5% of Nemesysco’s technology performs a fundamentally irrelevant analysis of the speech wave. The loss of information is so dramatic that it is theoretically impossible for the remaining 95% to recover from the initial nonsense, unless the relevant information were introduced in some way that actually does not have anything to do with the claimed LVA-technology. So, Nemesysco has a dilemma to address:
Either LVA is actually the 5% basis of Nemesysco’s technology – in which case it is not possible for the remaining 95% of the technology to achieve any meaningful output by further processing of that input – or LVA is just a fake declaration of a component that in fact is not used at all because the technology in reality is based on meaningful phonetic analyses that Nemesysco uses without acknowledging.
In my opinion, the LVA-based results that you and others have reported are compatible with the irrelevance of the LVA-style analysis, but for external observers it is impossible to determine whether the LVA technology is actually used or whether the company uses it as an empty cover just to make its product interesting. Obviously, there are far more ways of implementing silly processing algorithms than correct ones. Nevertheless, demonstrating that the technology works is Nemesysco’s problem, not ours, and so far we have no reason to believe that their LVA-technology produces any valid results. All we have is a series of empty claims and inconclusive data – like yours, unfortunately – in addition to clear demonstrations of the technology’s failure (Harnsberger, Hollien, Martin, & Hollien, 2009).
The only situation where the LVA-technology “works” is in simulated dialogs where a robot displays emotional expressions. Of course, this is because there is no way to check the validity of a robot’s emotions and because the human speaker does the job of interpreting the random expressions of the animated robot face. However, this has nothing to do with the LVA-technology. The same effect can be demonstrated using other very simple and cheap random models, but using LVA-technology in this type of application or entertainment does not pose a scientific problem because there is nothing to validate. In contrast, there is a serious and real problem when the company claims that their LVA-technology can be used for emotion detection and applied for security, criminal or medical purposes, because that can affect people and divert public funds, as I pointed out above. This is why I think that you and I, as researchers, have to be restrictive in implicitly supporting a technology based on principles that are scientifically irrelevant for the stated purposes and also masked by dubious claims of proprietary knowledge. Restating what I previously wrote in my critique of your work, academics have the responsibility of demanding convincing answers to fundamental scientific issues, and if a company claims to have a valid solution, it must produce evidence that it stands on scientific grounds. This is what happens in pharmacological research and in other fields with implications for individuals, and it would be inappropriate to relax those requirements in the field of voice analysis.

Perhaps a too generous benefit of the doubt
I understand and sympathize with your position of giving LVA-technology the benefit of the doubt and just testing it in the same explorative way that you would use in the social sciences. However, there is a fundamental difference between the unknown factors underlying social or psychological processes and those behind a technology like LVA. Whereas the complex interactions in social or psychological processes are essentially unknown, in the case of LVA you are, at best, trying to reverse engineer a man-made product that should, from the beginning, have been accounted for in principle. As long as there are no constraining principles, the company can come up with whatever algorithm, change it from time to time and implement whatever ad hoc fixes to obtain reasonable results (if they know how to do it) or just to confuse those who try to understand their principles. The secrets behind the technology are most likely a bluff, hidden behind the (bad) excuse of not revealing them to the public for security reasons, for instance. This secrecy is quite strange: either the technology is indeed as powerful as Nemesysco claims in using the voice signal to detect traces of involuntary brain activity – in which case it would be equally powerful even if the subject knew about it – or it is just a sham and the company has good reasons not to disclose the emperor’s new clothes. The secrecy does not even make sense from a market and patent protection perspective, because if they had a true technological principle, they would be interested in producing convincing arguments while at the same time understandably protecting their proprietary idea. As I wrote elsewhere, I see no reason to believe that Nemesysco has anything else to keep secret than the secret of having the public believe that there is a secret.
This is why it is so important for the company that you and others lend your names and academic authority in support of their “cause”. Nemesysco does not even have to argue. They just post your research on their web page and you are left with the embarrassment.

Explaining some of the “meaningful” results
To avoid wasting time on discussing the scientific plausibility of Nemesysco’s products (as far as I am concerned, there is none and it is not even meaningful to go on discussing it), let me try to convince you that the results you got can be explained by random processes that actually have nothing to do with the emotions that you are trying to capture using an LVA-based software.
I must clarify that by “voice-controlled random generator” I do not mean a process with two outcomes and an underlying 0.5 probability. As I wrote in my critique, the process is random in the sense that it is based on a waveform that is affected by all kinds of acoustic accidents, but the underlying probability distribution is not uniform. There are clear biases, as in the case of the “plateaus”. As I tried to explain, these “plateaus” are in some cases vaguely related to silences or pauses, which implies that they will tend to pick up hesitations or portions with lowered fundamental frequency, among all kinds of other acoustic garbage that they pick up due to their absurd processing principle. What you are observing in your results is probably just the spurious result of irrelevant information processing, within which a small component is biased by those phonetic aspects among all the nonsense it generates. Your regression analysis captures, at best, just that bias. That is fully compatible with the overall meaningless results generated by the LVA-technology. You are doing the interpretation by analyzing the biased random results a posteriori and imputing to them a meaning that they would hardly have if you were to use the LVA-data in an open, predictive way. I am afraid this does not prove anything and that you must carry out the analyses using proper speech analysis technology. I suspect that the quality of your speech materials is quite poor from the beginning, maybe involving a relatively narrow frequency band and probably a poor signal-to-noise ratio, as is often the case in phone calls. This is a problem for any scientific analysis of speech but not a problem for a program that does not even know what it is analyzing to begin with. Incidentally, this may be the reason why you found such a good agreement between the NAFF measure and PRAAT’s jitter measure (Mayew & Venkatachalam, 2011, p. 50, Table 7).
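To make the “biased random generator” argument concrete, here is a minimal Python sketch. It is purely illustrative and is not Nemesysco’s algorithm, which is not documented in this post: the plateau definition (a run of samples staying within a tolerance of the run’s first sample), the function names and the synthetic signal are all assumptions made for the example. It only shows that a naive flatness measure responds strongly to pauses in the signal, whatever their cause.

```python
import math
import random

def plateau_samples(samples, tolerance=1, min_len=5):
    """Total number of samples lying in 'plateaus' (hypothetical definition):
    runs of at least min_len samples within tolerance of the run's first sample."""
    total = 0
    run_start = 0
    for i in range(1, len(samples) + 1):
        # Close the current run at the end of the signal or when the
        # waveform moves away from the run's first sample.
        if i == len(samples) or abs(samples[i] - samples[run_start]) > tolerance:
            if i - run_start >= min_len:
                total += i - run_start
            run_start = i
    return total

def synth_speech(n_samples, pause_spans=(), seed=0):
    """Toy 'speech': a 120 Hz tone at 8 kHz plus noise, zeroed inside pause_spans."""
    rng = random.Random(seed)
    out = []
    for n in range(n_samples):
        noise = rng.gauss(0, 5)  # drawn every sample so both signals share the same noise
        if any(start <= n < end for start, end in pause_spans):
            out.append(0)  # digital silence: perfectly flat
        else:
            out.append(round(100 * math.sin(2 * math.pi * 120 * n / 8000) + noise))
    return out

# Same toy "voice", with and without three pauses (1200 silent samples in total).
fluent = synth_speech(8000)
hesitant = synth_speech(8000, pause_spans=[(1000, 1400), (3000, 3300), (6000, 6500)])

print("plateau samples, fluent:  ", plateau_samples(fluent))
print("plateau samples, hesitant:", plateau_samples(hesitant))
```

In this toy setup the plateau measure is dominated by the inserted silences, so any variable that happens to go along with pausing (hesitation, a bad phone line, dropped packets) would correlate with it without the measure saying anything about emotion.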
Jitter is not a very meaningful measure because it simply captures the instability of the signal, and under noisy conditions it is affected as much by the background noise as it is by the true instability in vocal fold vibration that it is supposed to capture. It is not surprising that this is about the only measure that seems to match the LVA-results, although, contrary to the LVA variable, jitter is in fact a well-defined measure.
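For readers unfamiliar with the measure, the following Python sketch computes local jitter in its usual form: the mean absolute difference between consecutive glottal periods, divided by the mean period. The period values and the size of the perturbation are made-up assumptions, and the random perturbation is only a stand-in for the errors that background noise introduces into period estimation.

```python
import random

def local_jitter(periods):
    """Mean absolute difference between consecutive periods, relative to the mean period."""
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    return (sum(diffs) / len(diffs)) / (sum(periods) / len(periods))

rng = random.Random(1)

# A perfectly stable voice at 125 Hz: every period is 8 ms (hypothetical values).
steady = [0.008] * 200

# The same voice, with each period estimate perturbed by measurement error,
# as an assumed stand-in for noisy telephone-quality recordings.
noisy_estimates = [p + rng.gauss(0, 0.0002) for p in steady]

print("jitter, clean estimates:", local_jitter(steady))
print("jitter, noisy estimates:", local_jitter(noisy_estimates))
```

The vocal fold vibration is identical in both cases; only the quality of the period estimates differs, yet the jitter value goes from zero to a clearly non-zero figure.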

Scientifically sound emotion analysis can be carried out, though not with LVA
I insist that you should correct your work by carrying out the analyses using scientific technology. Since, as you wrote, you are interested in assessing “The power of voice”, not “The power of LVA”, there is no other way of assessing the power of voice than starting from relevant scientific technology. I am sure phoneticians will be happy to assist you in the process and that you have everything to gain from discontinuing your use of Nemesysco’s LVA-technology. Meanwhile, you may wish to take a look at Mark Liberman’s Language Log, where the aspects that you are interested in studying are also discussed (Linguistic Deception Detection: Part 1, http://languagelog.ldc.upenn.edu/nll/?p=3608).

I apologize for my slow reaction to your reply. I look forward to learning about a scientifically correct study of the research question that you are dealing with.
Good luck with the task.

References

  • Harnsberger, J. D., Hollien, H., Martin, C. A., & Hollien, K. A. (2009). Stress and Deception in Speech: Evaluating Layered Voice Analysis. Journal of Forensic Sciences, 54(3), 642-650.
  • Liberman, A. (2003). US Patent No. 6,638,217 B1.
  • Mayew, W. J., & Venkatachalam, M. (2011). The Power of Voice: Managerial Affective States and Future Firm Performance. Journal of Finance, forthcoming.
  • Linguistic Deception Detection: Part 1, http://languagelog.ldc.upenn.edu/nll/?p=3608, in Language Log (Mark Liberman)

About Francisco Lacerda

Professor Member of The Royal Swedish Academy of Sciences