Open lecture at The Royal Swedish Academy of Sciences.
For further information check the Academy’s event calendar at:
…can be followed on the Economics Job Market Rumors thread posted at:
The following blog (posted by Ministry of Truth) presents an excellent discussion of why researchers from the world of finance may have been trapped by LVA’s pseudoscientific technology:
Nemesysco’s claims are the real problem
Nemesysco’s LVA technology has no plausible scientific basis. This is not new, but it is the real problem, and it cannot be settled by arguing from correlations. It has to be addressed in principle, starting from the fundamental scientific basis of the technology. Regrettably, your work is being used in Nemesysco’s advertising, even though it is inconclusive and you did not mean to endorse Nemesysco’s software.
Clarifying an old issue
I have no intention of casting doubt on your integrity as academic researchers. Your effort in reviewing the literature attests to the seriousness of your work, but I am convinced that you have undermined the validity of your conclusions by using a technology that is simply irrelevant. Unfortunately, there is no way to remedy the data provided by Nemesysco’s technology. You must start from scratch using properly defined measures or persuade Nemesysco to demonstrate, once and for all, that their technology is valid (at least in principle). Until then, you and your research are just among the most recent victims of Nemesysco’s propaganda to sell its pseudo-scientific technology, because you bypassed fundamental questions about working principles and focused instead on “results”, assuming that the company’s programs are legitimate “black boxes”. Fortunetellers are also “successful” and produce plausible results, but there is no reason to believe that their crystal balls have predictive powers. In Nemesysco’s case, the LVA-technology performs as a “voice-controlled random generator” that is vaguely sensitive to “flat” portions of the waveform – which can occur for a number of reasons independently of the speaker’s emotional status – giving the naïve impression of responding to some subtle aspect of the speaker’s voice. A successful “sham” does not produce overly “abnormal” results to begin with. It exploits the customer’s beliefs and lets the victim do the actual interpretation job. Unfortunately, your open-minded attitude and lack of knowledge about the speech signal have dragged you into Nemesysco’s murky waters of confusing, half-true arguments, and both you and others are now posted on the company’s homepage as valuable bait that can potentially attract other victims.
That implicit endorsement is actually at the core of the problem because your academic authority will be enough to lend credibility even to the aspects that you do not have the basic information or the competence to judge.
These programs are not cheap (and maybe they would be less convincing if they cost less…), so this is big business and academics cannot stop the tide. However, since we are trained in scientific methodology, we have the responsibility both of helping the public demand scientifically grounded proof of sellers’ claims and of adopting a careful and skeptical attitude ourselves. Science can, of course, be wrong, but the burden of proof is on the sellers. Until Nemesysco produces scientifically solid arguments demonstrating the validity of their technology, it does not make sense to keep discussing results obtained with their software, and even less so when those results come from poorly designed, inconclusive studies.
It is relatively easy to come up with programs that are supposed to detect emotions in the voice of a speaker, but showing that the results are valid is a totally different matter. It is necessary to explain the working principles and to document, on a proper empirical basis, the expected significance, power and effect size. This demands falsifiable models, not just correlational observations and after-the-fact interpretations in the fortuneteller’s style. None of these components is addressed by the documentation posted in Nemesysco’s “research”; the company’s arguments sound more like a series of evasive shifts between “scientific research” and “private knowledge”, depending on convenience. Let that be the company’s problem. Until positive proofs of viability are produced, it does not make sense to continue discussing the outcomes of a technology. For all that is known thus far, this LVA-technology is based on absurd principles right from the beginning.
As for investors or engineers, it is costly to test silly solutions to a problem, and in fact it is not even practically possible to test all the possible silly alternatives. There is a necessary initial selection of plausible and interesting solutions that do not violate general physical principles, and it is not necessary to give all the absurd alternatives the benefit of the doubt. Although occasionally someone may come up with a novel solution that is not yet incorporated in the established body of scientific knowledge, most ad hoc solutions are just silly or obviously impossible on the basis of fundamental principles. In the case of a breakthrough, it is always the logic of the proponents’ arguments that wins in a public and open debate. Charlatans, on the other hand, have reasons to conceal their secrets. They know they have no grounds for their claims, and questioning their operating principles poses a potential threat to the business. This may or may not be the case with Nemesysco’s claims, but their published documentation strongly suggests that their solutions are unlikely to belong in the category of intelligent, emerging technologies for automatic detection of emotions in voice. There appears to be no independent motivation for their claims. Thus, so far those claims seem to belong to a category of “smart” commercial solutions that are scientifically implausible. Indeed, it is astonishing how an uncritical public prefers to believe in magic gadgets that are claimed to solve extremely difficult problems, rather than demanding proper answers about how that can be achieved without relying on well-established principles of basic phonetics and signal processing theory.
To be sure, Nemesysco is not the only company using unsupported claims to market its products. Perhaps it is genuine ignorance, bad research or simply a lack of ethics that drives this sort of market, along with the innocent attitude of an uneducated public. Maybe both the public and the companies truly believe in what they are claiming, but academics have the responsibility of using their training in scientific methodology to stop such false beliefs before they go too far. In the case of Nemesysco, demanding solid arguments or, otherwise, exposing their irrelevant LVA-technology is an important task, because the company aims at selling its products to institutions and authorities who will be using public funds to buy them. This is indeed a serious public problem, since taxpayers’ money can be diverted to endless testing and evaluation of devices that should have been dismissed from the very beginning. Unless there is a sound fundamental principle to motivate the testing, it is simply a waste of resources to keep evaluating Nemesysco’s products and discussing meaningless and mediocre results from ill-designed tests.
Nothing new, except vague allusions to additional secret knowledge…
What I am saying is indeed well known. My comments are of a principled nature, and they might be addressed and refuted by other logical and signal-processing-based arguments. But I am afraid that such arguments are increasingly unlikely to exist. It appears that we are dealing with a bluff with a life of its own, and unfortunately there is not much that we can do about it. Meanwhile, just for the sake of clarification, let me address in general terms some of your comments in the reply to my previous blog post, also posted at http://faculty.fuqua.duke.edu/~vmohan/bio/files/Lacerdaresponse.pdf.
Yes, of course I am still referring to the LVA-patent from 2003. My arguments are essentially a repetition of what I wrote earlier, because there is nothing new and I am just reacting to Nemesysco’s own statements. Nemesysco claims that “All Nemesysco’s products and services are based on Layered Voice Analysis (LVA), our proprietary and patent protected voice analysis technology” (http://www.nemesysco.com/index.html, as posted on April 5th, 2012). If we are to believe the company’s information, it is necessary to address the basis of the technology, which is in fact documented in Nemesysco’s 2003 patent (Liberman, 2003).
Maybe it is true, as you wrote in your reply, that the LVA-patent only accounts for 5% of Nemesysco’s current technology, but it is nevertheless its core, as the company writes. Incidentally, the names and the content of the variables that you report in your Journal of Finance paper (Mayew & Venkatachalam, 2011) are fully compatible with what is listed in the 2003 patent. This does not prove my point, of course, but it lends plausibility to the assumption that the original LVA idea is still the basis of the technology. Thus, from what we know, these initial 5% of Nemesysco’s technology perform a fundamentally irrelevant analysis of the speech wave. The loss of information is so dramatic that it is theoretically impossible for the remaining 95% to recover from the initial nonsense, unless the relevant information were introduced in some way that has nothing to do with the claimed LVA-technology. So, Nemesysco has a dilemma to address:
Either LVA is actually the 5% basis of Nemesysco’s technology – in which case it is not possible for the remaining 95% of the technology to achieve any meaningful output by further processing of that input – or LVA is just a fake declaration of a component that in fact is not used at all because the technology in reality is based on meaningful phonetic analyses that Nemesysco uses without acknowledging.
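The information-loss argument can be made concrete with a back-of-the-envelope count. The sketch below is an illustration only, not Nemesysco’s algorithm: it assumes that samples are quantized to 85 levels (the figure given for the LVA patent elsewhere in this blog) and uses a simplified, hypothetical definition of a “thorn” (a middle sample strictly above or strictly below both neighbours). It enumerates every possible three-sample window and counts how many fall into each thorn category.

```python
import math
from itertools import product

LEVELS = 85  # assumed number of amplitude levels after LVA-style re-quantization


def thorn_class(a: int, b: int, c: int) -> str:
    """Classify a 3-sample window as a local maximum, local minimum, or neither.
    Simplified, hypothetical definition used only for illustration."""
    if b > a and b > c:
        return "max"
    if b < a and b < c:
        return "min"
    return "none"


# Enumerate every possible window of three quantized samples.
counts = {"max": 0, "min": 0, "none": 0}
for a, b, c in product(range(LEVELS), repeat=3):
    counts[thorn_class(a, b, c)] += 1

total = LEVELS ** 3
window_bits = 3 * math.log2(LEVELS)  # bits needed to specify one window
label_bits = math.log2(len(counts))  # upper bound on what a 3-way label can carry

print(total, counts)
print(f"{window_bits:.1f} bits per window vs at most {label_bits:.2f} bits per label")
```

Under these assumptions, 614,125 distinct windows collapse onto three labels, so at most about 1.6 of the roughly 19 bits needed to describe a window survive this first step; no later processing stage can restore what the label never contained.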
In my opinion, the LVA-based results that you, and others, have reported are compatible with the irrelevance of the LVA-style analysis, but for external observers it is impossible to determine whether the LVA technology is actually used or whether the company uses it as an empty cover just to make their product interesting. Obviously, there are far more ways of implementing silly processing algorithms than correct ones. Nevertheless, demonstrating that the technology works is Nemesysco’s problem, not ours, and so far we have no reason to believe that their LVA-technology produces any valid results. All we have is a series of empty claims and inconclusive data – like yours, unfortunately – in addition to clear demonstrations of the technology’s failure (Harnsberger, Hollien, Martin, & Hollien, 2009).
The only situation where the LVA-technology “works” is in the simulation of robot emotional expressions in dialog simulations. Of course, this is because there is no way to control the validity of a robot’s emotions and because the human speaker does the job of interpreting the random expressions of the animated robot face. However, this has nothing to do with the LVA-technology itself. The same effect can be obtained with other very simple and cheap random models, but using LVA-technology in this type of application or entertainment does not pose a scientific problem, because there is nothing to validate. In contrast, there is a serious and real problem when the company claims that its LVA-technology can be used for emotion detection and applied for security, criminal or medical purposes, because that can affect people and divert public funds, as I pointed out above. This is why I think that you and I, as researchers, have to be careful not to implicitly support a technology based on principles that are scientifically irrelevant for the stated purposes and also masked by dubious claims of proprietary knowledge. Restating what I previously wrote in my critique of your work, academics have the responsibility of demanding convincing answers to fundamental scientific issues, and if a company claims to have a valid solution, it must produce evidence that it stands on scientific grounds. This is what happens in pharmacological research and in other fields with implications for individuals, and it would be inappropriate to relax those requirements in the field of voice analysis.
Perhaps a too generous benefit of the doubt
I understand and sympathize with your position of giving LVA-technology the benefit of the doubt and just testing it in the same exploratory way that you would use in the social sciences. However, there is a fundamental difference between the unknown factors underlying social or psychological processes and the unknowns behind LVA. Whereas the complex interactions in social or psychological processes are essentially unknown, in the case of LVA you are, at best, trying to reverse engineer a man-made product that should, from the beginning, have been accounted for in principle. As long as there are no constraining principles, the company can come up with whatever algorithm, change it from time to time and implement whatever ad hoc fixes to obtain reasonable results (if they know how to do it) or just to confuse those who try to understand their principles. The secrets behind the technology are most likely a bluff, hidden behind the (bad) excuse of not revealing them to the public, for instance for security reasons. This secrecy is in fact quite strange, because either the technology is indeed as powerful in using the voice signal to detect the traces of involuntary brain activity as Nemesysco claims – in which case it would be equally powerful even if the subject knew about it – or it is just a sham, and the company has good reasons not to disclose the emperor’s new clothes. Their secrecy does not even make sense from a market and patent-protection perspective, because if they had a true technological principle, they would be interested in producing convincing arguments while at the same time understandably protecting their proprietary idea. As I wrote elsewhere, I see no reason to believe that Nemesysco has anything to keep secret other than the secret of having the public believe that there is a secret.
This is why it is so important for the company that you and others lend your names and academic authority in support of their “cause”. Nemesysco does not even have to argue. They just post your research at their web page and you are left with the embarrassment.
Explaining some of the “meaningful” results
To avoid wasting time discussing the scientific plausibility of Nemesysco’s products (as far as I am concerned, there is none, and it is not even meaningful to go on discussing it), let me try to convince you that the results you got can be explained by random processes that actually have nothing to do with the emotions that you are trying to capture using LVA-based software.
I must clarify that by “voice-controlled random generator” I do not mean a process with two outcomes and an underlying 0.5 probability. As I wrote in my critique, the process is random in the sense that it is based on a waveform that is affected by all kinds of acoustic accidents, but the underlying probability distribution is not uniform. There are clear biases, as in the case of the “plateaus”. As I tried to explain, these “plateaus” are in some cases vaguely related to silences or pauses, which implies that they will tend to pick up hesitations or portions with lowered fundamental frequency, among all kinds of other acoustic garbage that they pick up due to their absurd processing principle. What you are observing in your results is probably just the spurious outcome of irrelevant information processing, within which a small component is biased by those phonetic aspects among all the nonsense it generates. Your regression analysis captures, at best, just that bias. That is fully compatible with the overall meaningless results generated by the LVA-technology. You are doing the interpretation by analyzing the biased random results a posteriori and imputing to them a meaning that they would hardly have if you were to use the LVA-data in an open, predictive way. I am afraid this does not prove anything, and that you will have to carry out the analyses using proper speech analysis technology. I suspect that the quality of your speech materials is quite poor from the beginning, maybe involving a relatively narrow frequency band and probably a poor signal-to-noise ratio, as is often the case in phone calls. This is a problem for any scientific analysis of speech, but not a problem for a program that does not even know what it is analyzing to begin with. Incidentally, this may be the reason why you found such a good agreement between the NAFF measure and PRAAT’s jitter measure (Mayew & Venkatachalam, 2011, p. 50, Table 7).
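The plateau bias described above can be illustrated with a toy sketch. Here a “plateau” is taken to be a run of at least three consecutive samples whose successive differences are at most one quantization step; this definition and its thresholds are hypothetical and are not the patent’s exact parameters. The two signals are synthetic stand-ins on an assumed 0 to 84 amplitude scale: a near-silent pause and a 200 Hz tone standing in for a voiced stretch.

```python
import math
import random


def plateau_fraction(x, thr=1, min_len=3):
    """Fraction of samples lying in a 'plateau': a run of at least min_len
    consecutive samples whose successive differences are all <= thr.
    thr and min_len are hypothetical illustration values."""
    in_plateau = 0
    run = 1  # length of the current run of near-equal samples
    for i in range(1, len(x)):
        if abs(x[i] - x[i - 1]) <= thr:
            run += 1
        else:
            if run >= min_len:
                in_plateau += run
            run = 1
    if run >= min_len:
        in_plateau += run
    return in_plateau / len(x)


FS = 11025     # sampling rate quoted for the LVA patent
N = FS // 10   # 100 ms of signal
MID = 42       # mid-scale on the assumed 85-level amplitude grid

rng = random.Random(0)
# "Pause": low-level noise of one quantization step around mid-scale.
pause = [MID + rng.choice((0, 1)) for _ in range(N)]
# "Voiced": a 200 Hz tone of moderate amplitude, rounded to integer levels.
voiced = [round(MID + 40 * math.sin(2 * math.pi * 200 * n / FS)) for n in range(N)]

print(f"pause:  {plateau_fraction(pause):.2f}")
print(f"voiced: {plateau_fraction(voiced):.2f}")
```

On this toy input the pause is entirely “plateau”, while the voiced stretch is only partly so, mainly near the peaks and troughs. A statistic built on plateaus therefore partly tracks pauses and low-level segments, whatever their cause.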
Jitter is not a very meaningful measure because it simply captures the instability of the signal, and under noisy conditions it is affected as much by the background noise as by the true instability in vocal fold vibration that it is supposed to capture. It is not surprising that this is about the only measure that seems to match the LVA-results, although, contrary to the LVA variable, jitter is in fact a well-defined measure.
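To show why noisy recordings inflate jitter, the sketch below computes local jitter as the mean absolute difference between consecutive periods divided by the mean period, which is how Praat defines “jitter (local)” (Praat additionally applies period-range constraints that are omitted here). The period values and the 0.3 ms perturbation are hypothetical, chosen only to mimic period estimates disturbed by background noise.

```python
def local_jitter(periods):
    """Mean absolute difference between consecutive periods divided by the mean period."""
    if len(periods) < 2:
        raise ValueError("need at least two periods")
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods) / len(periods)
    return mean_diff / mean_period


# Hypothetical period estimates in milliseconds (a voice at roughly 125 Hz).
clean = [8.0, 8.1, 8.0, 8.1, 8.0, 8.1]
# Same voice, but each estimate shifted by +/-0.3 ms, standing in for
# cycle boundaries displaced by background noise (an assumption).
noisy = [p + e for p, e in zip(clean, [0.3, -0.3, 0.3, -0.3, 0.3, -0.3])]

print(f"clean: {100 * local_jitter(clean):.2f} %")
print(f"noisy: {100 * local_jitter(noisy):.2f} %")
```

With the vocal folds assumed unchanged, the perturbed estimates raise local jitter from about 1.2% to about 6.2%, purely as a measurement effect.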
Scientifically sound emotion analysis can be carried out, though not with LVA
I insist that you should correct your work by carrying out the analyses using scientific technology. Since, as you wrote, you are interested in assessing “The power of voice”, not “The power of LVA”, there is no other way of assessing the power of voice than starting with relevant scientific technology. I am sure phoneticians will be happy to assist you in the process and that you have everything to gain from discontinuing your use of Nemesysco’s LVA-technology. Meanwhile, you may wish to take a look at Mark Liberman’s Language Log, where the aspects that you are interested in studying are also discussed (Linguistic Deception Detection: Part 1, http://languagelog.ldc.upenn.edu/nll/?p=3608).
I apologize for my slow reaction to your reply. I look forward to learning about a scientifically correct study of the research question that you are dealing with.
Good luck with the task.
Hobson, Mayew and Venkatachalam’s paper, Analyzing Speech to Detect Financial Misreporting, is just the latest in a series of flawed studies by Mayew and Venkatachalam. The authors insist on using Nemesysco’s LVA technology and give the impression that this is an emerging speech analysis technology based on Nemesysco’s proprietary secrets. In reality, the LVA principles were described in Liberman, A. (2003), US Patent No. 6,638,217 B1. Although the ad hoc thresholds used in the program may have been changed since then, the patent unequivocally demonstrates that LVA principles are absurd and incapable of extracting any useful information from the waveforms they are supposed to analyze (see Lacerda (2012), “Money Talks: The Power of Voice – A critical review of Mayew and Ventachalam’s The Power of Voice: Managerial Affective States and Future Firm Performance”, for an explanation of why the LVA technology cannot work). Given the irrelevance of the LVA basic variables, further processing of the measures cannot produce any sensible results (unless information were added during the processing, which is not the case). The so-called “proprietary secrets” are most likely no more than Nemesysco’s pseudo-scientific wording, circular arguments and incompetent descriptions of the speech production process that are supposed to lead naïve customers into the belief that there are indeed “proprietary secrets”. Mayew and Venkatachalam have failed to see through the ungrounded claims of the LVA vendors. As a consequence, Mayew and Venkatachalam have so far published a series of inconclusive papers, because their results are critically dependent on the validity of LVA technology.
As academic researchers, Mayew and Venkatachalam’s uncritical acceptance of a commercial “black-box” is per se the kind of mistake that researchers should be trained to avoid. Unfortunately, Mayew and Venkatachalam’s assumption of LVA’s validity has surprisingly passed the review hurdles of several finance-oriented publications, conveying the bizarre impression that repeating series of void results may eventually turn them into valid results. However, whereas accommodation to repetition pressure may be an interesting psychological phenomenon and a driving force in the world of finance, where prices and product acceptance are affected by the number of interested customers, scientific explanations must be based on logical and grounded arguments that stand by themselves, independently of superficial customer opinions or market expectations. In summary, circular references do not increase the explanatory power or validity of a claim and any proper speech analysis must be grounded on scientific signal processing principles that stand public scrutiny, rather than on obscure “knowledge” of self-promoted “researchers” or sellers.
Obviously, Hobson, Mayew and Venkatachalam lack competence in the areas of speech analysis and signal processing, but they should have been skeptical and demanded convincing answers to pertinent questions about the working principles of the LVA technique. They should also be aware that although correlations may be a useful exploratory tool in epidemiological studies, correlations, even if significant, do not by themselves prove or demonstrate anything unless supported by plausible and adequate models. In the case of an ad hoc constructed algorithm, like LVA, it does not even make sense to explore the algorithm’s validity using correlations. Validity must be demonstrated by the manufacturers’ principled arguments, not by the sort of inconclusive statistical fishing expedition that Mayew and Venkatachalam appear to have engaged in. However, because the LVA technology is not supported by any reasonable or even plausible principle, the series of self-references in methodologically flawed papers and the excuse of “proprietary secrets” only fuel the notion that LVA technology is no more than a sham. Thus, unless Mayew and Venkatachalam (or anyone else selling or using LVA-based technology) can independently demonstrate that the processing carried out by LVA provides meaningful emotional information, the technology must be seen as an irrelevant “digital dowsing rod”, and it makes no sense to discuss its results or their implications at this point.
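The risk of a statistical fishing expedition can be quantified with the standard familywise error calculation: if k independent tests are each run at significance level alpha and all null hypotheses are true, the probability of at least one spurious “significant” result is 1 - (1 - alpha)**k. The sketch assumes independent tests and alpha = 0.05; the values of k are illustrative and are not taken from the papers discussed here.

```python
def familywise_error(alpha: float, k: int) -> float:
    """Probability of at least one 'significant' result among k independent
    tests at level alpha when every null hypothesis is true."""
    return 1 - (1 - alpha) ** k


# Illustrative numbers of candidate variables (hypothetical).
for k in (1, 5, 20, 50):
    print(f"{k:>2} tests: {familywise_error(0.05, k):.3f}")
```

With 20 independent variables, the chance of at least one nominally significant result is already about 64%, even when none of them carries any real information.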
Of course, Nemesysco is not the only company trying to sell “advanced” technology based on pseudo-scientific analyses of speech. There seems to be an expanding market for bogus analyses backed up by aggressive propaganda exploiting fiction and myths that may appeal to a public lacking basic competence in speech processing and phonetics. This can only be stopped by an educated, cautious and skeptical public that demands principled evidence for the sellers’ claims. Scientists cannot expose all the pseudo-scientific nonsense in society, but they have the responsibility of at least not endorsing it, as Mayew and Venkatachalam did by adopting a “black-box” that there is virtually no independent reason to believe in. They should be encouraged to perform a scientific reanalysis of all their speech materials. A proper analysis of the speech materials may or may not corroborate the authors’ hypothesis, but it will, above all, produce results that are worth considering. By insisting on using a technology that they do not grasp and for which there are no scientific grounds, Mayew and Venkatachalam are contributing to a mockery of financial research and, by extension, undermining serious science.
Lacerda (2012), “Money Talks: The Power of Voice – A critical review of Mayew and Ventachalam’s The Power of Voice: Managerial Affective States and Future Firm Performance”, http://urn.kb.se/resolve?urn=urn:nbn:se:su:diva-74478
Liberman, A. (2003). US Patent No. 6,638,217 B1. http://www.freepatentsonline.com/6638217.pdf
Department of Linguistics, Stockholm University http://www.ling.su.se
Comment by Francisco Lacerda — March 13, 2012 @ 1:58 pm
(This comment was initially posted at http://blogs.law.harvard.edu/corpgov/2012/01/13/analyzing-speech-to-detect-financial-misreporting/, as shown above, but deleted by the moderator about 1 hour later)
Last week the Department for Work and Pensions (DWP) dropped its interest in Voice Risk Analysis (VRA), which is based on Nemesysco’s Layered Voice Analysis (LVA) technology. After extensive tests of Voice Risk Analysis, the DWP stated that “From our findings we cannot conclude that VRA works effectively and consistently in the benefits environment.” (http://www.dwp.gov.uk/local-authority-staff/housing-benefit/security/voice-risk-analysis-vra/).
Of course, the DWP’s conclusion that VRA does not work comes as no surprise and could have been predicted from the mere analysis of the working principles of the LVA-technology described in Nemesysco’s patent: Inferring a speaker’s emotional state from counts based on the number of “thorns” and “plateaus” observed in sample triplets of a sound wave digitized at 11025 samples/second (i.e. sample triplets covering about 272 µs) simply does not make any sense. It is as absurd as scanning a text by counting the number of times that a vowel occurred in between two consonants (“thorns”), counting the number and length of sequences that can be formed by triplets of consecutive characters not further away from each other than, for instance, five steps in the alphabetic sequence (“plateaus”) and using these counts to issue statements on the author’s emotional state. Thus, although the tests were unnecessary, the fact that DWP’s thorough analysis led to the inevitable conclusion that the technology did not work presents yet another strong empirical result in line with the notion that Nemesysco’s LVA-technology lacks plausibility and can only generate unstable and irrelevant outcomes.
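The text analogy above can be written down directly. The sketch below is one possible rendering of it; treating only a, e, i, o and u as vowels, the five-step alphabetic limit and the minimum run of three letters are assumptions made for illustration.

```python
VOWELS = set("aeiou")  # assumption: 'y' is treated as a consonant


def letters_of(text):
    """Lower-case letters only; spaces and punctuation are ignored."""
    return [c for c in text.lower() if c.isalpha()]


def text_thorns(text):
    """Count vowels with a consonant on each side (the analogy's 'thorns')."""
    s = letters_of(text)
    return sum(
        1
        for a, b, c in zip(s, s[1:], s[2:])
        if b in VOWELS and a not in VOWELS and c not in VOWELS
    )


def text_plateaus(text, max_step=5, min_len=3):
    """Lengths of runs of at least min_len letters whose neighbours are at most
    max_step positions apart in the alphabet (the analogy's 'plateaus')."""
    s = letters_of(text)
    runs, run = [], 1
    for a, b in zip(s, s[1:]):
        if abs(ord(a) - ord(b)) <= max_step:
            run += 1
        else:
            if run >= min_len:
                runs.append(run)
            run = 1
    if run >= min_len:
        runs.append(run)
    return runs


sentence = "I love you"
for text in (sentence, sentence[::-1]):
    print(f"{text!r}: thorns={text_thorns(text)}, plateaus={text_plateaus(text)}")
```

Both the sentence and its letters read backwards give thorns=2 and plateaus=[3], so nothing derived from these counts could tell them apart, let alone reveal the writer’s mood.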
Two documents relating to the ongoing debate on Nemesysco’s technology and its validity:
The first is a contribution to the discussion of technical aspects behind the technology, “LVA-technology: A short analysis of a lie”. I discuss the shortcomings of the technology on which Nemesysco’s “emotion detection devices” (lie-detectors) are based. This is not a complete analysis but it exposes the essential problems. In a forthcoming development I will assess in more detail the impact of spurious acoustic background activity.
The second document is Nemesysco’s official standpoint regarding the critique that Anders Eriksson and I have directed at their systems. Anders Eriksson and I present the original document along with our comments on Nemesysco’s official response, displayed as sticky notes. (Additional layout of the same document with notes listed on separate pages).
Additional links to news on this issue
The Guardian, UK, 12 March 2009, on the evaluation of "Voice Risk Analysis" (English) http://www.guardian.co.uk/technology/blog/2009/mar/12/voice-risk-analysis-lie-detection-benefits-government-results?commentpage=1
The Guardian, UK, 12 March 2009 (English)
Sveriges Radio, P1, Sweden, 12 March 2009 (Swedish)
BBC, UK, 12 March 2009 (English)
Expresso, Portugal, 10 March 2009 (Portuguese)
Swedish Research Council, 21 February 2009 (Swedish)
Swedish Research Council, 23 February 2009 (English)
Combating fraud with fiction
To combat fraud, authorities and insurance companies in the UK invested public funds in voice-based lie-detectors to discourage the public from entering false claims[i]. The deterrent effect was significant and there were substantial savings as the number of accepted claims dropped dramatically[ii]. Yet a pertinent ethical problem is that these “lie-detectors” are based on Nemesysco’s[iii] LVA-technology, and there is no scientific assessment of the technology suggesting that it works. The technology does not extract relevant information from the speech signal, but even if it performed a correct phonetic analysis, the relevance of such measurements for correctly assessing the speaker’s emotional state would still be at issue. Indeed, the LVA-technology claims to assess the speaker’s state of mind by exploiting the minute traces that it leaves in the speech waveform, but an analysis of the LVA patent[iv] indicates that the technology falls miserably short of its promise of high-precision analysis. All it does is count local maxima and minima (“thorns”) within a three-sample running window and compute a simple statistic over “plateaus” in the waveform, performed on a crudely digitized (11.025 kHz, 8-bit/sample) speech signal. In fact, the crudeness of the amplitude coding is even worse than 8 bits/sample, because the signal is further “filtered” in yet another quantization step that ends up representing the amplitudes in only 85 levels (as compared to the already poor 256 levels of the 8-bit representation). There is no rationale for why counting just such thorns and plateaus, where the amplitude and time information is lost, would be meaningful, and no logical principle is provided for why the subsequent operations and the thresholds involved in them could possibly lead to any valid estimate of the speaker’s mental state.
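The kind of processing described above can be sketched in a few lines. This is a simplified illustration, not a reproduction of the patent: the mapping from 256 to 85 levels (v * 85 // 256), the plateau threshold and the minimum plateau length are assumptions, and the input is a synthetic tone rather than speech.

```python
import math

FS = 11025  # samples per second, the rate quoted for the LVA patent


def to_8bit(x):
    """Map floats in [-1, 1] onto unsigned 8-bit levels 0..255."""
    return [min(255, max(0, round((v + 1) * 127.5))) for v in x]


def requantize(levels, n_levels=85):
    """Collapse 256 levels onto n_levels bins (assumed mapping, not the patent's)."""
    return [v * n_levels // 256 for v in levels]


def count_thorns(q):
    """Number of 3-sample windows whose middle sample is a strict local max or min."""
    return sum(
        1
        for a, b, c in zip(q, q[1:], q[2:])
        if (b > a and b > c) or (b < a and b < c)
    )


def plateau_lengths(q, thr=0, min_len=3):
    """Lengths of runs of >= min_len samples whose successive differences are <= thr.
    thr and min_len are hypothetical illustration values."""
    runs, run = [], 1
    for a, b in zip(q, q[1:]):
        if abs(b - a) <= thr:
            run += 1
        else:
            if run >= min_len:
                runs.append(run)
            run = 1
    if run >= min_len:
        runs.append(run)
    return runs


# 50 ms of a synthetic 150 Hz tone at half amplitude (a stand-in, not speech).
x = [0.5 * math.sin(2 * math.pi * 150 * n / FS) for n in range(FS // 20)]
q = requantize(to_8bit(x))

print(f"three-sample window spans {3 / FS * 1e6:.0f} µs")
print("distinct levels used:", len(set(q)))
print("thorns:", count_thorns(q))
print("plateaus:", plateau_lengths(q))
```

Whatever the exact thresholds, what comes out is a handful of integers per stretch of signal; the amplitude values and timing that carry phonetic information are not part of the result.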
The system’s estimate of the speaker’s mental state is difficult to predict because it is based on the unstable thorns and plateaus in an acoustic wave, which are influenced by room acoustics, noise or anything else that changes the number of thorns and plateaus. Under optimal circumstances these measures would indeed describe gross average characteristics of the speech wave, but they are so crude that for any given count of thorns and plateaus there is a vast family of curves that the LVA-technology would interpret as exactly the same, although many of them would not even resemble a speech signal. This is an immediate consequence of the low information content of the analysis: it simply cannot distinguish the signals from each other. This is probably also the reason why the LVA-systems are perceived as being robust. Since they rely on highly noisy and crude measures, it is difficult to distinguish anything at all, so changes in the background noise or other spurious acoustic accidents simply go unnoticed. It is like trying to see the world through greasy glasses. For both the speaker and the tester, this erratic behavior may easily give the impression that the system picked up something “deep” that not even the speaker knows about. Finally, “certified” personnel issue the final interpretation of these “complex” instrumental results. That does not make things better. Unless the certified personnel have some independent basis for their judgment, interpreting a non-valid output is simply irrelevant.
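The many-to-one mapping can be demonstrated with a toy example. Using simplified thorn and plateau definitions (hypothetical thresholds, not the patent’s), the sketch below builds four quite different sample sequences on an assumed 0 to 84 amplitude scale, including a time-reversed and a polarity-inverted copy of the first, and shows that they yield identical features.

```python
def lva_like_features(q, thr=0, min_len=3):
    """Return (thorn count, sorted plateau lengths) for an integer sample sequence.
    Simplified, hypothetical definitions used only for illustration."""
    thorns = sum(
        1
        for a, b, c in zip(q, q[1:], q[2:])
        if (b > a and b > c) or (b < a and b < c)
    )
    plateaus, run = [], 1
    for a, b in zip(q, q[1:]):
        if abs(b - a) <= thr:
            run += 1
        else:
            if run >= min_len:
                plateaus.append(run)
            run = 1
    if run >= min_len:
        plateaus.append(run)
    return thorns, sorted(plateaus)


signal = [40, 40, 40, 55, 30, 62, 20, 20, 20, 20, 48]
candidates = {
    "original": signal,
    "time-reversed": signal[::-1],
    "polarity-inverted": [84 - v for v in signal],
    "unrelated": [10, 10, 10, 80, 0, 84, 5, 5, 5, 5, 70],
}
for name, q in candidates.items():
    print(f"{name:>17}: {lva_like_features(q)}")  # each line shows (3, [3, 4])
```

Any statistic computed downstream from these features necessarily assigns the same value to all four sequences.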
Why were relevant questions not asked from the beginning?
An intriguing aspect of all this is how such a “technology” could be adopted by some of the British authorities. Why were the highly competent speech scientists in the UK not asked to look at this amazing technology? It would have been enough to ask one of my first-year students of Phonetics, I believe. Did the responsible authorities not suspect that Nemesysco’s promises were “too good to be true”? Is it a calculated risk of using the technology’s deterrent effect, as long as its lack of basis is not denounced to the public? I find it hard to believe that authorities would undermine the public’s respect by engaging in such practices, and it simply does not make sense to me. Isn’t it predictable that someone would eventually point out the hoax? Did anyone believe that it would be possible to choke scientists’ freedom of speech, or was there a hope that no one would even bother to address the issue?
I have no answer to these questions, but one thing is certain: Nemesysco’s sellers must be extremely convincing and well organized to have succeeded in this way. But of course there must also be a wide range of people willing to listen and be convinced by their arguments. From the buyer’s short-term perspective it may be easy to think that no harm is done as long as only people naïve enough to believe in this hoax are fooled, but isn’t it obvious that this will backfire and may eventually affect even those who are not fooled into the false belief?
Ungrounded scientific claims cannot be left unchallenged
Unfortunately, scientists may have contributed to the “success” of the LVA-technology by, understandably, refusing to study its arbitrary principles, thereby leaving the public arena to Nemesysco’s propaganda. A pedagogical scientific effort may be necessary to explain to the public why the LVA-technology cannot work. Nemesysco’s resources have now grown so large that the company could even force the withdrawal of a peer-reviewed paper questioning its technology, rather than engaging in the scientific debate[v]. Nemesysco’s official justification was that the paper is defamatory because we use the word “charlatanry”, but I believe this is wrong. The word is used in a general sense: to be a charlatan, a person must know that the product does not actually work, and we do not imply that the inventor knows that. On the contrary, we rely on a published interview in which the inventor says that he has no formal academic competence in speech processing, and we draw the conclusion that he may indeed not have been aware of the lack of scientific basis of the method he proposes. Being unaware of the LVA-technology’s fundamental problems was excusable (if perhaps naïve) before our paper was published. A more plausible reason for Nemesysco’s action is perhaps that our paper was damaging their business, as their lawyer’s first letter also stated. Indeed, rather than discussing percentages of correct responses (which any random system obviously generates), our paper addressed the validity of the method, not its reliability, and the only way to address validity is to argue convincingly in support of the technology’s fundamental principles. I believe there are none, so the next “best” option was to shoot the messenger so that the news would not spread…
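The remark that any random system produces a “percentage of correct responses” can be illustrated with a short Python sketch. The 10 % base rate of deceptive statements is an arbitrary assumption chosen for illustration, not a figure from any study, and `accuracy` is a hypothetical helper. A coin flip that ignores the speech signal entirely scores about 50 % correct, and a rule that never flags anyone scores 90 % correct, without either of them analysing anything.

```python
import random


def accuracy(predictions, truth):
    """Fraction of cases where the prediction matches the ground truth."""
    return sum(p == t for p, t in zip(predictions, truth)) / len(truth)


n = 10_000
# Hypothetical ground truth: 10 % of statements are deceptive (True = deceptive).
truth = [i < n // 10 for i in range(n)]

rng = random.Random(1)  # fixed seed so the run is reproducible
coin_flip = [rng.random() < 0.5 for _ in range(n)]  # ignores the voice entirely
always_truthful = [False] * n  # never flags anyone

print(f"coin flip:       {accuracy(coin_flip, truth):.3f}")
print(f"always truthful: {accuracy(always_truthful, truth):.3f}")  # 0.900
```

A high percentage of correct responses therefore says nothing about whether a method measures what it claims to measure; that is a question of validity.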
The validity of LVA-technology remains unproven
So now it is known, unless someone proves us wrong, that the LVA-technology does not live up to its claims of detecting a speaker’s emotional state from samples of her/his speech. The issue is of no scientific interest and not even in my main field of research, but I happened to be curious about the principles of the LVA-technology, and I do have the necessary background to address the question from both the speech-processing and the phonetic perspectives. As a researcher paid by public funds, I also have the responsibility of pointing out that the emperor is naked. However, having said that, I have no illusions that my shouting will last long enough to prevent similar cases in the future. Even if such LVA-based devices are removed from official use for the moment, it is likely that the human fascination with “fantastic machines”, along with the company’s effective propaganda and possible short-term benefits, will soon override my efforts to inform the public. That is worrying, but it is just part of a reality I have to live with. These devices are not cheap, and it will take much courage for the people who invested in them to acknowledge that they simply wasted their money.
Recovering from a mistake
Meanwhile, I would say that it is urgent for the authorities who have publicly praised these devices to take prompt and courageous action and admit that the investment was a mistake. When it becomes known that the LVA-technology does not produce relevant results, the public’s confidence in the authorities will be deeply damaged. Professionally conducted structured interviews of (randomly) selected customers will do a far better and more responsible job than Nemesysco’s “lie detectors”. LVA-technology may be acceptable for entertainment, but not for serious applications influencing people’s lives.
Related links and references:
Science magazine: http://sciencenow.sciencemag.org/cgi/content/full/2009/210/1
Debate article (Swedish):
‘Ministry of Truth’ – www.ministryoftruth.me.uk
[iv] US Patent 6,638,217 B1, Oct. 28, 2003: http://www.freepatentsonline.com/6638217.html?query=PN%2F6638217+OR+6638217+B1&stemming=on
[v] Eriksson, A., & Lacerda, F. (2007). Charlatanry in forensic speech science: A problem to be taken seriously. International Journal of Speech, Language and the Law, 14, 169-193. http://www.equinoxjournals.com/ojs/index.php/IJSLL/article/view/3775
In today’s online edition of DN (http://www.dn.se/DNet/jsp/polopoly.jsp?d=597&a=868300), Karin Bojs writes about Anders Eriksson’s and my article “Charlatanry in forensic speech science: A problem to be taken seriously”. In our article we analyse the principles on which Nemesysco (http://www.nemesysco.com) bases its “Voice Analysis Technology”, and we find that the method as described simply cannot work. Given the measurements of the speech signal on which the method relies, there is no way the so-called Layered Voice Analysis (LVA) could work even in some percentage of the cases! There is no relation whatsoever between the measures the method is built on and the conclusions one wants to draw from those measurements. We explained this in our article, and the company evidently did not like our conclusions.
But studying the arbitrary method the company has used is, of course, not an interesting research question. Nothing more than rudimentary knowledge of acoustic phonetics is needed to realise that there cannot be any relation between the properties of the acoustic speech signal on which the method is based and the truthfulness of what is being said. The method is based on counting “turning points” (thorns) and “stable regions” (plateaus) in the waveform. These details of the waveform are strongly affected by room acoustics and background noise. The sound waves coming from a speaker’s speech organs are mixed with reflections of earlier sound waves bouncing off the walls, floor, ceiling and other objects in the room, and are further mixed with other background sounds. The thorns and plateaus available to Nemesysco’s system are thus a complete mixture of all kinds of sound waves, of which the waves the speaker actually produced are only a part. The situation resembles trying to infer the swimming movements of a single child by counting the peaks and troughs on the surface of a swimming pool in which a large number of children are playing at the same time. In theory this could work, given a sufficiently detailed description of the water surface in both time and space, as well as an adequate mathematical model of the relation between the body movements and the waves on the surface. Unfortunately, Nemesysco’s method has neither such a model nor sufficient measurement precision to relate the analysed waveform to what the speaker has said in the first place. The method looks more like an arbitrary calculation à la Gyro Gearloose. It is therefore not surprising that the systems work “equally well” whether they are used at airports, in pubs, out in traffic, and so on.
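The effect of a single room reflection on such counts can be sketched in Python. The thorn and plateau definitions below are hypothetical simplifications, not Nemesysco’s published algorithm, and the 15-sample “direct sound” and the echo parameters (a copy delayed by 4 samples at half amplitude, via the hypothetical helper `add_echo`) are made-up toy values. The speaker’s output is identical in both cases; only the room has changed, yet both counts change.

```python
def count_thorns(x):
    """Count strict local peaks and troughs (hypothetical 'thorn' definition)."""
    return sum(
        1
        for i in range(1, len(x) - 1)
        if x[i - 1] < x[i] > x[i + 1] or x[i - 1] > x[i] < x[i + 1]
    )


def count_plateaus(x, tol=0.05, min_len=3):
    """Count runs of at least min_len samples whose successive steps stay
    within tol (hypothetical 'plateau' definition)."""
    count, run = 0, 1
    for i in range(1, len(x)):
        if abs(x[i] - x[i - 1]) <= tol:
            run += 1
        else:
            if run >= min_len:
                count += 1
            run = 1
    if run >= min_len:
        count += 1
    return count


def add_echo(x, delay, gain):
    """Mix in one wall reflection: a copy delayed by `delay` samples, scaled by `gain`."""
    return [x[n] + (gain * x[n - delay] if n >= delay else 0.0) for n in range(len(x))]


# Toy direct sound from the speaker (assumed values, not real speech).
direct = [0, 0, 0, 0, 0, 1, 2, 3, 2, 1, 0, 0, 0, 0, 0]
in_room = add_echo(direct, delay=4, gain=0.5)

print(count_thorns(direct), count_plateaus(direct))    # 1 2
print(count_thorns(in_room), count_plateaus(in_room))  # 3 1
```

A real room adds many overlapping reflections as well as background noise, so counts of this kind mix the speaker’s contribution with everything else in the sound field.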
Politicians in the UK should have asked themselves how Nemesysco’s system can cope with adverse acoustic environments that their own voice-controlled mobile phones find so hard to handle… Could it be because Nemesysco’s system does not cope with any of them at all? At best, LVA is fit for entertainment, but presenting such a method as the basis for applications in security services, medicine, education, etc. is very serious, especially when the purchase of such “technology” is to be funded by taxpayers’ money.
From a research perspective, the so-called LVA method is in itself entirely uninteresting, but as researchers we are morally obliged to contribute our knowledge to expose something that, until proven otherwise, appears to be a pure hoax and is moreover marketed as something of public interest. That kind of marketing must, of course, be countered by knowledge among potential buyers. We may of course be wrong in our analysis, but it has been presented clearly, which enables Nemesysco to respond with scientific arguments that refute it. So far there is simply no scientific question to discuss: the method does not seem to have any connection whatsoever to what the manufacturers claim it does, and there is no logical principle behind it. Without a convincing explanation of the principle itself, claims about “accuracy” are completely irrelevant. Moreover, the hit rate by itself gives no interesting information. It must always be related to the number of false alarms that accompany that hit rate. If false alarms rise hand in hand with hits, the system is not reliable. What is required is a high number of hits together with a low number of false alarms.
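The requirement that hits be weighed against false alarms can be made concrete with a small Python sketch. The test set, both “systems” and the helper `detection_rates` are hypothetical. A system that flags everyone reaches a 100 % hit rate, but its false-alarm rate is also 100 %, so it carries no information. The difference between hit rate and false-alarm rate (Youden’s J) is used here only as one simple way of summarising the trade-off.

```python
def detection_rates(flags, truth):
    """Return (hit rate, false-alarm rate) for binary flags against ground truth."""
    pairs = list(zip(flags, truth))
    hits = sum(f and t for f, t in pairs)
    misses = sum(t and not f for f, t in pairs)
    false_alarms = sum(f and not t for f, t in pairs)
    correct_rejections = sum(not f and not t for f, t in pairs)
    return hits / (hits + misses), false_alarms / (false_alarms + correct_rejections)


# Hypothetical test set: 20 deceptive (True) and 80 truthful (False) statements.
truth = [True] * 20 + [False] * 80

flag_everyone = [True] * 100
# Hypothetical selective system: flags 16 of the 20 deceptive and 8 of the 80 truthful.
selective = [True] * 16 + [False] * 4 + [True] * 8 + [False] * 72

for name, flags in [("flag everyone", flag_everyone), ("selective", selective)]:
    hit, fa = detection_rates(flags, truth)
    print(f"{name}: hit rate {hit:.2f}, false-alarm rate {fa:.2f}, difference {hit - fa:.2f}")
```

The first system prints a difference of 0.00 despite its perfect hit rate; only the second, with many hits and few false alarms, separates the two groups at all.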
For more information, see also:
http://ling-map.ling.su.se/blog/ (Posted at 17:09 on 11 December 2008 by Eva Lindström)
How should the visa rules be designed so that they do not hinder Sweden’s ability to establish and develop valuable research contacts?
An important question discussed in DN, 2014-07-08:
Fricatives on Roslagsbanan
Travelling on Roslagsbanan in the new, fresh carriages has become an unpleasant phonetic experience for passengers. The automatic announcements of upcoming stations do not live up to the major investment made in Roslagsbanan (http://t.sr.se/MHgk6F). The announcements are played at far too high a volume and sound horribly tinny, as if from an old-fashioned crackling loudspeaker. Above all, the fricatives suffer obvious distortion every time the voice announces “näXta…” (Swedish “nästa”, “next”) or pronounces any other word containing a fricative.
That a professional contractor delivers such a substandard system is surprising in itself, but that SL in turn accepts the delivery suggests a strange combination of incompetence and nonchalance. With basic professional skills, it should not be a major problem to produce recordings without distortion or to adapt the playback level to the dynamic range of the sound system. A touch of professional pride would, moreover, have fixed the fault as soon as it was discovered instead of leaving it in place for several months, despite repeated complaints. In one of its replies, SL claimed that it prioritises punctuality and safety. Fine, but are passengers to believe that there are so many and such serious safety requirements that SL cannot manage to fix this relatively simple sound problem in six months? Can it be so difficult to demand that those who did substandard work in the first place redo it and get it right? Is it acceptable for SL to act as if passengers should be forced to put up with the crackling sound until they get used to the misery and resign themselves to it?
One really has to hope that this sound problem is an isolated lacuna in SL’s competence. In its latest reply to my complaints, SL stated that the dreadful sound quality is due to the loudspeakers in the new carriages. That explanation cannot be right, because the pre-recorded announcements sound just as bad (albeit at a lower volume) when they are played in the old carriages.
Unfortunately, SL’s replies reinforce, time after time, the impression of incompetence and nonchalance in dealing with the problem. It is evident that SL does not care at all about fixing the problem as long as it believes that only the odd passenger is bothered by the nuisance. As if the fault would be any less of a fault when pointed out by one passenger instead of a thousand.
It is outrageous that this sound problem, which should have been easy to detect and fix from the start, still persists week after week. Someone has failed in their responsibility for the new carriages’ sound system, and the fact that SL, in half a year, has neither bothered nor had the capacity to fix the problem indicates unacceptable incompetence and nonchalance. A little knowledge of acoustic phonetics and recording technology among SL’s inspectors and at the company that made the recordings and installed the sound system could have spared passengers several months of daily unpleasant listening experiences!
Unfortunately, British authorities continue to use “technology” that supposedly carries out “voice risk analyses” (VRA):
It is amazing that public funds keep being diverted to pseudo-scientific devices in the UK. It is good that “No one is going to be prosecuted for benefit fraud on the result of voice analysis tests alone”, but stopping these pseudo-analyses altogether would save both money and the authorities’ credibility…
For a short discussion of these types of voice analysis (VRA and LVA, which rest on two different questionable working principles), see:
Two weeks ago I had the pleasure of listening to Robert J. Lefkowitz’s Nobel lecture at Stockholm University, in connection with the 2012 Nobel Prize in Chemistry, which he shared with Brian K. Kobilka.
Robert J. Lefkowitz is affiliated with the Howard Hughes Medical Institute and Duke University Medical Center. The authors of “The Power of Voice”, at the Fuqua School of Business, are also affiliated with Duke University, but the contrast between the scientific methodology used in Lefkowitz’s studies of G-protein-coupled receptors and some finance researchers’ naive acceptance of Nemesysco’s obscure LVA methodology could not be more striking. We can only hope that the finance researchers at the Fuqua School of Business will eventually be inspired by their colleagues at the Medical Center. Rather than just running correlation studies involving surrealistic variables proposed and controlled by Nemesysco’s sellers, they will hopefully start asking relevant fundamental methodological questions before engaging in ad hoc “analyses of speech to detect financial misreporting” that end up endorsing pseudoscience.
Mayew & Venkatachalam’s recent article “The Power of Voice…” (Mayew & Venkatachalam, 2012) shows that academics are not immune to the fallacies of pseudoscience. This is a potential problem because academics may end up endorsing bogus technologies that, if spread, may affect people’s lives, or at least divert funding that should go to meaningful technologies.
We are currently investigating the phonetic conditions under which the emotion analysis technology used by Mayew & Venkatachalam generates “interpretable” results. A possible scenario is that the quasi-random character of such “emotion analyses”, combined with ill-defined experimental conditions, yields results that can be forced into suitable interpretations.
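One way in which quasi-random outputs can end up supporting “suitable interpretations” is the multiple-comparisons problem. The Python sketch below is only an illustration under stated assumptions (30 hypothetical firms, 1,000 pure-noise scores, an approximate two-tailed 5 % critical value of |r| = 0.361 for n = 30, and a hand-written `pearson_r` helper); it does not describe the analyses in Mayew & Venkatachalam (2012) or the planned study. Roughly 5 % of scores that contain no information at all still pass the significance threshold.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(42)  # fixed seed for reproducibility
N_FIRMS = 30       # hypothetical sample size
N_SCORES = 1000    # hypothetical number of random "emotion" variables tried
R_CRIT = 0.361     # approximate two-tailed 5 % critical |r| for n = 30

# Hypothetical outcome measure, unrelated to any of the scores below.
outcome = [rng.gauss(0.0, 1.0) for _ in range(N_FIRMS)]

spurious = 0
for _ in range(N_SCORES):
    # Pure noise, standing in for a quasi-random analysis output.
    score = [rng.random() for _ in range(N_FIRMS)]
    if abs(pearson_r(score, outcome)) > R_CRIT:
        spurious += 1

print(f"{spurious} of {N_SCORES} pure-noise scores look 'significant'")
```

If only the “significant” variables are reported and interpreted, pure noise can look like a finding.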
A report from this study is expected to be ready for publication by September/October 2012.
Mayew, W. J., & Venkatachalam, M. (2012). The Power of Voice: Managerial Affective States and Future Firm Performance. The Journal of Finance, 67(1), 1-44.