Thursday, November 12, 2020

Benford's Law - How mathematics can detect fraud!

UPDATE 11-11-2020 Hi all, this video is currently being shared in relation to the 2020 USA election. Benford's Law applies when the dataset is a form of geometric growth over several orders of magnitude, such as the lengths of rivers. So would Benford's Law apply to an election? Here is a great video by Matt Parker explaining why we would not expect the first digits of election returns to follow Benford's Law: https://youtu.be/etx0k1nLn78

Some are currently citing the work of Walter Mebane as an example of Benford's Law being applied to elections. Mebane's method is a test based on the second digits of election results (rather than the first digits). If you have heard of Benford's Law being applied to elections, it was probably a reference to Mebane's work. Mebane's own analysis of the 2020 election is now here http://www-personal.umich.edu/~wmebane/inapB.pdf and does not support claims of fraud in the 2020 election. Note that Mebane says "The first-digit distribution has nothing whatsoever to do with any kind of election fraud."

Mebane's test on the second digits of election results is not universally accepted. One paper that disagrees (or at least advises caution) is this paper (https://core.ac.uk/download/pdf/206427437.pdf) from the California Institute of Technology. The paper simulated an election and found the test to be unreliable: "labeling a free and fair vote as fraudulent 34% of the time and... labeling a fraudulent election as free and fair 60% of the time." Their conclusion was that, since Benford's Law proved to be unreliable in a simple simulation, you should be extra cautious about applying it to a complicated, real election.

For completeness, Mebane's rebuttal to the above paper is here https://pdfs.semanticscholar.org/e667/b8ad9f58992828ff820ddc8a005de754c5f5.pdf?_ga=2.67043500.340285523.1604599512-1614321896.1604599512 However, both papers agree that more research needs to be done to verify whether Benford's Law applies to elections.
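For reference, here is a minimal sketch of the second-digit distribution that Benford's Law implies, which is the kind of benchmark a second-digit test compares against. It is not Mebane's actual procedure, only the standard formula: the chance that the second digit is d is the sum, over every possible first digit k, of log10(1 + 1/(10k + d)). The function name `benford_second_digit` is my own choice for illustration.

```python
import math


def benford_second_digit():
    """P(second digit = d), summed over all possible first digits k = 1..9."""
    return {
        d: sum(math.log10(1 + 1 / (10 * k + d)) for k in range(1, 10))
        for d in range(10)
    }


second_digit_probs = benford_second_digit()

# Print the expected share of each second digit, 0 through 9.
for d, p in second_digit_probs.items():
    print(d, round(p, 4))
```

The second digits are much flatter than the first digits: 0 is expected about 12% of the time and 9 about 8.5% of the time, so deviations are harder to spot than with first digits.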
----------
Benford's Law is a truly surprising fact about the frequency of leading digits in data such as prices, populations, river lengths, and even street addresses. And if someone's accounts do not follow Benford's Law, then they may be committing fraud!
-----------
Here's a good article by Ted Hill: http://web.williams.edu/go/math/sjmiller/public_html/BrownClasses/197/benford/Hill_1st-dig.pdf
Hill, TP. The First-Digit Phenomenon. American Scientist 86 (4), 358-363. (1998)
Here's a list of papers: http://www.benfordonline.net/list/chronological
-----------
This law was first noticed in 1881 by the astronomer Simon Newcomb, then again in 1938 by the physicist Frank Benford. They both noticed that the starting digits of a lot of real-world statistics do not appear evenly but follow a logarithmic distribution; for example, numbers that start with a 1 appear just over 30% of the time.

A quick appeal to intuition will show this is true for data that grows exponentially (geometrically). If something grows by a constant multiplication factor, you will soon see that the distribution of its starting digits is logarithmic, i.e. that numbers starting with a 1 appear about 30% of the time. This explains the law for a lot of things that grow in this way, like prices and populations. Yet this law also appears in other types of growth, including factorials and Fibonacci numbers.

However, remarkably, this law also describes what happens when you take data randomly from a variety of sources, such as you might do if you took numbers from a newspaper. This data comes from a variety of distributions, not just from exponential growth, yet it still follows Benford's Law. Although Benford observed this fact in his original paper, it was not proven until 1995, by Ted Hill.

However, you can still work out the form of Benford's Law without knowing this. If we can assume Benford's Law exists, then it must be scale invariant, i.e. it would not matter which units we choose to make our measurements in: kilometres, miles, feet, centimetres or whatever. As I prove in this video, the only distribution that is scale invariant is the logarithmic distribution. Hence Benford's Law is logarithmic. This proof was first put forward by Roger Pinkham in 1961.

In 1992, Mark Nigrini wrote his PhD thesis on the detection of income tax evasion using Benford's Law, and his ideas are applied in the detection of fraud.

We see Benford's Law in observational data because real data can be a complex mix of many distributions and because it is the distribution achieved when data is repeatedly multiplied, divided, or raised to integer powers. And, once achieved, the distribution persists under further multiplication, division, and raising to integer powers.
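To make the logarithmic distribution concrete, here is a small sketch that compares the expected first-digit frequencies, P(d) = log10(1 + 1/d), with the starting digits of a geometrically growing sequence. The choice of data (the first 1000 powers of 2) and the "change of units" (multiplying every value by 3) are my own assumptions for illustration, not something taken from the video.

```python
import math
from collections import Counter


def benford_first_digit():
    """Expected first-digit probabilities: P(d) = log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit_frequencies(values):
    """Observed share of each starting digit among positive integers."""
    counts = Counter(int(str(v)[0]) for v in values)
    return {d: counts[d] / len(values) for d in range(1, 10)}


expected = benford_first_digit()

# Illustrative data (an assumption): geometric growth, doubling each step.
powers = [2 ** n for n in range(1000)]
observed = leading_digit_frequencies(powers)

# A "change of units": multiply every value by the same constant.
rescaled = [3 * v for v in powers]
observed_rescaled = leading_digit_frequencies(rescaled)

print("digit  expected  powers of 2  rescaled")
for d in range(1, 10):
    print(f"{d:>5}  {expected[d]:8.3f}  {observed[d]:11.3f}  {observed_rescaled[d]:8.3f}")
```

All three columns should come out close to one another, with a starting digit of 1 at roughly 30% in each. Multiplying by a constant shifts which individual numbers start with which digit, but leaves the overall distribution essentially unchanged, which is what scale invariance means here.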


