This is a response to a private thread in the TT hall, where the author was reinventing some fundamental ideas in statistics. I’d like to weigh in on the terminology he’s coming up with to explain whether samples are representative of the populations they’re drawn from.
In English we say “under-representative” where he’s saying “hyper-representative”. For example, let’s say we’re comparing the German and Italian football teams. If the Muslim population in Italy is 20% but only 2 of 11 players (about 18%) are Muslim, then the Muslim demographic in the greater population is “under-represented” in the sample (the team). If the Muslim population in Germany is 5% but 1 of 11 players (about 9%) is Muslim, then Muslims are “over-represented”.
By extension, the difference between populations—of Italian Muslims minus German Muslims—would be under-represented.
Population difference = Italian Muslims minus German Muslims = 20% – 5% = 15%
Sample difference = Italian Muslim players minus German Muslim players = 2/11 – 1/11 = 1/11 ≈ 9%. Since 9% < 15%, the difference is “under-represented”.
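For anyone who wants to check the arithmetic, here is a minimal Python sketch of the comparison above. The function name `representation` and the variable names are my own, invented for illustration, and the population and team figures are the hypothetical ones from the example, not real data.

```python
from fractions import Fraction


def representation(population_share, sample_share):
    """Label a sample share relative to the population share it is drawn from."""
    if sample_share < population_share:
        return "under-represented"
    if sample_share > population_share:
        return "over-represented"
    return "proportionally represented"


# Hypothetical figures from the example (not real demographics).
italy_pop, germany_pop = Fraction(20, 100), Fraction(5, 100)
italy_team, germany_team = Fraction(2, 11), Fraction(1, 11)

# Each group compared against its own population.
print("Italy:", representation(italy_pop, italy_team))
print("Germany:", representation(germany_pop, germany_team))

# The difference between the two groups, in the population and in the sample.
pop_diff = italy_pop - germany_pop        # 3/20, i.e. 15%
sample_diff = italy_team - germany_team   # 1/11, roughly 9%
print("Difference:", representation(pop_diff, sample_diff))
```

Exact fractions are used rather than floats so that a sample share that exactly matches the population share is not misclassified by rounding.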
A much less oblique example would be to compare Harvard admission rates (the samples) for Hispanics and Asians (the populations).
Someone who believes Harvard admissions ought to reflect the 99th percentiles of IQ and conscientiousness, traits in which Asians are higher than Hispanics on average, is going to say that Hispanics are over-represented and Asians are under-represented, and that the difference between the populations is under-represented. A ’60s-era progressive who believes that access to institutions is a capital good to be divided equally among interest groups will say that Hispanics are under-represented and Asians are over-represented. And, because the blank-slatist progressive believes there are no innate differences in the averages of the two general populations, any difference represented in the sample will be over-representing the (zero) difference between the populations.