Reassessing Genetic Links in Northeast India: Evidence vs. Narrative

By Bhogtoram Mawroh

When I saw the title "Khasis share a much deeper biological link with the Indo-Gangetic plains," I was greatly intrigued.

After a little searching, I found that this story is based on the 2026 paper "Admixture and Genetic Connectivity: Autosomal Insights into Indo-Aryan Speakers at the Eastern Edge of the Indian Subcontinent," written by a team led by Gyaneshwer Chaubey, a professor at Banaras Hindu University. The paper argues that the Indo-Aryan population in Assam, i.e., caste Assamese, has closer genetic links with Indo-Aryan groups from the Gangetic plains, such as the Harijan and Kol groups, than with nearby groups, i.e., Bengali (Indo-Aryan) and Nyishi (Tibeto-Burman). This finding offers some very important insights into the history of Assam and the region.
Harijans are what we call Dalits and were traditionally known as "Untouchables." Now, the interesting thing about this group is that although its members now invariably speak the local Indo-Aryan language in the Gangetic plains, genetically they are more closely related to Dravidian speakers. There are already papers showing that Indo-Aryan ancestry (i.e., ancestry connected to the Steppe in Central Asia) is much higher among the upper castes, while South Asian ancestry (i.e., more Dravidian-related) is greater among the lower castes like the Dalits. So, when the Indo-Aryan people arrived in South Asia, they already had a rudimentary caste system, but there was some mixing with the local population. Around 2,000 years ago, however, this mixing largely ceased, leading to the highly endogamous caste system seen today, in which people marry within their own caste. 'Who We Are and How We Got Here' (2018) by David Reich is an incredible book for understanding the evolution of the genetic landscape of India and the ossification of the caste system. However, as the system hardened, it also created the danger of recessive genetic disorders becoming more common within a given group.
In hospitals in Tamil Nadu, before administering anesthesia, doctors would ask patients their caste. This is because it has been discovered that the Chettiyar caste, which is a Vaishya or trading caste, can be fatally allergic to anesthetic agents. This particular community also has an infertility problem. All of this has been connected to the practice of consanguineous marriage, where a boy can marry his mother's brother's daughter, his father's sister's daughter, or even his sister's daughter. This has led to the transmission of genetic disorders that have become common within the group, a pattern often seen in communities that marry within their own group.
While the caste system has created the danger of genetic disorders persisting within a population, it also froze the country's demographic legacy. In other words, it is possible to identify which genetic lineage a certain caste or group belongs to. No particular group can claim to be genetically pure, but the proportions of different lineages vary. In the case of Harijans and other lower castes, they have less Indo-Aryan ancestry. But as one goes higher up the caste ladder, the proportion of Indo-Aryan ancestry becomes more prominent. So, I am surprised that this group is being considered Indo-Aryan. But maybe this group was from a time when mixing between Indo-Aryan groups and the local population was still happening. The time period of around 2,000 years makes this possibility more plausible.
The other group, Kol, is an Austroasiatic-speaking indigenous tribal group. They are related to other Munda populations, and historically the term was used for non-Indo-Aryan groups found in the Chotanagpur plateau, where they are still found today. The term "Kola" is in fact mentioned in the Rig Veda, the oldest Hindu scripture, composed in the Punjab region between 1500 and 1000 BCE. This was the period when Indo-Aryan groups had arrived in India. This suggests that Austroasiatic groups were found as far as Punjab, where they came into contact with Indo-Aryan groups. The many Austroasiatic words that have been adopted into Sanskrit are evidence of interaction between the two groups. So, unless the paper refers to another caste group known as Kol (there is in fact such a caste), it is surprising that this group is considered Indo-Aryan. Yes, at one point in time, they were in the Gangetic plains, but they were not Indo-Aryan.
Now, coming back to the paper, it argues that Indo-Aryan-speaking groups in Assam have greater affinity with other Indo-Aryan groups in North India than with surrounding groups from the Northeast. That does not mean there was no admixture with local groups that have greater affinity with East and Southeast Asian communities. In fact, about 24% of the ancestry comes from the latter group. This admixture appears to have taken place between 2,000 and 1,500 years ago. It is, however, much more recent than similar mixing seen in other populations. This is reflected in the passage: "Interestingly, this proportion closely resembles that of Austroasiatic (Mundari) populations, although the timing of admixture in the Assamese Indo-Aryan group is considerably later than that of the Austroasiatic populations."
This passage makes a few points. First, a similar mixture happened between Austroasiatic people and the local population. But the group mentioned here is Munda, not Khasi. Secondly, this mixture is older. This claim comes from an earlier 2019 paper, "The Genetic Legacy of Continental-Scale Admixture in Indian Austroasiatic Speakers" by Kai Tätte, which is cited in the 2026 paper. According to the 2019 paper, the admixture was not between the Munda and Indo-Aryans, but with Dravidian speakers from Kerala, and it occurred between 3,800 and 2,000 years ago. This date is important because it was around 4,000 years ago that the Khasi and Munda shared a last common ancestor, suggesting that this was the point at which the two groups became distinct from each other. This was also the period when settlements began to develop around Lum Sohpetbneng, most probably relying on the stone tools produced in Myrkhan, as discovered by Marco Mitri, a historian from NEHU. What is important to note is that the Khasi are not mentioned in the paper, except in a graph for general representation. The story has conflated the Khasi and the Munda, who are genetically and linguistically related but became distinct groups around 4,000 years ago.
For the authors of the Chaubey-led paper, there are also some important points to consider. The fact that the Indo-Aryan group that arrived in the Northeast is more similar to Kol (an Austroasiatic group like Munda and Khasi) and Harijan (who speak an Indo-Aryan language but are genetically more related to Dravidians) suggests that they were not originally Indo-Aryan, but rather groups that first inhabited the Gangetic plains, i.e., Dravidian and Austroasiatic populations. This had already been indicated by linguistic and archaeological evidence, and now appears to be supported by genetic evidence as well, as seen in this paper. Over time, they adopted Indo-Aryan language and Vedic culture. This is the group that came to the Northeast and mixed with local populations there. This is what I can infer from the paper, which in many ways confirms what is already known. In the preface to Homiwell Lyngdoh's 1937 book 'Ka Niam Khasi', there was already an inkling of how India was populated by different groups at various time periods and of the place of the Khasi within it. With slight modifications, that picture is now becoming clearer.
So, technically, the paper is not about the Khasi but about the Munda, with whom the Khasi shared a common ancestor around 4,000 years ago. An intriguing point it raises is that around 2,000 to 1,500 years ago, when the Indo-Aryans entered Assam, the Brahmaputra may have been known by a different name, perhaps Burlung Buthur or Doima, as it is still called today among the Bodo. Kamakhya was still Ka-Mei-Kha, a sacred site for the Khasi, and many sites dedicated to Hindu worship today were probably also sites of indigenous worship. One wonders what the Khasi called the Brahmaputra back then.
These are fascinating questions, and I do not know whether we will ever get the answers. But the paper, "Admixture and Genetic Connectivity: Autosomal Insights into Indo-Aryan Speakers at the Eastern Edge of the Indian Subcontinent," does give occasion to imagine the past and to reflect on how history is shaped by contingencies. It's not possible to go back to the past, but understanding it can help us see how we got here and what might have been. And I am sure there is much more that we will learn in the near future. I, for one, am excited about it.
(The views expressed in the article are those of the author and do not reflect in any way his affiliation to any organisation or institution)