Khoja Genetics & Deep Ancestry

From Khoja Wiki

What DNA testing may be revealing about our distant Khoja past


Editor's Note:

In the Satpanth Ismaili ginan Jannatpuri, Sayyid Imam Shah recounts how his grandfather, Pir Sadardin, brought many Hindu Lohanas into the fold of Islam and bestowed upon them the honorific title Khwaja, from which the word Khoja is believed to derive. It is a story deeply woven into communal Khoja memory.

But can modern genetics tell us anything meaningful about that deeper history?

To explore that question, Khojawiki spoke with Dr. Alshad S. Lalani, a US- and Canadian-trained biomedical scientist, who recently undertook a close examination of his own DNA, alongside a small number of publicly available Khoja samples, to better understand what genetics might reveal about Khoja origins.

With so little written about Khoja genetics, this exploration offers a fascinating new perspective on Khoja history, and perhaps a starting point for future Khoja scholars to investigate these questions more deeply.

Iqbal I. Dewji, Editor, Khojawiki.org


Q&A with Dr. Alshad S. Lalani


Tell us a little more about your background. Why did you decide to explore your genetic ancestry?

Ethnically speaking, I identify myself as a Canadian and American Ismaili of Khoja descent. Like many Khojas, my grandparents and great-grandparents hailed from the Kathiawar (Saurashtra) region of Gujarat in north-western India and later migrated to East Africa in the early twentieth century.

Professionally, I am a biomedical scientist with a doctorate in biochemistry, and I specialize in precision medicine and cancer drug development in the biotechnology and pharmaceutical industry. So I understand both the power of genetics and the limitations of today’s consumer genetic tests for researching family history and health, especially for South Asians, who until recently have been underrepresented in whole-genome sequencing studies and public reference datasets.

Very little has been written about the genetics of Khojas in the scientific literature. I wanted to do a deep dive into my own DNA and determine whether it matched oral historical assertions, namely, that my distant Khoja ancestors were once Lohanas, a Hindu mercantile caste from the Sindh region of present-day Pakistan, who gradually converted to Nizari Ismailism nearly six hundred years ago and progressively migrated south into Kutch, Kathiawar, and Gujarat. I was also curious about how the average Khoja genetic profile fits within the broader South Asian landscape.


What have genetic studies taught us about the ancestry and origins of people from the Indian sub-continent?

Over the past decade, several large ancient DNA and archaeogenetics studies have clarified that most South Asians today derive ancestry from three deep ancestral populations, which mixed at different times and in different proportions across the subcontinent over thousands of years.

The oldest layer consists of Indigenous South Asian hunter-gatherers, often referred to as Ancient Ancestral South Indians (AASI). These ancient populations lived in the subcontinent for tens of thousands of years before farming began and descended from some of the earliest modern humans to reach South Asia, arriving from Africa roughly 50,000–60,000 years ago. Although we still know relatively little about them genetically, they contributed a deep foundational layer to all later South Asian populations.

A second major component derives from Iran-related Neolithic agriculturalists, whose ancestry is associated with early farming populations of the Iranian plateau. Groups carrying this ancestry moved from the Zagros region of modern Iran and crossed the Iranian plateau. They entered South Asia through Baluchistan and the Indus region between roughly 4700 and 3000 BCE, where they mixed with local AASI-related populations. This admixture became central to populations associated with the Indus Valley Civilization (IVC)—one of the world’s earliest urban societies, which built great cities such as Harappa and Mohenjo-Daro—and forms the dominant backbone of Khoja ancestry, as well as that of many communities long rooted in the Indus basin and surrounding regions of present-day Pakistan and north-western India.

Sometime after the decline of the Harappan civilization, a third ancestry component entered the region: Eurasian Steppe pastoralist ancestry. These populations ultimately trace their origins to the Pontic–Caspian steppes, with intermediary cultures such as Sintashta and Andronovo. From Central Asia, Steppe-derived pastoralists moved southward along the Inner Asian Mountain Corridor, crossed the Hindu Kush, and entered the subcontinent primarily through the Swat Valley into Punjab and the upper Indus basin, between roughly 2000 and 1500 BCE. These migrations are widely regarded as the most likely vector for the spread of early Vedic culture and Indo-Aryan languages into the subcontinent. Over time, these languages developed into Sanskrit and later into regional languages such as Sindhi, Kutchi, Urdu, Hindi, and Gujarati. Steppe ancestry is present across South Asia but is generally most pronounced in northern regions of present-day Pakistan and north-western India.

Figure: Simplified model of population interactions shaping South Asian ancestry, showing the mixing of indigenous South Asian hunter-gatherers, Iran-related agriculturalists, and later Steppe pastoralist ancestry. Adapted from Narasimhan et al., The Genomic Formation of South and Central Asia, https://doi.org/10.1101/292581

The interaction of these three components produced two broad genetic poles described in the literature as Ancestral South Indian (ASI) and Ancestral North Indian (ANI). These are not discrete populations but rather the ends of a genetic continuum. These deep ancestry patterns long predate the later formation of caste and community identities. ANI-related ancestry, enriched in Steppe and Iran-related components, is more prominent in today’s north-western Indian and Pakistani populations, including groups from Sindh, Punjab, Rajasthan, and north-western Gujarat. Populations such as Tamils, with much higher proportions of South Asian hunter-gatherer ancestry, tend to lie toward the ASI end of the continuum.


So based on these ancestral components, what does the average Khoja genetic profile look like?

Based on the limited number of Khoja samples I have analyzed, together with my own DNA, the average Khoja genetic profile is best described as Indus-centered, moderately Steppe-enriched, and moderately Indigenous South Asian.

“the average Khoja genetic profile is best described as Indus-centered, moderately Steppe-enriched, and moderately Indigenous South Asian”

The average Khoja admixture profile can be summarized approximately as follows:

Khojas display a strong predominance of Indus-derived ancestry (65–70%), indicating deep genetic affinity with populations shaped during and after the Harappan Civilization. They also carry a clear but intermediate level of West Eurasian Steppe-related ancestry—not as high as in some Punjabi Jats, Rajputs, Brahmin, or Kashmiri groups, but higher than in many Gujarati, central Indian, or South Indian populations. The presence of a moderate AASI-related component reflects deep Indigenous South Asian ancestry, though typically at lower levels than those observed in many western and central Indian populations and far lower than in South Indian groups.

Overall, these proportions place Khojas toward the middle-to-upper range of the Ancestral North Indian genetic cline, clustering most closely with Sindhi and other mercantile populations long rooted in the Indus-Sindh region and adjoining areas of present-day Pakistan and north-western India.
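For readers curious how coordinate-based admixture estimates of this kind can be produced, the following is a minimal sketch in Python. It is not the method used for the figures above. It assumes that each ancestral source and the target individual are summarized as a short vector of coordinates, and it searches for non-negative weights summing to 1 whose weighted average of the sources lies closest to the target. The function name `fit_three_way`, the grid search, and all coordinates are hypothetical illustrations; published studies rely on formal statistical tools such as qpAdm or ADMIXTURE rather than this simplification.

```python
from math import dist


def fit_three_way(target, src_a, src_b, src_c, steps=100):
    """Grid-search non-negative weights summing to 1 for three source vectors.

    Returns ((w_a, w_b, w_c), gap), where gap is the Euclidean distance
    between the target and the best weighted average of the sources.
    """
    best_weights, best_gap = None, float("inf")
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            k = steps - i - j  # remaining share goes to the third source
            weights = (i / steps, j / steps, k / steps)
            model = [
                weights[0] * a + weights[1] * b + weights[2] * c
                for a, b, c in zip(src_a, src_b, src_c)
            ]
            gap = dist(target, model)
            if gap < best_gap:
                best_weights, best_gap = weights, gap
    return best_weights, best_gap


# Hypothetical toy coordinates, NOT real genetic data.
source_1 = [1.0, 0.0, 0.0]
source_2 = [0.0, 1.0, 0.0]
source_3 = [0.0, 0.0, 1.0]
target = [0.5, 0.3, 0.2]

weights, gap = fit_three_way(target, source_1, source_2, source_3)
print(weights, gap)
```

With real coordinates the gap would rarely be zero; a large gap would suggest that the chosen sources model the target poorly.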


What are some of these modern populations and groups that Khojas cluster with on genetic charts?

The clearest finding from my analysis is that Khojas share their closest genetic similarity with Lohana and Memon populations. Across multiple analytical approaches, including PCA charting, genetic distance comparisons, and admixture modeling, the average Khoja DNA profile appears nearly indistinguishable from many Lohana and Memon samples, forming a tight internal cluster with little visible genetic boundary between them. Genetic clustering reflects shared ancestry patterns and cannot by itself identify caste identity, religion, or specific historical events. Within that limitation, the genetic evidence aligns well with longstanding oral traditions that place Khoja origins among Lohana communities deeply rooted in the Indus–Punjab frontier regions, where early conversions are believed to have taken place.

“the genetic evidence aligns well with longstanding oral traditions that place Khoja origins among Lohana communities deeply rooted in the Indus–Punjab frontier regions”

Immediately surrounding this cluster are Sindhi populations, both Hindu and Muslim, which consistently appear very close to Khojas in genetic space. Bhatia populations also cluster nearby and sometimes overlap with Khoja and Lohana points, reflecting their shared role in the historical mercantile networks of western India and Sindh and, in some cases, gradual incorporation into the Nizari Ismaili (Khoja) community. Khojas also show strong proximity to Punjabi Khatri and Arora populations, which frequently appear among the nearest neighboring groups in PCA plots and genetic distance rankings. This similarity does not imply shared caste identity or direct descent, but instead reflects broadly shared ancestral origins and historical interaction within the wider Indus mercantile world.

Historians such as Claude Markovits have observed that Bhatias, Lohanas, and Khatris formed closely interconnected merchant communities across northwestern India and Sindh, linked through overlapping trade networks and occasional intermarriage. In my view, however, the genetic closeness observed among these groups is best understood primarily as the result of a common regional ancestry base—one that likely predates the later crystallization of caste identities—rather than large-scale genetic mixing between the communities in recent centuries.

Further along this genetic cline, at somewhat greater distance, are agrarian populations such as Arain, Awan, and Kamboj, which tend to be slightly more Steppe-shifted. Gujarati populations, meanwhile, are quite heterogeneous: certain Gujarati Muslim groups appear relatively close to Khojas, while many Gujarati populations, such as Vaniyas, Patels, Bohras, and some Brahmin groups, tend to be more AASI-shifted.

Broadly, these patterns place Khojas within a north-western South Asian mercantile genetic cluster centered around Lohana, Memon, and Sindhi populations.
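The genetic distance rankings mentioned in this answer can be illustrated with a short Python sketch. It assumes each population is represented by a vector of averaged coordinates (for example, PCA-derived values) and ranks reference populations by plain Euclidean distance from a target. The function name `rank_by_distance`, the placeholder population labels, and every number below are hypothetical; they are not taken from the analysis described here.

```python
from math import dist


def rank_by_distance(target, references):
    """Return (name, distance) pairs sorted from nearest to farthest."""
    return sorted(
        ((name, dist(target, coords)) for name, coords in references.items()),
        key=lambda pair: pair[1],
    )


# Hypothetical placeholder populations and coordinates, NOT real data.
references = {
    "reference_A": [0.10, 0.20, 0.00],
    "reference_B": [0.40, 0.10, 0.30],
    "reference_C": [0.15, 0.25, 0.05],
}
target = [0.11, 0.19, 0.01]

for name, d in rank_by_distance(target, references):
    print(f"{name}: {d:.4f}")
```

A small distance indicates only that two profiles have similar ancestry proportions; as noted above, it cannot by itself identify caste identity, religion, or direct descent.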


Your analysis suggests that Khojas are genetically closer to Sindhis than to most Gujarati populations. Why is that?

Yes, this was one of the more interesting and somewhat counterintuitive findings from my analysis. Despite their strong cultural association with Gujarat, Khojas tend to align genetically more closely with populations from the Indus basin, particularly Sindh and adjoining regions of present-day Pakistan and north-western India, than with many Gujarati groups. This does not mean Khojas are identical to Sindhis, but their overall ancestry profile, characterized by relatively high Indus-derived ancestry, moderate Steppe ancestry, and moderate AASI ancestry, pulls them toward the Indus-shifted end of the Ancestral North Indian (ANI) genetic continuum. Another factor relates to how Gujarati populations are represented in genetic reference panels. Many commonly used genetic datasets contain limited representation of Gujarat’s internal diversity and often include groups with somewhat higher AASI ancestry and lower Steppe ancestry, which places them farther from Khojas in genetic space.

“the core genetic profile of Khojas was largely ‘baked-in’ before the community’s migration into Kutch and Kathiawar, which occurred only within the past 300–400 years”

Collectively, these patterns suggest that the core genetic profile of Khojas was largely established, or “baked-in”, before the community’s migration into Kutch and Kathiawar, which occurred only within the past 300–400 years. Long-standing endogamy within the Khoja community would have preserved much of this underlying genetic structure even as the community later adopted Kutchi and Gujarati language and cultural identity.


Some Khojas look “Persian” in appearance. Do Khojas have Middle Eastern ancestry?

This is a common misconception. Available genetic evidence does not indicate meaningful recent Middle Eastern ancestry among most Khojas. Although the community’s Ismaili tariqah and religious history ultimately trace back to the Middle East, and Khojas historically participated in long-distance maritime trade across the Arabian Sea, these connections do not appear to have translated into substantial genetic input from Arab, Persian, or other Middle Eastern groups.

The perception that some Khojas appear “Persian” can instead be explained by much older shared ancestry. Like many populations of the Indus region, Khojas carry a significant ancient Iran-related (Zagros-associated) farmer ancestry component that entered South Asia thousands of years ago and became part of the genetic foundation of Indus-derived populations. This deep ancestry predates Islam, Persia, and modern ethnic identities by several millennia and remains widespread across the north-western subcontinent today. Occasionally, small traces of Middle Eastern–like signals may appear in certain admixture calculators, but these are generally very minor and often reflect modeling artifacts or distant historical contact rather than meaningful recent ancestry.

In short, what sometimes appears “Persian-like” is better understood as reflecting shared ancient ancestry across the broader Indus–Iranian plateau region, rather than recent Middle Eastern migration or descent.


Earlier you mentioned how ancient DNA samples are helping to understand the origins of South Asians. Which ancient populations do Khojas have the greatest genetic similarity with?

From the limited data reviewed, Khojas show close genetic similarity to ancient individuals excavated from protohistoric graves in the Swat Valley of present-day northern Pakistan, conventionally grouped by archaeologists under the so-called Gandhāra Grave Culture. These include Iron Age individuals (c. 1200–800 BCE) and, often with even closer genetic affinity, individuals from comparable archaeological contexts in later historical periods, including the Mauryan, Indo-Greek, and Saka–Parthian eras. By that later time, the Swat Valley formed part of the wider Gandhāran cultural sphere, a well-documented region in ancient history linking the north-western Indian subcontinent with eastern Afghanistan and Central Asia. These communities are widely associated with early Indo-Aryan societies that emerged in the north-western subcontinent following the decline of the Indus Valley Civilization.

Protohistoric grave in the Swat Valley, Pakistan, linked to the Iron Age Gandhāra Grave Culture. Photo: Luca M. Olivieri.

From a genetic perspective, these populations carried a mixed ancestry profile in which Indus-derived ancestry had already blended with Steppe-related ancestry. This genetic pattern later became common across the Indus basin, which is why many modern north-western Indian and Pakistani communities, including Sindhis, Khatris, Kamboj, Gujjars, and Kohistanis, also show strong affinity to these ancient samples. Importantly, such genetic proximity does not necessarily imply direct descent from a specific ancient burial population; rather, it reflects shared ancestry from broader regional populations that lived thousands of years earlier.

Khojas also show strong similarity to genomes of medieval individuals excavated from Islamic-period burials in the Swat Valley, particularly at sites such as Barikot and Udegram (11th–14th centuries CE). These individuals appear genetically very similar to earlier Iron Age populations from the same region, pointing to a high degree of genetic continuity over time.

Overall, these findings suggest that the deeper ancestry of Khojas is closely connected to ancient populations of the Swat Valley and the wider Gandhāran sphere within the Indus–Sindh world. This landscape, historically associated with the Sapta Sindhu (“Land of the Seven Rivers”) described in early Vedic literature, is closely linked to early Indo-Aryan–speaking societies and bridges the Indus Valley with many modern communities across present-day Pakistan and north-western India. While archaeogenetics remains a rapidly evolving field and much remains to be discovered, the available evidence points to a notable degree of long-term genetic continuity. Broadly, the core genetic framework observed in Khojas today was likely established many centuries, perhaps millennia, ago.


What comes next for research into Khoja ancestry and genetics?

Most of the analysis discussed here is based on my own autosomal DNA testing together with a handful of Khoja genetic profiles that I was able to gather from publicly shared datasets online. These included admixture estimates and genetic coordinates that allowed for comparative distance analysis against both modern and ancient populations. Because the number of Khoja samples I examined was limited, the interpretations offered here should be viewed as exploratory and not definitive.

More rigorous population-level studies involving larger Khoja datasets would be needed to validate these patterns with greater certainty.

My hope is that this initial exploration sparks further curiosity within the Khoja community about our deeper origins and helps motivate such studies. Even a coordinated community effort, in which Khojas around the world voluntarily share their own DNA testing results and Global25 coordinates, could greatly expand the available dataset and help illuminate the genetic history of the Khoja community with greater precision.
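As a rough illustration of how community-shared coordinates might be pooled, the Python sketch below averages several submissions into a single community profile. It assumes each volunteer shares one comma-separated line consisting of a label followed by numeric coordinates, which is how Global25-style coordinates are commonly exchanged. The function names, the volunteer labels, and the three-value rows are hypothetical placeholders, shortened for readability; they are not real data.

```python
from statistics import fmean


def parse_coordinate_line(line):
    """Split 'label,v1,v2,...' into (label, [floats])."""
    label, *values = line.strip().split(",")
    return label, [float(v) for v in values]


def average_profile(lines):
    """Average the coordinates of all non-empty lines, axis by axis."""
    samples = [parse_coordinate_line(row)[1] for row in lines if row.strip()]
    if not samples:
        raise ValueError("no samples supplied")
    if len({len(sample) for sample in samples}) != 1:
        raise ValueError("samples have differing numbers of coordinates")
    return [fmean(axis) for axis in zip(*samples)]


# Hypothetical submissions, truncated to three coordinates. NOT real data.
shared = [
    "volunteer_1,0.10,0.20,0.30",
    "volunteer_2,0.20,0.40,0.10",
]
print(average_profile(shared))
```

An average built this way is only as representative as the volunteers behind it, so close relatives would ideally be excluded, and any sharing of genetic data should rest on informed consent.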

The story of our Khoja past is already written in our DNA—if we choose to learn how to read it.


Key Takeaways


Further Reading

Readers interested in the genetic history of South Asia and the deep ancestry of populations from the Indus region may find the following works useful starting points:

1. Narasimhan, V. et al. (2019). The Formation of Human Populations in South and Central Asia. Science 365(6457).

2. Shinde, V. et al. (2019). An Ancient Harappan Genome Lacks Ancestry from Steppe Pastoralists or Iranian Farmers. Cell 179(3): 729–735.e10.

3. Kerdoncuff, E. et al. (2025). 50,000 Years of Evolutionary History of India: Impact on Health and Disease Variation. Cell 188(13): 3389–3404.e6.

4. Reich, D. (2018). Who We Are and How We Got Here: Ancient DNA and the New Science of the Human Past. New York: Oxford University Press.

5. Joseph, T. (2018). Early Indians: The Story of Our Ancestors and Where We Came From. New Delhi: Juggernaut Books.

