
NPR: A statistical nightmare; inherent imperfection in data collection raises concern

  • By P C Mohanan

Taking a census of people is a favoured administrative measure of all governments. In ancient Rome, Augustus Caesar was fond of censuses, and it is believed that Joseph and a pregnant Mary travelled to Bethlehem to be enumerated for one such census and that Christ was born during this trip. The population census is now an essential tool of modern-day governance, though a few countries with excellent civil registration systems have discontinued periodic censuses. In countries like India, we have not only a population census but also censuses of livestock, tigers, houses, agricultural holdings, minor irrigation schemes, economic units, handlooms, people below the poverty line and even a census of people and their castes!

Currently, the Ministry of Statistics is conducting a nationwide economic census that enumerates all economic activities, an exercise as large as the population census, to prepare a ‘business register’. Clearly, ‘registers’ are the current fashion among statisticians and administrators. The National Population Register (NPR), now in the news, is actually an extension of the population census.

The government has cleared a budget of Rs 8,754 crore for the conduct of the 2021 census. Along with the population count, the Census Office collects data on a variety of other topics relating to population. The volume of data collected is so large that, even with the aid of modern technology, many reports are published using only a sample of it. Further, a lot of the data are published only after a long delay. The migration data from the 2011 census came out only recently. Considering the volume of data, such delays are understandable.

The census methodology is clear and easy to understand: send officials out to people's homes, count them and come back with the filled-in forms. Mobile handsets or handheld devices are likely to replace paper schedules for the current census. It is this conceptual simplicity that perhaps explains why the census remains an attractive tool for administrators and is the first step in dealing with any economic or political problem.

Any data collection exercise like a census suffers from two kinds of errors. The first is coverage error and the second is content error, that is, error in response or recording. Coverage errors occur when one either misses or double counts members of the target population. To gauge the magnitude of both types of error, the Census Commissioner conducts a post-enumeration check in a sample of blocks. Content errors are more difficult to judge and vary from item to item in the questionnaire. The sample coverage checks of the last two censuses show that about 23 persons are missed for every 1000 persons enumerated.

For the 2011 census, the post-enumeration check showed a net omission of 23 persons for every 1000 persons enumerated: an estimated undercount of 23.1 persons, offset by 0.1 persons per 1000 counted more than once. The 2001 check likewise showed that 23 persons per 1000 enumerated were omitted, net of duplication. By comparison, the undercount in 1991 was 18 persons per 1000 enumerated, similar to that of the 1981 census. In 2001 the undercount was much larger in urban areas (40 per 1000), and the northern zone, which includes Delhi, had an omission rate of 57 per 1000 for males and 59 per 1000 for females. Greater mobility in cities like Delhi may be the main reason for these high omission rates. In 2001 Delhi had an undercount of over 80 per 1000 for males. Not counting 2.3 per cent of the population in 2011 would indicate that we have no information on 2.8 crore people in the country. Increased mobility would further increase this undercount.
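The arithmetic behind the 2.8 crore figure can be checked with a short sketch. It uses the omission and duplication rates quoted above together with the 2011 enumerated count of 121 crore; the variable and function names are illustrative assumptions, not anything published by the Census Office.

```python
# Back-of-the-envelope check of the 2011 undercount, using figures quoted in the article.
# All names here are illustrative, not official Census Office terminology.

ENUMERATED_2011_CRORE = 121.0   # persons enumerated in the 2011 census (crore)
OMITTED_PER_1000 = 23.1         # persons missed per 1000 enumerated
DUPLICATED_PER_1000 = 0.1       # persons counted more than once per 1000 enumerated


def net_omission_per_1000(omitted: float, duplicated: float) -> float:
    """Net omission rate: omissions offset by duplicate counts."""
    return omitted - duplicated


def persons_missed_crore(enumerated_crore: float, net_rate_per_1000: float) -> float:
    """Number of people (in crore) implied by a net omission rate per 1000 enumerated."""
    return enumerated_crore * net_rate_per_1000 / 1000.0


net_rate = net_omission_per_1000(OMITTED_PER_1000, DUPLICATED_PER_1000)
missed = persons_missed_crore(ENUMERATED_2011_CRORE, net_rate)

print(f"Net omission: {net_rate:.1f} per 1000 enumerated")
print(f"People not counted: about {missed:.1f} crore")
```

Expressing the rate per 1000 enumerated persons, as the post-enumeration check does, means the count of missed people is a simple proportion of the enumerated total.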

In answer to a question (No 229 on 23rd July 2014) in the Rajya Sabha, the Minister for Home stated that an electronic database of 118 crore persons had been prepared in the NPR from the 2011 census. The 2011 census had enumerated 121 crore people. Together with the likely undercount in the census, this would suggest that close to six crore people were missed in this NPR exercise. The scale of these data-gathering exercises would scare most statisticians.
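A rough reconstruction of the "close to six crore" estimate follows. It assumes the net omission rate of 23 per 1000 from the post-enumeration check applies to the whole enumerated population; the names are hypothetical and the calculation is only a sketch of the reasoning, not an official estimate.

```python
# Rough estimate of how many people the 2011 NPR database may have missed.
# Figures are those quoted in the article; names are illustrative assumptions.

CENSUS_ENUMERATED_CRORE = 121.0  # people enumerated in the 2011 census (crore)
NPR_RECORDS_CRORE = 118.0        # people in the NPR electronic database (crore)
NET_OMISSION_PER_1000 = 23.0     # net census undercount per 1000 enumerated


def estimated_population_crore(enumerated: float, net_omission_per_1000: float) -> float:
    """Enumerated count adjusted upward for the estimated census undercount."""
    return enumerated * (1.0 + net_omission_per_1000 / 1000.0)


def npr_shortfall_crore(npr_records: float, estimated_population: float) -> float:
    """People in the estimated population who have no NPR record."""
    return estimated_population - npr_records


population = estimated_population_crore(CENSUS_ENUMERATED_CRORE, NET_OMISSION_PER_1000)
shortfall = npr_shortfall_crore(NPR_RECORDS_CRORE, population)

print(f"Estimated population: {population:.1f} crore")
print(f"Missing from the NPR: about {shortfall:.1f} crore")
```

The shortfall has two parts: the three crore people enumerated in the census but absent from the NPR database, and the people the census itself is estimated to have missed.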

As per the website of the Census Office, the objective of the NPR is to create a comprehensive identity database of every ‘usual’ resident in the country. A usual resident is always identified with reference to a geographical location at a point in time. The existing identification through Aadhaar has no territorial identity. As now understood, the NPR is prepared at the local (village/sub-town), sub-district, state and national levels under the provisions of the Citizenship Act, 1955 and the Citizenship (Registration of Citizens and Issue of National Identity Cards) Rules, 2003.

The mobility of people is another issue that complicates the preparation of any location-based register. A decree like that of Augustus Caesar, asking people to go back to their place of origin for census or registration, is out of the question. Migrants made up 37.6 per cent of the population in 2011, and 31 per cent of them had migrated or changed their usual place of residence in the village or town during the years since 2001. Thus one would expect that roughly 12 per cent of the people would not be found in the place where they were enumerated in 2011. How does one update NPR details for them if they cannot be found by the enumerators? Calling the coming census an exercise to update the 2011 NPR appears to be a misnomer. It has to be a new exercise altogether. Updating is possible only if the current and past records can be unambiguously matched through appropriate IDs, with corrections incorporated, new members inserted and dead persons deleted.
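The "roughly 12 per cent" figure follows from multiplying the two shares quoted above. The sketch below assumes, as the argument implicitly does, that the share of people who moved in the decade before 2011 is a fair guide to the share who have moved since; the names are illustrative.

```python
# Share of the population expected to have moved since being enumerated in 2011,
# using the migration figures quoted in the article. Names are illustrative.

MIGRANT_SHARE_2011 = 0.376      # migrants as a share of the 2011 population
RECENT_MOVER_SHARE = 0.31       # share of migrants who moved during 2001-2011


def share_not_at_enumerated_place(migrant_share: float, recent_share: float) -> float:
    """Share of the total population that changed residence within the decade."""
    return migrant_share * recent_share


moved_share = share_not_at_enumerated_place(MIGRANT_SHARE_2011, RECENT_MOVER_SHARE)

print(f"Expected share no longer at the 2011 address: {moved_share * 100:.1f} per cent")
```

Because the second figure is a share of migrants, not of the total population, the two must be multiplied, not added, to get a share of the whole population.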

The house-listing schedule of the forthcoming census has 34 items of information for each household, besides the location particulars that identify the household on the ground. The information collected for the NPR in 2011 included a lot of text data such as names of persons, names of places and addresses. In addition, it is now proposed to add more details such as Aadhaar number, voter card, phone number, driving licence number and passport number. It is expected that all these will be entered into a handheld device by the enumerator, usually a primary school teacher. The content errors, and the unending travails of citizens trying to correct them, can be visualised only in a Kafkaesque scenario. A lot of the data in the NPR, such as educational status, marital status and occupation, are also not constant for a person and, given the time lag in finalising the registers, would have very little validity by the time the list is finalised.

The experience of the Economic Censuses, conducted by the Ministry of Statistics to prepare a ‘Business Register’ much like the NPR but with establishments in place of people, is well known. The imperfection of these registers was established when the NSSO used one of them for its 74th round survey and found most of the names in the register untraceable on the ground.

A major concern for researchers using 2011 population census data, especially internal migration data, would be the effect on the reliability of census data of the NPR and the rising controversy over its being a base document for the NRC. Recent migrants always have a problem producing identity or address proof at their current place of residence. Will they then report that they are migrants? Alternatively, people may report a longer residency at the place of enumeration to dilute their ‘migrant’ character.

The world over, surveyors are concerned about respondent burden and bias in reporting. The latter is cited by the Government as the reason for the lower toilet coverage reported by one of its own recent surveys. For census professionals it is a nightmare to organise a census, and to maintain the credibility of its data, amidst fears and threats of likely disenfranchisement based on it. The dramatic increase in the number of items on which information is to be collected from each household would test the patience of both the enumerators and the respondents.

  • P C Mohanan is a former acting chairman of the National Statistics Commission.

Source: Financial Express