Table of Contents
In the bigger picture of HIS terminology all data in DHIS are usually called aggregated as they are aggregates (e.g. monthly summaries) of medical records or some kind of service registers reported from the health facilities. Aggregation inside DHIS however, which is the topic here, is concerned with how the raw data captured in DHIS (through data entry or import)are further aggregated over time (e.g. from monthly to quarterly values) or up the organisational hierarchy (e.g. from facility to district values).
Raw data refers to data that is registered into the DHIS 2 either through data entry or data import, and has not been manipulated by the DHIS aggregation process. All these data are stored in the table (or Java object if you prefer) called DataValue.
Aggregated data refers to data that has been aggregated by the DHIS 2, meaning it is no longer raw data, but some kind of aggregate of the raw data.
Indicator values can also be understood as aggregated data, but these are special in the way that they are calculated based on user defined formulas (factor * numerator/denominator). Indicator values are therefore processed data and not raw data, and are located in the aggregatedindicatorvalue table/object. Indicators are calculated at any level of the organisational hierarchy and these calculations are then based on the aggregated data values available at each level. A level attribute in the aggregateddatavalue table refers to the organisational level of the orgunit the value has been calculated for.
Period and Period type are used to specify the time dimension of the raw or aggregated values, and data can be aggregated from one period type to another, e.g. from monthly to quarterly, or daily to monthly. Each data value has one period and that period has one period type. E.g. data values for the periods Jan, Feb, and Mar 2009, all of the monthly period type can be aggregated together to an aggregated data value with the period Q1 2009 and period type Quarterly.
Data (raw) can be registered at any organisational level, e.g. at national hospital at level 2, a health facility at level 5, or at a bigger PHC at level 4. This varies form country to country, but DHIS is flexible in allowing data entry or data import to take place at any level. This means that orgunits that themselves have children can register data, sometimes the same data elements as their children units. The basic rule of aggregation in DHIS 2 is that all raw data is aggregated together, meaning data registered at a facility on level 5 is added to the data registered for a PHC at level 4.
It is up to the user/system administrator/designer to make sure that no duplication of data entry is taking place and that e.g. data entered at level 4 are not about the same services/visits that are reported by orgunit children at level 5. NOTE that in some cases you want to have duplication of data in the system, but in a controlled manner. E.g. when you have two different sources of data for population estimates, both level 5 catchment population data and another population data source for level 4 based on census data (because sum of level 5 catchments is not always the same as level 4 census data). Then you can specify using advanced aggregation settings (see further down) that the system should e.g. not add level 5 population data to the level 4 population data, and that level 3,2,1 population data aggregates are only based on level 4 data and does not include level 5 data.
How data is aggregated depends on the dimension of aggregation (see further down).
Along the orgunit level dimension data is always summed up, simply added together. Note that raw data is never percentages, and therefore can be summed together. Indicator values that can be percentages are treated differently (re-calculated at each level, never summed up).
Along the time dimension there are several possibilities, the two most common ways to aggregate are sum and average. The user can specify for each data element which method to use by setting the aggregation operator (see further down). Monthly service data are normally summed together over time, e.g. the number of vaccines given in a year is the sum of the vaccines given for each month of that year. For population, equipment, staff and other kind of what is often called semi-permanent data the average method is often the one to use, as, e.g. 'number of nurses' working at a facility in a year would not be the sum of the two numbers reported in the six-monthly staffing report, but rather the average of the two numbers. More details further down under 'aggregation operators'.
Organisational units are used to represent the 'where' dimension associated with data values. In DHIS 2, organisational units are arranged in a hierarchy, which typically corresponds to the hierarchical nature of the organisation or country. Organisational unit levels correspond to the distinct levels within the hierarchy. For instance, a country may be organized into provinces, then districts, then facilities, and then sub-centers. This organisational hierarchy would have five levels. Within each level, a number of organisational units would exist. During the aggregation process, data is aggregated from the lower organisational unit levels to higher levels. Depending on the aggregation operator, data may be 'summed' or 'averaged' within a given organisational unit level, to derive the aggregate total for all the organisational units that are contained within a higher level organisational unit level. For instance, if there are ten districts contained in a province and the aggregation operator for a given data element has been defined as 'SUM', the aggregate total for the province would be calculated as the sum of the values of the individual ten districts contained in that province.
Periods are used to represent the 'when' dimension associated with data values. Data can easily be aggregated from weeks to months, from months to quarters, and from quarters to years. DHIS 2 uses known rules of how these different intervals are contained within other intervals (for instance Quarter 1 2010 is known to contain January 2010, February 2010 an March 2010) in order to aggregate data from smaller time intervals, e.g. weeks, into longer time intervals, e.g. months.
The data element dimension specifies 'what' is being recorded by a particular data value. Data element categories are actually degenerated dimensions of the data element dimension, and are used to disaggregate the data element dimension into finer categories. Data element categories, such as 'Age' and 'Gender', are used to record a particular data element, typically for different population groups. These categories can then be used to calculate the overall total for the category and the total of all categories.
The 'sum' operator simply calculates the sum of all data values that are contained within a particular aggregation matrix. For instance, if data is recorded on a monthly basis at the district level and is aggregated to provincial quarterly totals, all data contained in all districts for a given province and all weeks for the given quarter will be added together to obtain the aggregate total.
When the average aggregation operator is selected, the unweighted average of all data values within a given aggregation matrix are calculated.
It is important to understand how DHIS 2 treats null values in the context of the average operator. It is fairly common for some organisational units not to submit data for certain data elements. In the context of the average operator, the average results from the number of data elements that are actually present (therefore NOT NULL) within a given aggregation matrix. If there are 12 districts within a given province, but only 10 of these have submitted data, the average aggregate will result from these ten values that are actually present in the database, and will not take into account the missing values.
The normal rule of the system is to aggregate all raw data together when moving up the organisational hierarchy, and the system assumes that data entry is not being duplicated by entering the same services provided to the same clients at both facility level and also entering an 'aggregated' (sum of all facilities) number at a higher level. This is to more easily facilitate aggregation when the same services are provided but to different clients/catchment populations at facilities on level 5 and a PHC (the parent of the same facilities) at level 4. In this way a facility at level 5 and a PHC at level 4 can share the same data elements and simply add together their numbers to provide the total of services provided in the geographical area.
Sometimes such an aggregation is not desired, simply because it would mean duplicating data about the same population. This is the case when you have two different sources of data for two different orgunit levels. E.g. catchment population for facilities can come from a different source than district populations and therefore the sum of the facility catchment populations do not match the district population provided by e.g. census data. If this is the case we would actually want duplicated data in the system so that each level can have as accurate numbers as possible, but then we do NOT want to aggregate these data sources together.
In the Data Element section you can edit data elements and for each of them specify how aggregation is done for each level. In the case described above we need to tell the system NOT to include facility data on population in any of the aggregations above that level, as the level above, in this case the districts have registered their population directly as raw data. The district population data should then be used at all levels above and including the district level, while facility level should use its own data.
This is controlled through something called aggregation levels and at the end of the edit data element screen there is a tick-box called Aggregation Levels. If you tick that one you will see a list of aggregation levels, available and selected. Default is to have no aggregation levels defined, then all raw data in the hierarchy will be added together. To specify the rule described above, and given a hierarchy of Country, Province, District, Facility: select Facility and District as your aggregation levels. Basically you select where you have data. Selecting Facility means that Facilities will use data from facilities (given since this is the lowest level). Selecting District means that the District level raw data will be used when aggregating data for District level (hence no aggregation will take place at that level), and the facility data will not be part of the aggregated District values. When aggregating data at Province level the District level raw data will be used since this is the highest available aggregation level selected. Also for Country level aggregates the District raw data will be used. Just to repeat, if we had not specified that District level was an aggregation level, then the facility data and district data would have been added together and caused duplicate (double) population data for districts and all levels above.
The purpose of the data mart is to provide pre-processed data to external data analysis and reporting tools. The data mart consists of two tables, aggregateddatavalues and aggregatedindicatorvalues in the DHIS 2 database. The values in the data mart are aggregated versions of the raw data found in the datavalue table as well as calculated indicator values. Aggregation can take place over time (e.g. from monthly data to aggregated quarterly values), or along the organisation unit hierarchy levels (e.g. from PHU data to aggregated district totals). The data mart can store all kinds of such aggregated values. The data mart is as such just a processed 'copy' of the data values and it can be emptied and regenerated at any time, and the tables are read only tables. The metadata in the two data mart tables are referenced by internal identifiers, such as dataelementid, organisationunitid which refer to the tables like dataelement and organisationunit, see 'How to make use of the data mart in external tools' for more on this.
The DHIS2 Data Mart handles the aggregation of data across multiple dimensions (organisation unit hierarchy, orgunit group hierarchy, time, etc) and can be controlled through this interface. Select the organisation unit levels which should be aggregated, the start date and end date, and press "Start export". The data mart process will be executed in the background, and a full report of the export process will be updated regularly so that you can determine the state of the process. See the section on "Scheduling" in the data administration module for information on how this process can be triggered to run automatically.
Each data value for a data element has a reference to a category option combo, which is a combination of the disaggregations for the data value, e.g. (male,<5y) or (In PHU, <1y). These disaggregations are exported as they are to the data mart, and no aggregation is done on this dimension. See the data elements section for more on data element categories and the resource tables section for more information on how to do aggregation on these categories.
When you add new data to an existing data mart the new values will be appended to the existing so that the data mart grows for each new process if new selections (such as new periods) have been made. If any of the selected values are already in the data mart, then the old will be replaced by the newly generated values.
Resource tables provide additional information about the dimensions of the data in a format that is well suited for external tools to combine with the data mart tables. By joining the data mart with these resource tables one can easily aggregate along the data element category dimension or data element/indicator/organisation unit groups dimensions. E.g. by tagging all the data values with the category option male or female and provide this in a separate column 'gender' one can get subtotals of male and female based on data values that are collected for category option combinations like (male, <5) and (male,>5). See the Pivot Tables section for more examples of how these can be used. orgunitstructure is another important table in the database that helps to provide the hierarchy of orgunits together with the data. By joining the orgunitstructure table with the data mart tables you can get rows of data values with the full hierarchy, e.g. on the form: OU1, OU2, OU3, OU4, DataElement, Period, Value (Sierra Leone, Bo, Badija, Ngelehun CHC, BCG <1, Jan-10, 32) This format makes it much easier for e.g. pivot tables or other OLAP tools to aggregate data up the hierarchy.
Report tables are defined, cross-tabulated reports which can be used as the basis of further reports, such as Excel Pivot Tables or simply downloaded as an Excel sheet. Report tables are intended to provide a specific view of data which is required, such as "Monthly National ANC Indicators". This report table might provide all ANC indicators for a country, aggregated by month for the entire country. This data could of course be retrieved from the main datamart, but report tables generally perform faster and present well defined views of data to users.
It is therefore important to keep in mind that when the aggregation strategy of the system is set to "Batch", the data for each report table must also be present in the data mart.