Disparities Among Districts in Central Java Province: Cluster Analysis Based on Several Well-Being Indicators

This paper aims to group districts in Central Java Province based on several well-being indicators published by the National Statistics Agency of Indonesia (BPS) in 2019. Hierarchical cluster analysis with Ward's method was used to group the districts and to identify disparities among clusters. The results show that the districts in Central Java can be divided into 3 clusters: cluster 1 consists of 4 districts with a high level of well-being, cluster 2 consists of 16 districts with a moderate level of well-being, and cluster 3 consists of 15 districts with a low level of well-being. The average variable scores for each cluster indicate disparities among the groups. The variable scores for cluster 1, with a high level of well-being, are far above those for clusters 2 and 3 in economics, education, sanitation, and public health. Only four districts belong to the cluster with a high level of well-being, all of which have administrative status as cities. In contrast, the districts with a low level of well-being all have administrative status as regencies. The results also show that districts in the western part of Central Java tend to have a lower level of well-being than those in the eastern part. Thus, Central Java Province needs to pay more attention to the districts in cluster 3, which have a low level of well-being, especially those in the western part of the province, in terms of the economy, education, sanitation, and public health.


INTRODUCTION
Regional disparities are a problem that every country faces, including Indonesia, at the national, provincial, district, and even sub-district levels. Nationally, provinces in the western part of Indonesia tend to be more developed and prosperous than those in the eastern part. Regional disparities also occur among districts within a single province. There have been many studies on regional disparities in Indonesia, most of which measure disparities in economic terms using the Williamson Index and the Klaassen typology (Anggara, 2019;Evanza, 2018;Fitriyah & Prabowo, 2021;Karim et al., 2019;Noviar, 2021;Pamiati, 2021;Sari, 2018;Sukwika, 2018;Wijayanti & Arsyad, 2019). However, these measurements are based only on Gross Regional Domestic Product (GRDP), economic growth, and population. Multidimensional and more comprehensive inter-regional measurements can be carried out using cluster analysis (Pusdiktasari et al., 2021;Raheem et al., 2019;Romyen, 2021).
Based on the Williamson Index, Central Java is one of the provinces with high regional economic disparities among districts (Anggara, 2019;Fahrizal et al., 2019;Nuarta, 2018). Another study by Wahyuningsih et al. (2019) using the Gini index shows that regional disparities among districts in Central Java Province fluctuate and tend to increase. This research therefore identifies differences among groups of districts in Central Java by clustering the districts with hierarchical multivariate cluster analysis based on several quantitative well-being indicators. Such multidimensional well-being indicators provide a more comprehensive measure of disparities than unidimensional indicators (Aaberge & Brandolini, 2015;Bourguignon & Chakravarty, 2019;Döpke et al., 2017;Efmona et al., 2021;Kose & Demirtasli, 2012;Lee, 2018).
According to Purwana (2014), well-being is a condition in which a person's life is free from poverty, ignorance, fear, and worry, and is safe and peaceful, both physically and mentally. According to Suryono (2018), there are four aspects of well-being: (1) a condition in which a person is prosperous, in good health, and at peace; (2) profits or benefits (the economic aspect); (3) the fulfillment of services for the needs of the community (the social aspect); and (4) income received to meet basic needs adequately and humanely (the financial policy aspect). Well-being includes several intangible aspects that are subjective and difficult to measure. Therefore, regional well-being is often measured more objectively through macro quantitative indicators such as income, per capita consumption or expenditure, the Human Development Index (HDI), and other quantitative indicators that cover not only the economic dimension but also social dimensions, such as health and education, as well as the environmental dimension (Barrington-Leigh & Escande, 2018;Facchinetti & Siletti, 2021;Jordá & Sarabia, 2015). This study uses macro indicators of regional well-being measured by the National Statistical Agency of Indonesia (BPS), which cover economic aspects (approximated by per capita expenditure), education, health, housing conditions, and the capability to use information technology (BPS, 2019a).
Equitable regional well-being is one of the goals of national and regional development. For this reason, given the existence of inter-regional disparities, studies comparing community well-being across regions are needed as input for determining policies that create equitable development between regions. Several studies have used cluster analysis to group districts in Central Java based on indicators of poverty and well-being (Hidayat et al., 2017;Putriana et al., 2016;Widiastuti & AG, 2012;Widyadhana et al., 2021;Yulianto & Hidayatullah, 2016). However, these studies focus on comparing clustering methods and do not highlight the regional disparities among districts in Central Java revealed by the clustering results. This research is therefore intended to complement the existing research.
This research aims to identify regional disparities among districts in Central Java using hierarchical cluster analysis on several quantitative well-being indicators. The results are expected to contribute to research on regional disparities using multivariate analysis. They can also serve as a basis for local governments in formulating policies and development plans to overcome regional disparities in well-being among districts in Central Java Province.

RESEARCH METHODS
This study uses cluster analysis to classify 35 districts in Central Java based on several quantitative macro indicators of regional well-being. The data used is secondary data obtained from several annual publications from BPS, namely: (1) Well-being Statistics of Central Java Province 2019 (BPS, 2019a); (2) Social and Population Statistics of Central Java Province (BPS, 2019b); and (3) Central Java Province in Figures 2020 (BPS, 2020). Data processing is carried out using IBM SPSS Statistics v.23 software.
Cluster analysis is a multivariate statistical method that groups objects into several clusters based on similar characteristics. Cluster analysis can be used to identify regional disparities, as in previous studies (Goletsis & Chletsos, 2011;Munandar et al., 2017;Pusdiktasari et al., 2021;Raheem et al., 2019;Romyen, 2021). The clustering process aims to achieve minimum variation (high homogeneity) within clusters and maximum variation (high heterogeneity) between clusters.
The stages of cluster analysis carried out in this study are as follows. The first stage is the selection and testing of variables. This stage includes exploring and cleaning the data, as well as testing the assumptions of sample representativeness and multicollinearity. In the early stages, the data were explored using several indicators, such as the range, standard deviation, variance, skewness, kurtosis, stem-and-leaf diagrams, and boxplots. This exploration was also used to examine the distribution of the data and to identify outliers. The initial exploration showed that the variables differ substantially in their units of measurement: some are measured in percentages, while others are measured in thousands. The variables were therefore standardized using the Z-score, that is, by subtracting the mean of each variable from each observation and dividing the result by that variable's standard deviation. This converts the raw scores into standardized scores with a mean of 0 and a standard deviation of 1, which eliminates the bias caused by differences in the measurement units of the variables (Hair et al., 2014;Romyen, 2021).
Next, two assumptions that must be met in cluster analysis are tested: (1) representativeness of the sample and (2) absence of multicollinearity, i.e., no high correlation among the variables (Hair et al., 2014). Both assumptions can be checked and, where necessary, addressed by conducting a factor analysis before the cluster analysis. Sampling adequacy was tested using the Kaiser-Meyer-Olkin (KMO) statistic, Bartlett's test of sphericity, and the Measure of Sampling Adequacy (MSA) values, while the multicollinearity assumption was tested using the correlation matrix of the variables (Afira & Wijayanto, 2021). If some of the research variables are highly correlated, they must be reduced to a smaller number of factors before the multivariate cluster analysis is performed.
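As a minimal sketch of the Z-score standardization described above (not the SPSS procedure itself), the Python code below subtracts each variable's mean and divides by its standard deviation. It assumes the sample standard deviation (n - 1 denominator); the variable names and values are hypothetical illustrations, not BPS data.

```python
import statistics


def z_scores(values):
    """Standardize one variable: subtract the mean, divide by the sample SD."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return [(x - mean) / sd for x in values]


def standardize_columns(data):
    """Apply z_scores to every variable; data maps a variable name to its observations."""
    return {name: z_scores(column) for name, column in data.items()}


# Hypothetical values for three districts (illustrative only, not BPS data)
raw = {
    "expenditure": [10500.0, 12800.0, 9300.0],  # measured in thousands
    "sanitation": [78.2, 91.5, 70.1],           # percent of households
}

std = standardize_columns(raw)
for name, column in std.items():
    print(name, [round(z, 3) for z in column])
```

After standardization, every variable has mean 0 and standard deviation 1, so a one-unit difference means "one standard deviation" regardless of whether the raw variable was a percentage or an amount in thousands.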
The second stage is determining the clustering algorithm. There are two types of clustering algorithms: hierarchical and non-hierarchical. In the hierarchical method, clusters are formed through a gradual, stepwise clustering process, whereas in the non-hierarchical method the number of clusters is determined before the clustering process begins (Pusdiktasari et al., 2021). In the hierarchical method, objects are grouped in a tree-like structure, either agglomeratively or divisively (Nadif & Govaert, 2010).

Pradaningtyas, Margawati, Putro
In agglomerative hierarchical clustering, the process begins by joining the two objects with the greatest similarity (smallest distance) to form the first cluster. The process continues by repeatedly joining the nearest clusters until all objects form one large cluster, producing a tree that orders the objects from the most similar to the most dissimilar. In contrast, the divisive method starts by treating all objects as one large cluster, which is then successively divided into smaller clusters based on their characteristics. Methods for measuring the similarity or dissimilarity between clusters in the hierarchical approach include single linkage, complete linkage, average linkage, the centroid method, and Ward's error sum of squares method. Cluster analysis in this study uses hierarchical clustering with Ward's method. The hierarchical clustering method allows a comprehensive evaluation of the various possible cluster solutions. Ward's method with the squared Euclidean distance was used in this study because of its efficiency compared with other methods (Hair et al., 2014;Handayani, 2013;Romyen, 2021).
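The agglomerative procedure with Ward's criterion can be sketched in plain Python as below. At each step it merges the pair of clusters whose union produces the smallest increase in the error sum of squares (ESS); for clusters A and B this increase equals n_A n_B / (n_A + n_B) times the squared Euclidean distance between their centroids. This is an illustrative, unoptimized sketch, not the SPSS implementation: the function names and the two-factor points are hypothetical, and SPSS output may differ in tie-breaking and in how merge distances are scaled on the dendrogram.

```python
def centroid(points):
    """Coordinate-wise mean of a list of points."""
    dim = len(points[0])
    return [sum(p[k] for p in points) / len(points) for k in range(dim)]


def sq_euclidean(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ward_cost(a, b):
    """Increase in the error sum of squares (ESS) if clusters a and b are merged."""
    na, nb = len(a), len(b)
    return na * nb / (na + nb) * sq_euclidean(centroid(a), centroid(b))


def ward_clustering(points, n_clusters):
    """Naive agglomerative clustering with Ward's criterion.

    Every object starts as its own cluster; at each step the pair of clusters
    whose merger increases the total ESS the least is joined, until n_clusters
    remain. Returns the clusters as lists of object indices.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = ward_cost([points[k] for k in clusters[i]],
                                 [points[k] for k in clusters[j]])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Hypothetical standardized scores on two factors for seven districts
pts = [(2.1, 1.8), (1.9, 2.2), (0.1, 0.3), (-0.2, 0.1),
       (0.0, -0.1), (-1.8, -2.0), (-2.1, -1.7)]
result = ward_clustering(pts, 3)
print(result)  # [[0, 1], [2, 3, 4], [5, 6]]
```

The explicit double loop is only meant to make the merging rule visible. In practice, a library routine such as `scipy.cluster.hierarchy.linkage` with `method="ward"`, followed by `fcluster` with `criterion="maxclust"`, is the usual choice.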
The last stage in cluster analysis is interpretation. This stage includes giving each cluster a name that best describes its characteristics (Kusumawardani, 2018). The interpretation is based on the average values of the variables that served as the basis for clustering, and it explains the differences between clusters along the relevant dimensions.

RESULT AND DISCUSSION
The research variables were standardized using the Z-score before conducting the cluster analysis to avoid measurement bias due to differences in units of measurement. In addition, the selected variables should be approximately normally distributed. Based on descriptive statistics, including the range, standard deviation, variance, skewness, and kurtosis, 13 of the 25 variables considered at the initial stage were selected for further analysis using multivariate cluster analysis. The selected research variables are listed in Table 1.

Table 1. Research Variables
No.   Variable                                                     Code
…
9     Percentage of households with access to proper sanitation    sanitation
10    Morbidity rate                                               morbidity
11    Percentage of households using cell phones/HP                HP
12    Percentage of households using the internet                  internet
13    Percentage of households using computers                     computer

Because there is multicollinearity among the variables, the highly correlated variables were reduced using factor analysis. The initial stage of factor analysis is to test the suitability of the data using the Kaiser-Meyer-Olkin (KMO) statistic and Bartlett's test. The KMO Measure of Sampling Adequacy (MSA) compares the observed correlations between variables with their partial correlations to assess whether factor analysis is appropriate, while Bartlett's test of sphericity tests whether the variables are correlated at all, i.e., whether the correlation matrix differs from an identity matrix. Factor analysis can proceed if the KMO MSA value is greater than 0.50. Based on Table 2, the KMO MSA value is 0.805, and the significance value of Bartlett's test is 0.000. Because the KMO MSA value is above 0.5 and the significance of Bartlett's test is below 0.05, factor analysis can be performed. The MSA values for each variable from the anti-image correlation matrix, shown in Table 3, are also all above 0.5, so all research variables can be analyzed further.
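The KMO statistic, the per-variable MSA values from the anti-image correlation matrix, and Bartlett's test statistic can all be computed from the correlation matrix, as sketched below with NumPy on synthetic data (not the study's data). KMO compares squared correlations with squared partial correlations obtained from the inverse of the correlation matrix R, and Bartlett's statistic is -(n - 1 - (2p + 5)/6) ln|R| with p(p - 1)/2 degrees of freedom. The p-value that the text compares with 0.05 would come from the chi-square distribution (for example via `scipy.stats.chi2.sf`); it is deliberately not computed here to keep the sketch NumPy-only.

```python
import numpy as np


def kmo_msa(data):
    """Overall KMO and per-variable MSA for an (n observations x p variables) array."""
    r = np.corrcoef(data, rowvar=False)              # p x p correlation matrix
    inv = np.linalg.inv(r)
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale                           # anti-image (partial) correlations
    off_diag = ~np.eye(r.shape[0], dtype=bool)       # only pairs with i != j count
    r2 = np.where(off_diag, r ** 2, 0.0)
    p2 = np.where(off_diag, partial ** 2, 0.0)
    kmo = r2.sum() / (r2.sum() + p2.sum())
    msa = r2.sum(axis=0) / (r2.sum(axis=0) + p2.sum(axis=0))
    return kmo, msa


def bartlett_sphericity(data):
    """Bartlett's test statistic and degrees of freedom (H0: correlation matrix = identity)."""
    n, p = data.shape
    r = np.corrcoef(data, rowvar=False)
    _, log_det = np.linalg.slogdet(r)                # ln|R|, numerically stable
    chi2 = -(n - 1 - (2 * p + 5) / 6) * log_det
    df = p * (p - 1) // 2
    return chi2, df


# Synthetic data: 35 observations of 4 variables sharing one latent factor
rng = np.random.default_rng(0)
latent = rng.normal(size=(35, 1))
X = latent + 0.5 * rng.normal(size=(35, 4))

kmo, msa = kmo_msa(X)
chi2, df = bartlett_sphericity(X)
print(f"KMO = {kmo:.3f}, MSA = {np.round(msa, 3)}")
print(f"Bartlett chi-square = {chi2:.2f}, df = {df}")
```

Under the decision rule stated in the text, factor analysis would proceed when `kmo` exceeds 0.5, every entry of `msa` exceeds 0.5, and the chi-square p-value for `chi2` with `df` degrees of freedom is below 0.05.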

Table 3. MSA values from the anti-image correlation matrix
Variable            MSA
…
Zscore(computer)    0.832

The research variables were reduced using the principal component method. Based on the eigenvalues, factor loadings, and scree plot, the 13 research variables were reduced to 4 factors. The scree plot in Figure 1 shows that the curve flattens markedly after the fourth factor, which means that the optimal number of factors for explaining the research variables is 4. Together, the four factors explain 85.18% of the total variance.
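The principal component step can be sketched as an eigendecomposition of the correlation matrix. The code below is an illustration on synthetic data, not the SPSS computation: it sorts the eigenvalues in descending order (the values plotted in a scree plot), reports the cumulative percentage of variance explained, counts the components with an eigenvalue above 1 (the Kaiser criterion, assumed here as one retention rule alongside the scree plot), and forms unrotated loadings as eigenvectors scaled by the square root of their eigenvalues.

```python
import numpy as np


def pca_summary(data, eigen_threshold=1.0):
    """Eigen-analysis of the correlation matrix for principal component extraction.

    Returns eigenvalues (descending, as plotted in a scree plot), the cumulative
    percentage of variance explained, the number of components whose eigenvalue
    exceeds eigen_threshold, and the unrotated loadings of those components.
    """
    r = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(r)             # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cum_pct = np.cumsum(100 * eigvals / eigvals.sum())   # eigenvalues sum to p
    k = int(np.sum(eigvals > eigen_threshold))
    # Loading = eigenvector * sqrt(eigenvalue); signs of components are arbitrary
    loadings = eigvecs[:, :k] * np.sqrt(eigvals[:k])
    return eigvals, cum_pct, k, loadings


# Synthetic data: 6 variables driven by 2 latent factors (3 variables each)
rng = np.random.default_rng(1)
f = rng.normal(size=(35, 2))
X = f[:, [0, 0, 0, 1, 1, 1]] + 0.3 * rng.normal(size=(35, 6))

eigvals, cum_pct, k, loadings = pca_summary(X)
print("eigenvalues:", np.round(eigvals, 2))
print("cumulative % variance:", np.round(cum_pct, 1))
print("components retained:", k)
```

The entry of `cum_pct` at position k - 1 plays the role of the "percentage of total variance explained" reported in the text; the loadings correspond to an unrotated component matrix.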

Figure 1. Scree Plot
The factor loadings in the component matrix show the correlation between each variable and each factor and are therefore the basis for classifying the 13 variables into four factors. Based on the factor loadings in Table 4, the variables are grouped as follows. Expenditure per capita, average years of schooling, expected years of schooling, the percentage of households using computers, the percentage of households using the internet, and the percentage of households using cell phones are included in factor 1, labeled expenditure and education (including the capability to use information technology). The percentage of households with access to proper sanitation and the percentage of households using latrines are included in factor 2, labeled sanitation. The percentage of households whose dwelling floor is mainly marble/granite/ceramic, the percentage of households whose outer walls are mainly masonry, and the percentage of households with access to safe drinking water are included in factor 3, labeled housing conditions. Life expectancy and the morbidity rate are included in factor 4, labeled health.
The factor scores obtained for each district are used in the cluster analysis to group the districts in Central Java. Based on hierarchical multivariate cluster analysis using Ward's linkage, the 35 districts in Central Java are grouped into 3 clusters: cluster 1 consists of 4 districts, cluster 2 of 16 districts, and cluster 3 of 15 districts (Table 5). The average factor scores are used to determine the characteristics of each cluster. The average factor scores in Table 6 show that cluster 1 scores above average on all factors, namely expenditure and education, sanitation, housing conditions, and health, and has the highest score among the three clusters on each of them.
Cluster 2 has a sanitation factor score above average, but its expenditure and education, housing conditions, and health scores are below average. Meanwhile, cluster 3 has a housing conditions factor score slightly above average, but its expenditure and education, sanitation, and health scores are below average. In addition, cluster 3 has the lowest expenditure and education and sanitation scores among the three clusters, by a considerable margin. Thus, cluster 1 can be labeled as a group of districts with a high level of well-being, cluster 2 as a group of districts with a moderate level of well-being, and cluster 3 as a group of districts with a low level of well-being. The cluster formation process is shown in more detail in the dendrogram in Figure 2. The dendrogram's scale represents the distance at which objects or clusters are joined: the smaller the value at which they merge, the more similar they are.
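The interpretation step, comparing each cluster's mean factor score with the overall average, can be sketched as follows. Because factor scores are standardized over all districts, the overall mean of each factor is 0, so a cluster mean above 0 is above average. The factor names, cluster labels, and scores below are hypothetical illustrations chosen so that each column averages 0; they are not the values in Table 6.

```python
from statistics import mean

# Hypothetical factor names mirroring the four factors described in the text
FACTORS = ["expenditure_education", "sanitation", "housing", "health"]


def cluster_profiles(scores, labels):
    """Mean factor score per cluster.

    scores: one tuple of factor scores per district, ordered as in FACTORS
    labels: the cluster id of each district, in the same order as scores
    """
    profiles = {}
    for c in sorted(set(labels)):
        rows = [s for s, lab in zip(scores, labels) if lab == c]
        profiles[c] = [mean(column) for column in zip(*rows)]
    return profiles


def describe(profiles):
    """Flag each cluster mean as above or below the overall average (0 for factor scores)."""
    return {
        c: {f: ("above" if m > 0 else "below") for f, m in zip(FACTORS, means)}
        for c, means in profiles.items()
    }


# Hypothetical scores for six districts; each column averages 0 over all districts
scores = [
    (1.8, 1.2, 0.9, 1.1), (2.0, 0.8, 0.7, 0.7),
    (-0.2, 0.4, -1.0, -0.2), (-0.1, 0.5, -0.9, -0.3),
    (-1.6, -1.5, 0.2, -0.6), (-1.9, -1.4, 0.1, -0.7),
]
labels = [1, 1, 2, 2, 3, 3]

for cluster, flags in describe(cluster_profiles(scores, labels)).items():
    print(cluster, flags)
```

Naming a cluster (for example "high", "moderate", or "low" well-being) remains a judgment made by reading these above/below patterns together with the size of the gaps between cluster means.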

To obtain a more detailed description of each cluster, the average value of each variable making up the four factors was calculated for each cluster. The average scores of the variables in Table 7 make it even clearer that cluster 1 has the highest average score for all variables. Cluster 2, the group of districts with a moderate level of well-being, has the second-highest score for 11 of the 13 variables. Meanwhile, cluster 3 has the lowest score for 11 of the 13 variables and is therefore categorized as the group of districts with a low level of well-being. The two exceptions are the wall and floor variables, for which cluster 3 scores slightly higher than cluster 2. The average scores of the thirteen variables in each factor highlight substantial disparities between clusters. These disparities occur mainly between the districts in cluster 1 and those in clusters 2 and 3: the average scores of the variables in cluster 1 are much higher than those in clusters 2 and 3 in terms of the economy (expenditure), education, health, and sanitation. The disparities in the level of well-being among the groups of districts can be seen more clearly on the map of the district clusters in Figure 3. The map shows that only a few areas belong to cluster 1 with a high level of well-being, namely Magelang City, Surakarta City, Salatiga City, and Semarang City. All four have administrative status as cities and are relatively small in area compared with the regencies.
Meanwhile, two other areas with administrative status as cities, namely Pekalongan City and Tegal City, are included in cluster 2 with a moderate level of well-being. Thus, districts with administrative status as cities in Central Java Province tend to have a higher level of well-being than districts with administrative status as regencies. This is closely related to the faster pace of development in urban areas (cities) compared with regencies, which has implications for well-being in terms of expenditure and education, sanitation, housing conditions, and health. In addition, the map shows that districts in the western part of Central Java have a lower level of well-being than those in the eastern part. These findings are in line with previous studies, including Yulianto & Hidayatullah (2016) and Widyadhana et al. (2021), which show that the six cities in Central Java Province tend to have fewer poor people, or in other words a higher level of well-being, than the regencies. In addition, Brebes, Tegal, Pemalang, Pekalongan, and Batang, located on the northern coast of Central Java, have a low level of well-being compared with other districts. This is also in line with the research of Widyadhana et al. (2021) and Putriana et al. (2016), which shows that districts on the north coast of Central Java have higher poverty rates.
Compared with the findings of Handayani (2013), the districts with low levels of well-being in the western part of Central Java Province fall into the cluster with "highly rural" status, which is characterized by limited infrastructure and access to urban facilities, a relatively weak non-primary sector (the economy is still dominated by agriculture), and low levels of education. In contrast, most of the districts in clusters 1 and 2, with high and moderate levels of well-being, have "rural-urban" or "urban" status, which is characterized by the availability of infrastructure and access to urban facilities, as well as a developed non-agricultural sector, including notable "foot-loose" industries, local resource-based industries, and the service sector.

CONCLUSION
The results of the study indicate regional disparities in the level of well-being among districts in Central Java Province: the average scores of the research variables for cluster 1, with a high level of well-being, are far above those for clusters 2 and 3, with moderate and low levels of well-being, respectively. Only four districts belong to cluster 1, all of which have administrative status as cities. In contrast, the districts with a low level of well-being all have administrative status as regencies. In addition, areas in the western part of Central Java tend to have a lower level of well-being than those in the eastern part. These results are consistent with previous studies in which areas with administrative status as cities tend to have a higher level of well-being than regencies, especially regencies with highly rural characteristics. For this reason, Central Java Province needs to pay more attention to the districts in cluster 3, which have a low level of well-being, especially in terms of the economy, education, sanitation, and public health.
This is a cross-sectional study using several quantitative macro indicators of regional well-being in Central Java Province in 2019. Further research can address this limitation by comparing regional well-being indicators in subsequent years and analyzing the changes that occurred before and after the COVID-19 pandemic. In addition, the results suggest a relationship between urbanization and regional well-being, which could be explored further in subsequent studies.