An Ensemble of Deep Auto-encoders for Healthcare Monitoring

Due to the increase in chronic diseases, population aging, and medical costs, we are approaching a world where basic healthcare could become out of reach for most people. Fortunately, modern technologies like Ambient Intelligence (AmI) and the Internet of Things (IoT) offer unparalleled benefits that could improve the quality and efficiency of treatments and accordingly improve the health of patients. The idea is to bring down hospital management, drug and healthcare costs while also making care more accessible. Real-time monitoring using sensors can save lives in medical emergencies such as heart failure or diabetic crises, and is essential for terminally ill patients who need continuous monitoring. IoT enables interoperability, machine-to-machine communication, information exchange, and data movement that make healthcare service delivery effective. Mobile healthcare helps monitor patients and identify anomalies instantly, besides reducing hospital management costs.
In this chapter, we present the infrastructure, technology and methodology used in AmI for mobile health monitoring, and how it can greatly affect our lives.
1.1 What is Ambient Intelligence?
The European Commission Information Society Technologies Advisory Group (ISTAG) first introduced the concept of Ambient Intelligence [3]. AmI is thus not new, but it has become more attainable thanks to the evolution of computing: the miniaturization of microprocessors and nanotechnology, as well as the decreasing cost of storage capacity and communication bandwidth. AmI introduces a new way of interaction between people and technology [4], where a human behaves spontaneously while machines and artificial intelligence act in the background to improve his comfort and to reduce the risks he may face. An AmI system is identified by several characteristics [5]:
• Context aware: It takes into account information from the environment.
• Personalized: It is tailored to the requirements of each user.
• Anticipatory: It can predict the user's needs without conscious mediation of the user.
• Adaptive: It adapts on its own to any change.
• Ubiquitous: It is used and integrated in our daily practice.
• Transparent: It blends into the quotidian life of the user quietly, in an unobtrusive way.
AmI works in a seamless, unobtrusive and invisible way. It is an emerging domain based on ubiquitous computing, and it implies vast modifications in the design of protocols, communication devices, etc. In addition, it is made possible by the flourishing of the IoT [6]. IoT is used to provide AmI systems with data about the user's quotidian activities without human intervention. This has been enabled through the use of small and inexpensive sensors. The contribution that sensors offer is gathering data about the user's environment, health status and activities, as shown in Table 1.1 and Table 1.2.
Table 1.1: Sensors used to detect vital signs.

Name              Purpose
Accelerometer     Measure acceleration, fall detection, location and posture
Gyroscope         Measure orientation, motion detection
GPS               Motion detection and location tracking
ECG               Monitor cardiac activity
EEG               Measure brain waves
EOG               Monitor eye movement
EMG               Monitor muscle activity
PPG               Heart rate and blood velocity
Pulse oximeter    Measure blood oxygen saturation
Blood pressure    Measure blood pressure
SKT               Skin temperature
Table 1.2: Ambient sensors used in smart environments.

Name            Purpose
Light           Measure intensity of light
PIR             Identify user location
Temperature     Measure room temperature and body temperature
Pressure        Identify inhabitant location
Switch sensor   Open/close door detection
RFID            Object and people identification
Ultrasonic      Location tracking
Power           Calculate power usage
Humidity        Measure room humidity
With these sensors, AmI systems can offer a large number of applications, such as smart homes [7], smart cities [8] and so on. In a smart home, the sensors are placed in different objects to gather information about the home conditions [9]. The Gator Tech Smart House was developed to monitor older people and individuals with chronic disease [10]; this home is equipped with a large number of sensors which collect massive volumes of data. Similarly, Santander city includes more than 12,500 sensors; it focuses on implementing smart transportation through congestion management, outdoor parking management and driver guidance. Sensors alone, however, are not enough to gather data continuously and to produce an accurate prediction of the patient's situation.
1.2 Recent Trends
Several new technologies are being used to upgrade the quality of AmI in healthcare services.
1.2.1 Internet of Things (IoT)
According to [11], the Internet of Things semantically means "a world-wide network of interconnected objects uniquely addressable, based on standard communication protocols". IoT is a novel network concept of interrelated computing devices, machines, objects and people, each with a unique identifier associated with them. It can sense and transmit data without human intervention. Thus, the Internet of Things has the ability to make everything communicate, interact and be identifiable anywhere and anytime, making life less difficult and more comfortable. It is one of the most exciting trends and innovations in the recent history of technological advancement. With cloud computing, connectivity and big data, there has been an explosion of IoT-based application solutions in diversified fields like healthcare systems.
In the healthcare domain, IoT has made huge inroads. For example, in [12] the authors implemented IoT functionalities for unobtrusive and continuous Heart Rate (HR) monitoring, such that HR data are recorded from Pulse-Glasses, visualized on an Android smartphone, and stored seamlessly on the cloud. Also, in [13] the authors used a smart glove with fog-driven IoT to assist people with Parkinson's disease in tracking the effectiveness of medications that relieve motor symptoms and to transmit motion data to a personal, patient-oriented app.
1.2.2 Cloud Computing
According to [14], cloud computing can be defined as "A computing paradigm which is a pool of abstracted, virtualized, dynamically scalable, managing, computing, power storage platforms and services for on demand delivery over the Internet". Using cloud computing for healthcare presents promising opportunities for healthcare service delivery. It provides a strong infrastructure to support health services with high quality and lowest cost, on a "pay-as-you-use" basis.
There are several benefits of using cloud computing for healthcare [15], such as (a) better patient care: the medical record of the patient is available to healthcare providers anytime and anywhere; (b) reduced cost: it lowers operational and maintenance costs, since one pays for actual resource utilization and avoids hardware costs; (c) information sharing: healthcare providers can share part of their data with others, like health research institutes and hospitals.
Many researchers have worked on cloud computing for healthcare systems. For example, in [16] the authors propose a solution to automate the process of gathering patient data through sensors connected to medical devices and to convey this information to the medical center's cloud for storage, processing and distribution. It facilitates the deployment process (no cabling needed) and provides real-time data collection at all times. Also, in [17] the authors developed a cloud-based system for clients with a mobile device or web browser, hosting an ECG analysis service for real-time ECG monitoring and analysis.
1.2.3 Big Data analysis in healthcare
These days, large data volumes are generated from heterogeneous sources such as social networks, the internet and healthcare systems. This is owing to several technology trends, like the spread of smart devices, the Internet of Things and the prevalence of cloud computing. Generally, healthcare systems generate tremendous data that are difficult to store, process and interpret [18]. Because these data can take different forms (structured, semi-structured and unstructured) [19], it is impossible to manage them with traditional database management systems.
Consequently, there is a huge need to use big data in healthcare systems. Big data has the potential to improve the quality of care by using past and current healthcare data to make quality healthcare decisions [20]. The four main characteristics that define big data are:
1. Volume: Massive data gathered through enormous numbers of sensor devices [18].

2. Velocity: Refers to the speed of processing the data in real time, as in telemedicine, remote health monitoring and virtual humans on mobile platforms [21].

3. Variety: The data can take heterogeneous forms (structured, semi-structured and unstructured), such as photos, videos, sensor data and so on.

4. Veracity: Refers to the quality and trustworthiness of the captured data.
1.2.4 Machine learning
There are many machine learning techniques used to analyze sensor data for healthcare systems. These techniques improve the work of medical experts, assist healthcare monitoring and make the system more accurate. In [22] the authors presented a comparison of different classifiers and meta-classifiers, then combined algorithms to develop a novel intelligent ensemble for healthcare and decision support. This ensemble was constructed on meta-classifier voting with three base classifiers: the Random Tree, J48 and Random Forest algorithms.
Also, machine learning algorithms identify anomalies and enable continuous diagnosis. For example, in [23] the authors use a bagging algorithm to determine the cautionary signs of heart disease in a patient and compare the outcome with the decision tree algorithm. In [24] the authors presented predictive models for breast cancer survivability, using the C5 algorithm with bagging. As well, in [25] the authors presented an application of machine learning techniques to the medical diagnosis of stroke. They use the C5 algorithm (a newer version of C4.5 [26]), which is capable of "learning from examples" by constructing a decision tree that can be transformed into IF/THEN rules.
1.3 Algorithms and Methodology
Ambient intelligence is the combination of environment and body sensors with algorithms and methodologies. In this section we present these algorithms.
1.3.1 Activity Recognition
The contribution that sensors offer is the ability to collect the daily activities which occur in an intelligent environment, such as "the user goes to bed at 09:00 pm". Therefore, it plays a decisive role in AmI environments. Activity recognition is used in several applications such as health care [27], ambient assisted living [28], surveillance and security [29] and smart homes [30]. The Neural Network House, House_n and MavHome projects monitor the health situation of the user by recognizing the user's activities [31] at home.
The activities are divided into two categories: the first is recognized by sensors placed on the body, such as walking, standing up and lying down; the second category is recognized by ambient sensors observing how the user moves things [32]. Human behavior in quotidian activities is complex and highly diverse. Hence, handling these activities is a big challenge.
These challenges are: (a) recognizing concurrent activities: doing different activities at the same time; (b) recognizing interleaved activities: activities that overlap with others; (c) ambiguity of interpretation: the same activities can be interpreted differently; (d) supporting multiple residents: recognizing activities performed in parallel by residents in a group [33].
According to [34], activity recognition involves several steps, from gathering data using the body and ambient sensors to identifying the current activity of the user. These steps are preprocessing, feature extraction, classification and interpretation. For the classification we have two categories: supervised and unsupervised learning. For supervised learning, we can cite the Support Vector Machine (SVM), K-Nearest Neighbor (KNN) and Artificial Neural Network (ANN) algorithms. For unsupervised classification techniques, we can cite the K-means algorithm and the Gaussian Mixture Model (GMM) approach [35]. We use an unsupervised approach when the classes to predict are not labelled.
Smartphones are becoming the main platform for human activity recognition [36]. Since the mobile phone is used on a daily basis and is equipped with several sensors (accelerometer, GPS, barometer, thermometer, pedometer, proximity sensor, light sensor, etc.) [37], it is a suitable tool for continuously monitoring the user's activities. In [38] the authors present a mobile application for identifying physical activities and calculating the amount of daily calorie expenditure using a triaxial accelerometer. Also, in [39] the authors develop the ActiServ system to support activity recognition on mobile phones for common users.
1.3.2 Behavioral Pattern Discovery
Human activity understanding includes activity recognition and behavioral pattern discovery. Activity recognition focuses on the precise detection of human activities based on a predefined activity model, whereas behavioral pattern discovery aims to find unknown behaviors in sensor data without any predefined model. In remote monitoring, the system needs to gather massive information on several daily behavior patterns.
Through human behavior the system can detect anomalies and provide personalized services by discovering the preferences and habits of the users. To learn the behavioral pattern of the user, we should use various sensors: some sensors provide information about user activities, others provide information about the user's environment, and another type of sensor provides the user's vital signs.
1.3.3 Anomaly Detection
The main task of a smart health environment is to detect abnormal activities or emergency situations which require the intervention of caregivers. Anomaly detection is a critical problem in the field of health care and needs a high degree of accuracy.
It has several benefits for smart environments, such as enhancing the quality of life of older persons and distinguishing between standard data, raw data and feedback control [40], in order to differentiate between normal and abnormal elderly behavior. The purpose of anomaly detection is to detect activity that does not conform to habitual behavior; the non-conforming pattern is a deviation, or anomaly [41]. The goal of anomaly detection is to translate data anomalies into significant and actionable information in a wide variety of application domains [42].
Anomaly detection is used in many applications such as credit card fraud detection, health care, military surveillance for enemy activity, intrusion detection for cyber-security and detection in safety-critical systems. Hence, many systems have been developed to detect anomalies in healthcare systems. For instance, an RFID-based approach is used to detect abnormal activities in a smart home [43].
Also, in [44] and [45], the authors proposed several anomaly detection systems for mild cognitive impairment. According to rules defined by experts, an anomaly is detected as an activity containing a deviation from normal behavior. As well, in [46], the authors use One-Class Support Vector Machine (OCSVM) techniques to identify the behavior model of the resident. In addition, there are different anomaly detection techniques used in healthcare services, like Neural Networks [47], Rule-Based Systems [48] and the Nearest Neighbor technique [49].
1.3.4 Decision support systems
According to [50], a Decision Support System (DSS) is "an interactive, flexible, and adaptable computer based information system, developed especially for better decision making as it supports the solution of a non structured management problem. It uses data which provides an easy-to-use interface, and allows for the decision maker's own insights".
DSSs have been broadly used in healthcare. They play a principal role in the diagnosis and treatment of multiple diseases, improve the safety of the elderly and the quality of medical care, and provide accurate diagnoses that lead to quick decisions on detected anomalies [51]. For example, in [52] the authors propose a decision support system that can detect renal cell cancer using abdominal images of healthy and renal cell cancer tissues, through the use of the SVM algorithm for the classification task. Also, in [53] the authors developed a new discriminant temporal pattern mining algorithm which aims to identify the association between hospitalization for seizure and anti-epileptic drugs, to improve the work of medical staff and make their decisions more efficient and accurate.
Current healthcare systems generate large volumes of data because they are used continuously, daily and everywhere. In this area, data mining is important to detect diseases and identify the situation of the patient. It converts information into knowledge, supports decision makers in understanding various situations, and predicts future events through the use of several algorithms.
1.3.5 Anonymization and Privacy Preserving Techniques
As ambient intelligence systems are integrated into our everyday environment, more data is gathered about the user. These data need to be kept secure and private. Security and trust are the main issues in ambient intelligence visions [54]. Thus, protecting personal privacy is one of the major challenges of AmI in healthcare systems, and there is a huge need for protection mechanisms such as data anonymization, which modifies the data and makes it difficult to identify the concerned person. The most popular anonymization techniques are generalization and bucketization [55]. The main difference between these techniques is that bucketization does not change the Quasi-Identifiers.
1.4 Application
There are many fields in which ambient intelligence can maximize the comfort of our life. This section presents AmI applications for health care.
1.4.1 Health Monitoring
Europeans over 60 years of age represent more than 25% of the population, and by 2040 over 11 million people will suffer from Alzheimer's disease [2]. Also, people over 85 need routine checks. In the United States, the cost of health care is $2.4 trillion. These figures show that traditional healthcare services are inefficient at handling classical diseases we could prevent in advance. Thus, AmI for health monitoring may be an effective solution for monitoring the elderly, the disabled and people with chronic illness anytime, anywhere, in an unobtrusive way and at low cost. With the development of supporting technologies, elderly people and people with disabilities can live independently in their homes with the assistance of health monitoring. The AmI methodologies presented in the previous section support this goal.
Various systems for monitoring the health status of elders have been developed; for example, the proactive group at Intel develops technology to enhance and facilitate the quality of life of older persons [56]. In [57] the authors propose a new continuous e-health monitoring system that predicts future status and its deterioration before further complications. Also in this field, there are other works that use RFID and environmental sensors [58][59].
1.4.2 Assisted Living
Ambient Assisted Living (AAL) [60] is an efficient solution which enables the elderly population to maintain independence for a longer time. It has been defined as a service that aims to form an intelligent environment to support the user in their work place and home. AAL can help people with various types of physical disability.
Many projects have been developed or are in development in this field. For example, in [61] the authors proposed a health smart home system for disabled persons that can alert them about accidents that may occur somewhere in the house, using low-cost technologies such as a Raspberry Pi, cameras, motion sensors and a central computer. This system has the ability to call a healthcare provider and the fire brigade in emergency situations.
Furthermore, in [62] the authors proposed a novel obstacle detection system to assist the visually impaired using 3D sensors. Also, in [63] the authors present mobile health care (mHealth) to compensate for impaired movement in smart wheelchairs using emerging IoT technologies. In addition, in [64] the authors described the development of the VIVOCA project, which is intended to recognize and interpret an individual's disordered speech and deliver the required message in clear synthesized speech.
1.4.3 Smart hospital
In traditional hospitals the staff must navigate the hospital to gather information about patients and check their status. However, doctors and nurses have several parallel activities, so monitoring patients becomes more difficult and the service suffers. AmI in hospitals can enhance the quality of service, improve patient safety, and let hospital personnel monitor the situation of sufferers. Smart hospitals can promote communication and support both staff and patients.
For instance: "Dr. Garcia is checking the patient in bed 234; his personal digital assistant (PDA) alerts him that a new message has arrived. His handheld displays a hospital floor map informing him that the X-ray results of the patient in bed 225 are available. Before Dr. Garcia visits this patient, he approaches the nearest public display, which detects the physician's presence and provides him with a personalized view of the hospital information system. In particular, it shows a personalized floor map highlighting recent additions to the clinical records of patients he is in charge of, messages addressed to him, and the services most relevant to his current work activities. Dr. Garcia selects the message on bed 225, which opens windows displaying the patient's medical record and the X-ray image recently taken. Aware of the context of the situation (the patient's original diagnosis, the fact that X-rays were just taken from the patient's hand, etc.), the system automatically opens a window with the hospital's medical guide that relates to the patient's current diagnosis, and an additional one with previous similar cases to support the physician's analysis. While Dr. Garcia is analyzing the X-ray image, he notices on the map that a resident physician is nearby and calls him up to show him this interesting clinical case. The resident physician notices that this is indeed a special case not considered by the medical guide and decides to make a short note on his handheld computer by linking both the X-ray image and the medical guide. He can use these links later on to study the case in more detail or discuss it with other colleagues from any computer within the hospital" [65].
Returning to smart hospital work, such as activity recognition in the smart hospital: in [66] the authors develop an approach for automatically estimating hospital-staff activities by training a Hidden Markov Model (HMM) to map contextual information to a user activity. In addition, GerAmI is another project, developed for Alzheimer's patients [67]. It is an intelligent environment using a multi-agent system, mobile devices, RFID and WiFi to facilitate the integration and use of the system.
1.4.4 Therapy and Rehabilitation
According to the Disability and Rehabilitation Team at the World Health Organization (WHO), the number of people who need rehabilitation services is 1.5% of the world population [68]. AmI in therapy and rehabilitation can help people who require remote systems [69]. It offers a personalized, transparent service that can be used everywhere.
The use of AmI in healthcare, especially in rehabilitation, is based on sensor networks [70]. In [71] the authors present the system used in a ubiquitous rehabilitation center, which integrates a ZigBee-based wireless network with sensors that monitor patients and rehabilitation machines. These sensors interface with ZigBee motes, which in turn interface with a server application that manages all aspects of the rehabilitation center and allows rehabilitation specialists to assign prescriptions to patients.
In [72] the authors presented a multi-agent system developed to provide people with Acquired Brain Injury (ABI) with physical therapies for rehabilitation, using specific devices to monitor the patient's movements and some physiological responses, such as the variation of the heart rate, during the rehabilitation process.
Conclusion
Ambient intelligence enhances the quality of healthcare monitoring systems. This chapter presented a survey of the infrastructure, technologies and methodologies used in AmI for mobile health monitoring. We focused on the methodologies utilized to detect and monitor the health status of patients, and on how Ambient Intelligence can greatly affect our lives.
Chapter 2
Research methodology
Contents
Introduction
2.1 Dimensionality reduction techniques
2.2 Classification methods
    2.2.1 Decision Tree
    2.2.2 k-Nearest Neighbors (KNN)
    2.2.3 Support Vector Machine (SVM)
    2.2.4 Logistic Regression
    2.2.5 Random Forest
    2.2.6 AdaBoost
    2.2.7 Extra Tree
2.3 Ensemble methodology
2.4 Evaluation metrics
Conclusion
Introduction
Machine learning is the ability to build computer programs that automatically enhance their performance through experience; it can be thought of as "programming by example". Machine learning algorithms may thus be an efficient solution to predict the situation of patients: they can produce accurate predictions on new, unseen data after being trained on a limited learning dataset. In other words, machine learning algorithms can learn to infer whether a patient is in a normal situation or not by generalizing from the target data. Note that in our research the target data consisted of patients in normal and abnormal situations.
Machine learning algorithms have two main categories which are most widely used:
1. Supervised learning: The most widely used form of machine learning, it uses a training data set to make predictions. The training data set includes attributes and a class; from it, supervised learning algorithms try to build a model that can predict the class of new data. Such problems are grouped into regression and classification problems.

2. Unsupervised learning: In unsupervised learning, no target variable is given to the algorithm, only input data. The objective is to find structure in the inputs; notions of distance and similarity between the inputs can be used to build the models, as in the K-means algorithm.
This chapter introduces the methods and the main concepts used in this research to build the new intelligent ensemble. First, we present the dimensionality reduction techniques in section 2.1. Then, the classification methods and the ensemble methodology are presented in sections 2.2 and 2.3, respectively. Finally, the evaluation metrics are presented in section 2.4.
2.1 Dimensionality reduction techniques
Dimensionality reduction is an important method in machine learning. It is the process of reducing the dimension of the data set. In data sets with a very high number of attributes, dimensionality reduction techniques are a must to produce an accurate classification model in a reasonable time. The first reason is that noisy features distort the results. Also, reducing the dimensionality limits the number of features that must be taken into account when creating a model; for example, redundant features contribute no additional information, and their use may degrade the performance of the learning algorithm. In addition, with a large number of features the hypothesis space is larger and searching it may take more time. Thus, dimensionality reduction techniques remove redundant attributes, which helps to keep the relevant features and makes the prediction of illness quicker and easier. These techniques can be divided into two subcategories:
• Feature selection [73]: Does not modify the initial data; it removes unnecessary features and keeps the features which have the highest correlation with the target variable. In this research we use univariate feature selection as the feature selection method.
• Feature extraction [74]: The process of transforming the initial features into a new set of variables which are non-redundant and uncorrelated. As a result, we no longer use the observed variables directly, but rebuild a new, reduced-size space by manipulating the original variables.
In this research, we used Principal Component Analysis (PCA) as the feature extractor and the univariate feature selection method for selecting features, in order to assess their performance against deep autoencoders.
• Principal Component Analysis (PCA):
Principal Component Analysis (PCA), used to extract features, was developed by Karl Pearson in 1901 [75] and Hotelling in 1933 [76]. It applies to a data matrix X of continuous numerical values, in which each individual is described by n variables. PCA makes it possible to construct k variables from the n, such that k < n. These k variables are the principal components of the data matrix. They are linear combinations of the n initial features, characterized by the amount of information they carry, ordered according to this quantity, and independent of each other (no linear correlation). Note that PCA works on quantitative features.
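As an illustration, here is a minimal sketch (not the thesis code) of this extraction step with scikit-learn; the random placeholder matrix, its 745 × 300 shape and the standardization step are assumptions for the example.

```python
# A minimal sketch: projecting an n-feature matrix onto its first k = 20
# principal components with scikit-learn.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = np.random.rand(745, 300)               # placeholder for the real sensor matrix
X_std = StandardScaler().fit_transform(X)  # PCA is sensitive to feature scale

pca = PCA(n_components=20)                 # keep k = 20 components, k < n
X_reduced = pca.fit_transform(X_std)       # linear combinations of the 300 features
print(X_reduced.shape)                     # (745, 20)
print(pca.explained_variance_ratio_)       # information carried by each component
```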
• Univariate Selection:

Univariate feature selection [77] is used to select the features that have the strongest relationship with the target variable. It can be used for regression and classification. For classification, chi2 and f_classif are available as scoring functions; the choice depends on the nature of the features. In our research, we used f_classif because our data set contains negative feature values.
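A minimal sketch of this selection, assuming the same placeholder data shape as above and 0/1 labels; it is an illustration, not the thesis code.

```python
# Univariate selection of the 20 features with the highest ANOVA F-score
# (f_classif handles negative feature values, unlike chi2).
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

X = np.random.randn(745, 300)      # placeholder data (contains negative values)
y = np.random.randint(0, 2, 745)   # 1 = Abnormal, 0 = Normal

selector = SelectKBest(score_func=f_classif, k=20)
X_selected = selector.fit_transform(X, y)   # keeps the 20 best-scoring features
print(X_selected.shape)                     # (745, 20)
```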
• Deep learning and autoencoder:
Deep learning (DL) is a machine learning method based on Artificial Neural Networks (ANN). The main difference from a classical ANN pipeline is that it eliminates the task of manual feature extraction for the data set at hand. We will start by looking at the structure of a neural network.
An ANN is a computing system inspired by the structure and function of the brain. It is based on a set of connected neurons. A neuron receives a vector x as input and uses it to compute an output signal which is transferred to other neurons. It is parameterized through {W, b}, where W is a weight vector and b is the bias; see Figure 2.1. The output of each neuron can be described as
y = f( Σ_{i=1}^{n} x_i · w_i + b ),

where f is referred to as an activation function.
Figure 2.1: A neuron in a neural network, with inputs X1, X2, weights w1, w2, bias b and output Y = f(w1·X1 + w2·X2 + b).
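As a small worked example of the formula above, the sketch below evaluates the single neuron of Figure 2.1 in NumPy; the ReLU activation and the numeric values are assumptions chosen for illustration.

```python
# One neuron: Y = f(w1*X1 + w2*X2 + b), here with a ReLU activation.
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

x = np.array([0.5, -1.2])   # inputs X1, X2
w = np.array([0.8, 0.3])    # weights w1, w2
b = 0.1                     # bias

y = relu(np.dot(w, x) + b)  # f(w1*X1 + w2*X2 + b)
print(y)
```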
These neurons are organized in successive layers, and every layer takes as input the output produced by the previous layer, except for the first layer, which consumes the input, as shown in Figure 2.2.
There are three kinds of layers in every artificial neural network:

(a) Input layer: This layer stores the input data; each neuron stores one component x_i of the observation x.

(b) Hidden layers: Placed between the input and output layers, they process the data obtained from the input layer and transfer it towards the output layer. There can be more than one hidden layer.

(c) Output layer: The last layer in the network. The output of this layer is the output of the network, i.e. the prediction generated from the input.
Figure 2.2: Structure of a neural network (input layer, hidden layer, output layer).
The number of layers is referred to as the depth of the network. This is where the notion of deep (as in deep learning) comes from. Based on the traditional ANNs, deep neural networks contain a great number of hidden layers, where in each level, the DL method learns to transform input data into a composite representation.
Recently, DL has become an efficient solution for machine health monitoring systems. It has been used to diagnose Alzheimer's disease [78]. In [79] the authors apply deep learning algorithms to identify autism spectrum disorder (ASD) patients from a large brain imaging dataset, based on the patients' brain patterns.
24
DL models have different variants, like the Autoencoder [80], Deep Belief Network [81], Deep Boltzmann Machine [82], Convolutional Neural Network [83] and Recurrent Neural Network [84]. In this research, we used an autoencoder to extract features.
An autoencoder is one of the several architectures of deep learning algorithms [85]. It is a feedforward network trained to reproduce its input at the output layer: the encoder compresses the original data to produce a code, then the decoder decompresses the code to reconstruct the output [86]. Thereby, it consists of two core segments placed back to back, as shown in Figure 2.3:
1. The encoder: Takes the input data and maps it to a hidden representation; when this hidden layer has a lower dimension than the input data, the encoder reduces the dimensionality.
2. The decoder: Uses the hidden representation to reconstruct the input.
Figure 2.3: Architecture of an autoencoder with three hidden layers.

2.2 Classification methods
Classification is a data mining task able to predict the value of a categorical variable (target or class) [87]. It takes some kind of input data and maps it to a discrete label. When the classifier is trained accurately, it can generate the correct output for a new instance. This is done by building a model using numerical and/or categorical features. With two classes we speak of binary classification; otherwise, of multi-class classification. In our experiment we used a dataset with two outputs (Normal/Abnormal), so we used binary classification techniques to create the new intelligent ensemble. A brief introduction to the classification methods considered in our experiment is presented in the next subsections.
2.2.1 Decision Tree
The decision tree is a supervised, non-parametric algorithm. It can be used for both regression and classification problems. A decision tree algorithm approximates a target concept using a tree representation, where each internal node corresponds to an attribute, and every terminal node corresponds to a class [88]. A decision tree is a branching structure consisting of nodes and leaves, with the root node at the top and the leaves at the bottom. Each node tests the value of some feature of an instance, and each leaf assigns a target value to the example.
The elements of a decision tree representation have the following meaning:

1. each internal node tests an attribute;
2. each branch corresponds to an attribute value;
3. each leaf node assigns a classification.
It is constructed by learning from a set of data, and is used to classify new instances. The most popular decision tree algorithms are ID3, C4.5, C5.0 and CART. In this research, we used CART, which stands for Classification And Regression Trees. The decision tree constructed by the CART algorithm is a binary decision tree (each node has exactly two child nodes), so it can easily be used with numerical and categorical features.
The CART algorithm uses the Gini index as its impurity measure [89]. The Gini index is a measure for attribute selection: it decides which attribute is placed at the root node or an internal node, and the attribute with the lower Gini index is selected. For a dataset D composed of n classes,

G(D) = 1 − Σ_{j=1}^{n} p_j²,

where p_j is the relative frequency of class j in D.
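A minimal sketch of this impurity measure, assuming NumPy; the example labels are illustrative.

```python
# Gini impurity used by CART: G = 1 - sum_j p_j^2,
# where p_j is the relative frequency of class j in the node.
import numpy as np

def gini(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

print(gini([0, 0, 1, 1]))  # 0.5 (maximally mixed binary node)
print(gini([1, 1, 1, 1]))  # 0.0 (pure node, preferred by the split search)
```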
2.2.2 k-Nearest Neighbors (KNN)
KNN is a supervised, non-parametric algorithm [90]. It is a lazy learning algorithm (there is no explicit training phase until an unlabeled data pattern arrives). The idea is simple: KNN assumes that an observation is similar to its neighbors. It predicts the class of a new input by calculating the distance between the new input and each training instance, with the most distant observations considered the least important. Several distance metrics can be used with KNN, but the most common is the Euclidean distance [91].
2.2.3 Support Vector Machine (SVM)
SVM is a powerful nonlinear binary classification algorithm which can be used for both regression and classification problems [92], although it is mostly used as a classification technique. In many cases, it can replace neural network systems because it is less expensive in learning time.
It builds a separating hyperplane (possibly after a non-linear transformation of the inputs) that separates two groups of cases with the maximum margin and uses it to make predictions; the distance between the support vectors and the hyperplane should be as large as possible.
SVM finds the closest points from the two classes, which support the best separating plane, and draws a line connecting them. It then takes as the best separating plane the one that bisects, and is perpendicular to, this connecting line.
27
2.2.4 Logistic Regression
Logistic Regression (LR) is a supervised algorithm used for classification [93]. Its ability to provide probabilities and classify new examples using continuous and discrete measurements makes it one of the most popular machine learning algorithms.
It is a linear algorithm whose prediction is transformed by the logistic function. This function produces a logistic curve, which is limited to values between 0 and 1.
2.2.5 Random Forest
Random Forest, developed by Breiman [94], is a non-parametric algorithm that can be used for regression and classification problems. The goal of the Random Forest algorithm is to eliminate the disadvantage of decision trees, especially their vulnerability to overfitting.
It uses the Bagging technique (Bootstrap Aggregation). In other words, it works as a large collection of decorrelated decision trees and uses them to make the classification. Random Forest creates new bootstrapped datasets (random subsets) from the initial dataset and builds a decision tree for each subset. Thus, Random Forest is an ensemble method which predicts a new instance by aggregation (averaging for regression, majority vote for classification).
2.2.6 AdaBoost
AdaBoost, short for "Adaptive Boosting", focuses on classification problems. It is a boosting algorithm that can turn weak learners into a strong one [95]. AdaBoost starts with a uniform probability distribution over the given training instances, then changes the distribution by increasing the weights of misclassified instances in the next iteration, to be used by the subsequent learners. The final classification is done by combining the predictions of all learners by voting.
28
2.2.7 Extra Tree
Extra-Tree, short for "Extremely randomized trees", is a supervised learning algorithm used for both regression and classification [97]. It builds an ensemble of decision trees, but the major differences with other tree-based ensemble methods are that it splits nodes by choosing cut-points fully at random and uses the whole learning sample (rather than a bootstrap replica) to grow the trees.
2.3 Ensemble methodology
Ensemble methodology is the combination of several learners for the same problem, in order to produce a more accurate predictive model than a single model, by decreasing variance and bias. It can be used for regression and classification tasks. Each learner may work well on a different part of the data, and no single learner is good for all problems; by combining these learners it is possible to obtain a very strong model.
As shown in Figure 2.4, the main idea of ensemble methodology is to combine the outputs (which may be the same or different) of several learners. These learners can be trained serially or in parallel, and they can be generated using different algorithms or the same algorithm with different parameters. Bagging [96] and Boosting [98] are the most widely used ensemble machine learning algorithms.
Ensemble methodology has two principal phases:
1. Building the classifier models with a learning algorithm.

2. Voting: combining the classifier models using majority voting.
The voting method is generally used for classification problems [99]. It can be weighted or not. In non-weighted voting, all the classifiers have equal weight and the predicted class is the one with the majority of votes. In the weighted case, we give different weights to the classifiers; there are several ways to choose the weights, such as assigning each classifier a weight proportional to its accuracy.

In this research we constructed a new intelligent ensemble using the Random Forest, Decision Tree and Extra Tree classifiers, where the dimensionality reduction technique is based on deep learning and the combination is based on the majority voting method; a sketch of this voting step is given below.

Figure 2.4: Ensemble methodology using the voting method (n learners, each producing a prediction y, combined by voting into the final prediction Y).
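The following minimal sketch illustrates non-weighted and weighted majority voting over the hard predictions of several classifiers; the array shapes, example predictions and weights are assumptions chosen for the illustration.

```python
# Majority voting over the predictions of n models
# (one row per test instance, one column per model, labels in {0, 1}).
import numpy as np

def majority_vote(predictions, weights=None):
    """predictions: (n_samples, n_models) array of class labels {0, 1}."""
    if weights is None:                      # non-weighted: every model counts once
        weights = np.ones(predictions.shape[1])
    score = predictions @ weights            # weighted sum of votes for class 1
    return (score > weights.sum() / 2).astype(int)

preds = np.array([[1, 0, 1],
                  [0, 0, 1]])                # 3 models, 2 test instances
print(majority_vote(preds))                  # [1 0]
print(majority_vote(preds, np.array([0.2, 0.3, 0.5])))  # accuracy-based weights
```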
2.4 Evaluation metrics
The main purpose of the evaluation metrics is to quantify the model performance and to compare classification results against others. There are different model evaluation procedures, such as training and testing on the same data, k-fold cross-validation and the train/test split. In this research we used the train/test split.
The main ideas of the train/test split are as follows (a sketch is given after this list):

• Split the data set into two groups, so that the model can be trained and tested on different data.
• Obtain a better estimate of out-of-sample performance.
• It is useful due to its speed, simplicity and flexibility.
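A minimal sketch of the split with scikit-learn, using the 80/20 proportion described in chapter 3; the placeholder data shapes and the random seed are assumptions.

```python
# Hold out 20% of the data for testing, train on the remaining 80%.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(745, 20)        # placeholder: 20 extracted features
y = np.random.randint(0, 2, 745)   # 1 = Abnormal, 0 = Normal

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
```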
In addition, the evaluation metrics available for binary classification are accuracy, precision, recall, f-measure, loss and ROC curves. To calculate these metrics we need the confusion matrix.

Table 2.1: Confusion matrix.

                    Predicted Negative     Predicted Positive
Actual Negative     True Negative (TN)     False Positive (FP)
Actual Positive     False Negative (FN)    True Positive (TP)
In Table 2.1 there are four important terms:
• True Positive (TP): Predicted to be positive and the actual value is positive.
• False Positive (FP): Predicted to be positive but the actual value is negative.
• True Negative (TN): Predicted to be negative and the actual value is negative.
• False Negative (FN): Predicted to be negative but the actual value is positive.
Accuracy is the proportion of correct classifications (TP and TN) out of the overall number of cases:

Accuracy = (TP + TN) / (TP + FP + FN + TN)

Precision is the proportion of correct positive classifications (TP) among cases that are predicted positive:

Precision = TP / (TP + FP)

Recall is the proportion of correct positive classifications (TP) among cases that are actually positive:

Recall = TP / (TP + FN)

F-measure is the harmonic mean of precision and recall; the higher the f-measure, the better the classification performance:

F-measure = 2 · (Precision · Recall) / (Precision + Recall)

The ROC curve, short for "Receiver Operating Characteristic", is used to visualize the performance of binary classification. The terms used in this metric are:

• Sensitivity: the proportion of actual positives which are predicted positive,
  Sensitivity = TP / (TP + FN)

• Specificity: the proportion of actual negatives which are predicted negative,
  Specificity = TN / (TN + FP)
Loss is a classification function that quantifies the accuracy of a classifier by penalizing false classifications; a perfect model has a log loss equal to 0:

Loss = −( y·log(p) + (1 − y)·log(1 − p) )
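The sketch below computes all of these metrics with scikit-learn on the split from the previous sketch; the illustrative classifier (a logistic regression) and the variable names are assumptions, not the thesis model.

```python
# Evaluation metrics for a fitted binary classifier `clf`.
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, log_loss, roc_auc_score,
                             confusion_matrix)

clf = LogisticRegression(solver='liblinear').fit(X_train, y_train)
y_pred = clf.predict(X_test)               # hard 0/1 predictions
y_prob = clf.predict_proba(X_test)[:, 1]   # probability of the positive class

tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
print("accuracy   :", accuracy_score(y_test, y_pred))   # (TP+TN)/(TP+FP+FN+TN)
print("precision  :", precision_score(y_test, y_pred))  # TP/(TP+FP)
print("recall     :", recall_score(y_test, y_pred))     # TP/(TP+FN) = sensitivity
print("f-measure  :", f1_score(y_test, y_pred))
print("specificity:", tn / (tn + fp))                   # TN/(TN+FP)
print("log loss   :", log_loss(y_test, y_prob))
print("ROC area   :", roc_auc_score(y_test, y_prob))
```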
Conclusion
In this chapter, we presented the main dimensionality reduction and classification methods, whose performance we assess in order to create a new intelligent ensemble which can better predict the situation of the patient.
These methods will be exploited in the following chapter, which illustrates how our new intelligent ensemble efficiently predicts the situation of the patient.
Chapter 3
Experimentation and Results
Contents
Introduction
3.1 Experimental Results
    3.1.1 Dataset description
    3.1.2 Attributes selection/extraction
    3.1.3 Classification
    3.1.4 New intelligent ensemble
3.2 Discussion
Conclusion
Introduction
The general objective of our work is to develop a new ensemble method, using the patient's sensor data, that combines the best classifiers presented in this research project to predict the situation of the patient as normal or abnormal. To address this objective, three main phases of experiments were used.
The first is to assess the performance of the deep autoencoder against traditional dimensionality reduction techniques, namely PCA and the univariate selection method; the comparison considers the effect of these techniques on the prediction accuracy of several classifiers. Second, we compare the performance of the classifiers using the best reduction method obtained in the first phase.
As indicated in the research methodology chapter, the classifiers used in this experiment are: Decision Tree, KNN, SVM, Logistic Regression, Random Forest, AdaBoost and Extra Tree. In the last phase we construct a new intelligent ensemble by combining, with the voting method, the best classifiers found in the previous phase. The evaluation is based on different performance metrics: accuracy, precision, recall, F-measure, sensitivity, specificity, loss, ROC area and the time required to build the model.
3.1 Experimental Results
In these experiments we use the scikit-learn and Keras libraries, with Python as the programming language. The scikit-learn library covers all data mining steps, from loading and preprocessing to mining data. It also includes a vast collection of machine learning algorithms, most of them usable with minimal code adjustments.
For the autoencoder we used Keras with the TensorFlow framework as a backend. Keras is a high-level neural network API for Python with the ability to build any deep learning model. Also, when using Keras you do not have compatibility problems, because each new layer added to your model is dynamically built to match the shape of the incoming layer. As said earlier, all these libraries are used to produce an efficient classifier for a real healthcare data set.
3.1.1 Dataset description
The dataset used in this research was constructed in a previous PhD project. It was obtained from wearable sensors deployed in Baraha medical city in Shambat, Khartoum North, Sudan. Each patient was attached to several sensors; these patients suffer from chronic illnesses. The dataset consists of 745 instances, 300 features and a label, either Normal or Abnormal, which describes the situation of the patient in the hospital.
All feature values are numeric; we only changed the target attribute from nominal to numeric, where 1 specifies "Abnormal" and 0 specifies "Normal". Also, this dataset contains no noise and no missing data. Since we have 300 attributes, we need dimensionality reduction techniques to make their processing easy.
3.1.2 Attributes selection/extraction
The purpose of this section is to assess the competitiveness of the deep autoencoder against traditional dimensionality reduction algorithms, namely PCA and the univariate feature selection method, using the same initial dataset presented in the previous section. It is important to note that the number of output features for these methods is the same as for the deep autoencoder algorithm, namely 20 features. The comparison is based on the accuracy of the classifiers presented in section 2.2.
The architecture of the deep autoencoder used in this work includes an input layer, 3 dense hidden layers and an output layer. The parameters of the autoencoder layers were as follows: the input layer has the original features of our dataset, the second layer has 100 neurons, the third 70 neurons, the last hidden layer 40 neurons and the output layer 20 neurons. The activation function is ReLU, and the kernel initializer is uniform.
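A minimal Keras sketch of this architecture follows. The encoder layer sizes (input → 100 → 70 → 40 → 20), the ReLU activation and the uniform initializer come from the text above; the mirrored decoder, the optimizer, the loss and the training call are assumptions needed to make the network trainable as an autoencoder.

```python
# Deep autoencoder: encoder 300 -> 100 -> 70 -> 40 -> 20, mirrored decoder.
from keras.layers import Input, Dense
from keras.models import Model

n_features = 300                      # original number of attributes
inputs = Input(shape=(n_features,))
h = Dense(100, activation='relu', kernel_initializer='uniform')(inputs)
h = Dense(70, activation='relu', kernel_initializer='uniform')(h)
h = Dense(40, activation='relu', kernel_initializer='uniform')(h)
code = Dense(20, activation='relu', kernel_initializer='uniform')(h)  # 20 features

# Decoder mirroring the encoder, so the network can be trained to reproduce X.
d = Dense(40, activation='relu')(code)
d = Dense(70, activation='relu')(d)
d = Dense(100, activation='relu')(d)
outputs = Dense(n_features, activation='linear')(d)

autoencoder = Model(inputs, outputs)
autoencoder.compile(optimizer='adam', loss='mse')
# autoencoder.fit(X, X, epochs=50, batch_size=32)   # train to reconstruct X

encoder = Model(inputs, code)         # the encoder alone yields the 20 features
# X_reduced = encoder.predict(X)
```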
The results introduced in Table 3.1 show the accuracy obtained by the different classifiers with the various dimensionality reduction methods. The best performance was demonstrated by the Random Forest using the autoencoder. In addition, the classifiers using the autoencoder have better results than with PCA and the univariate feature selection method in 5 out of 7 cases. Figure 3.1 visualizes these results.
Figure 3.1: Performance of classification methods with feature selection/extraction methods (bar chart of the accuracies reported in Table 3.1).
Classifiers            Autoencoder   PCA       Univariate
Random Forest          93.28%        91.27%    85.90%
Decision tree          91.27%        91.27%    86.57%
Extra tree             91.94%        90.60%    85.90%
KNN                    88.59%        87.24%    82.55%
Logistic Regression    75.16%        71.81%    78.52%
SVM                    73.15%        77.18%    83.22%
AdaBoost               88.59%        85.90%    87.91%

Table 3.1: Classification accuracy with the deep autoencoder, PCA and the univariate feature selection method.
Method        Time (seconds)
Univariate    0.024395
PCA           0.023643
Autoencoder   91.171994

Table 3.2: Execution time of each method.
3.1.3 Classification
This section consists of two phases. The first phase compares the performance of the different classifiers using the features generated by the deep autoencoder, with a train/test split as the evaluation model: we trained on 80% of the dataset and used the rest for testing. Table 3.3 summarizes the parameters of each classifier.
Classifiers           Parameters
Random Forest         Number of trees in the forest = 10; function that measures the quality of a split = 'gini'; number of features to consider when looking for the best split = sqrt(number of features)
Decision tree         Function that measures the quality of a split = 'gini'; strategy used to choose the split at each node = 'best'
Extra tree            Number of trees in the forest = 10; function that measures the quality of a split = 'gini'; number of features to consider when looking for the best split = sqrt(number of features)
KNN                   Number of neighbors = 5; weight function = 'uniform'; algorithm used to compute the nearest neighbors = 'auto'
Logistic regression   C = 1.0; solver = 'liblinear'; tolerance for stopping criteria = 0.0001
SVM                   Kernel type = 'rbf'; C = 1.0
AdaBoost              Base_estimator = decision tree classifier; number of estimators = 50

Table 3.3: The list of classifiers used and their parameters.
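A minimal scikit-learn sketch instantiating the classifiers with the parameters of Table 3.3; it reuses the X_train/X_test split from the sketch in section 2.4, and the dictionary layout and variable names are illustrative.

```python
# Instantiate the seven classifiers of Table 3.3 and score them on the test set.
from sklearn.ensemble import (RandomForestClassifier, ExtraTreesClassifier,
                              AdaBoostClassifier)
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

classifiers = {
    "Random Forest": RandomForestClassifier(n_estimators=10, criterion='gini',
                                            max_features='sqrt'),
    "Decision tree": DecisionTreeClassifier(criterion='gini', splitter='best'),
    "Extra tree": ExtraTreesClassifier(n_estimators=10, criterion='gini',
                                       max_features='sqrt'),
    "KNN": KNeighborsClassifier(n_neighbors=5, weights='uniform',
                                algorithm='auto'),
    "Logistic Regression": LogisticRegression(C=1.0, solver='liblinear',
                                              tol=0.0001),
    "SVM": SVC(kernel='rbf', C=1.0),
    "AdaBoost": AdaBoostClassifier(n_estimators=50),  # default base: decision tree
}

for name, clf in classifiers.items():
    clf.fit(X_train, y_train)               # train on the 80% split
    print(name, clf.score(X_test, y_test))  # accuracy on the held-out 20%
```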
Table 3.4 reports the accuracy and loss of each classifier. It can be inferred from this table that Random Forest, Decision Tree and Extra Tree have the best accuracy. Also, Random Forest has the lowest loss and Decision Tree the highest loss in comparison with the other classifiers. But accuracy alone is not enough to select the best classifiers; we need other metrics to evaluate the performance of the model.
Classifiers            Accuracy   Loss
Random Forest          0.9328     0.3943
Extra tree             0.9194     1.4641
Decision tree          0.9127     3.0134
KNN                    0.8859     1.5428
AdaBoost               0.8859     0.6203
Logistic Regression    0.7516     0.5210
SVM                    0.7315     0.5683

Table 3.4: Classifiers performance in terms of accuracy and loss.
Table 3.5 depicts the time required to build the model for each classifier. This table shows that Logistic Regression takes the lowest time to construct its model.
Classifiers            Training time (in seconds)
Random Forest          0.0631
Decision tree          0.0286
Extra tree             0.0473
KNN                    0.0314
Logistic Regression    0.0214
SVM                    0.0993
AdaBoost               0.1795

Table 3.5: Classifiers performance in terms of training time.
Table 3.6 shows the performance of each classifier in terms of precision, recall and f-measure. According to this table, Random Forest, Decision Tree and Extra Tree have the highest precision and recall.
As can be observed in Table 3.7, Random Forest, Extra Tree and Decision Tree have the highest specificity, while in terms of sensitivity Random Forest is the highest and KNN the second. In addition, Random Forest and Extra Tree have a better ability to distinguish the situation of the patient than the other methods.
Classifiers            Precision   Recall   F-measure
Random Forest          0.9323      0.9323   0.9323
Decision tree          0.9117      0.9126   0.9121
Extra tree             0.9188      0.9188   0.9188
KNN                    0.8938      0.8797   0.8833
Logistic Regression    0.7608      0.7408   0.7425
SVM                    0.7352      0.7224   0.7237
AdaBoost               0.8866      0.8932   0.8945

Table 3.6: Classifiers performance in terms of precision, recall and f-measure.
Classifiers            Sensitivity   Specificity   ROC Area
Random Forest          0.9264        0.9382        0.932
Extra tree             0.9117        0.9259        0.9313
Decision tree          0.8985        0.925         0.919
KNN                    0.9166        0.8651        0.880
AdaBoost               0.8923        0.8845        0.8809
Logistic Regression    0.7924        0.7291        0.7291
SVM                    0.75          0.720         0.720

Table 3.7: Classifiers performance in terms of sensitivity, specificity and ROC area.
3.1.4 New intelligent ensemble
As presented in section 2.3, the main idea of an ensemble methodology is to combine several classifiers, trained on the same dataset, to get a better prediction. To combine the ensemble's outputs we used the majority voting method.
In this section, we construct our ensemble by combining the three best classifiers, and use it to monitor the situation of the patients and predict their status. This experiment is conducted using the same dataset presented in the dataset description section and the features generated by the deep autoencoder.
According to the result of the previous section, we use Random Forest, Extra Tree and Decision Tree to build our intelligent ensemble.
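A minimal sketch of this combination with scikit-learn's VotingClassifier; the tree counts follow Table 3.3, while the estimator names and reuse of the earlier X_train/X_test split are assumptions for the illustration.

```python
# Proposed ensemble: Random Forest, Extra Tree and Decision Tree
# combined by hard (majority) voting on the autoencoder features.
from sklearn.ensemble import (VotingClassifier, RandomForestClassifier,
                              ExtraTreesClassifier)
from sklearn.tree import DecisionTreeClassifier

ensemble = VotingClassifier(
    estimators=[('rf', RandomForestClassifier(n_estimators=10)),
                ('et', ExtraTreesClassifier(n_estimators=10)),
                ('dt', DecisionTreeClassifier())],
    voting='hard')                      # majority vote over the three predictions

ensemble.fit(X_train, y_train)          # X_train: the 20 autoencoder features
print(ensemble.score(X_test, y_test))   # accuracy of the combined model
```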
As evident in Table 3.8, Table 3.9 and Table 3.10, the new intelligent ensemble offers a decision that is more accurate than the other machine learning classifiers. Our ensemble also distinguishes better between patients in a normal and in an abnormal situation, as shown in Table 3.10. However, it is slower than the Decision Tree, Extra Tree and Random Forest, and it has a high loss value.
Accuracy   Loss     Time
94.63%     1.8544   0.0464s

Table 3.8: Performance of the new ensemble in terms of accuracy, loss and training time.

Precision   Recall   F-measure
0.95        0.95     0.95

Table 3.9: Performance of the new ensemble in terms of precision, recall and f-measure.

Sensitivity   Specificity   ROC Area
0.9545        0.9397        0.94

Table 3.10: Performance of the new ensemble in terms of sensitivity, specificity and ROC Area.
3.2 Discussion
Based on these results, we can conclude that the features obtained by the deep autoencoder algorithm provide more significant and relevant information than those obtained by the other methods. However, the deep autoencoder spends far more time generating new features than the traditional feature reduction methods. Thus, we can confirm the vital role of deep learning in improving the accuracy of classifying the patient's situation, and in healthcare systems in general.
In the second phase of this work, we noticed that the worst classifiers with the deep autoencoder are SVM, KNN, AdaBoost and Logistic Regression. We can thus infer that tree-based ensemble classifiers work better with autoencoder features than traditional methods do, especially SVM and Logistic Regression; in particular, Random Forest works better with deep learning than the rest of the individual classifiers. In addition, the newly proposed ensemble method has the best results on all evaluation metrics except loss and training time. From this result, we note the importance of the ensemble methodology for improving the prediction of the classification task.
Figure 3.2: ROC curves of the classifiers and the new intelligent ensemble: (a) Random Forest, (b) Extra Tree, (c) Decision Tree, (d) KNN, (e) SVM, (f) AdaBoost, (g) Logistic Regression, (h) new ensemble.
In conclusion, the new intelligent ensemble with the deep autoencoder is an appropriate method to classify the situation of hospital patients based on vital signs from wearable sensors.
Conclusion
In this chapter we compared the performance of the deep autoencoder with traditional dimensionality reduction techniques. Then, we combined the three best classifiers to obtain a new, more accurate ensemble using the features generated by deep learning. Through these experiments we can infer that an ensemble methodology with deep learning is a good method to improve classification performance in healthcare systems.
Conclusion
In this research, we are interested in the field of ambient intelligence for healthcare. We presented the infrastructure, the technologies and the methodologies used in AmI for mobile health monitoring. We focused on how Ambient Intelligence can greatly impact our lives and on the methodologies utilized to detect and monitor the health status of patients.
In this context, a new ensemble algorithm has been proposed. This algorithm is based on the three best-performing classifiers among those presented in this work, using a deep autoencoder which generates higher-level features and replaces the initial dataset.
To do so, we first chose the deep autoencoder structure that works best, then compared its performance against traditional dimensionality reduction methods. This comparison considers the effect of the selected/extracted features on the prediction accuracy of several classifiers. The results of this experiment show that the deep autoencoder algorithm considerably improves prediction for the health monitoring system.
Secondly, a comparison was made between several classifiers, and we noted that Random Forest, Extra Tree and Decision Tree outperformed the other classifiers. This comparison uses several performance metrics to evaluate the classifiers used in this research.
Finally, we proposed a new ensemble using the three best classifiers obtained in the previous phase. This ensemble improves classification performance and distinguishes the patient's situation better than the individual classifiers.