If you have ever seen the Simpsons episode “The Springfield Files,” you may find the scene where Homer takes a lie detector test very amusing. But how does a lie detector (aka polygraph) actually work?

This is a pretty accurate depiction of how EEG works.

As in the video above, the output signal of the machine looks random. Unless you are a machine, you won’t decipher the message hidden deep in the signal.

Biosignal: Where does it come from, and what is it?

Remember the popular game in which soldiers wear a sort of exoskeleton suit that gives them superhuman abilities? That is a terrific way to describe the topic of my research. 😀 This time, I will tell you about how we researchers work with electromyography (EMG) and electroencephalography (EEG) to recognize our body’s subtle electrical signals. These biosignals can be acquired by attaching special sensors directly onto the skin or by implanting them inside the body. In EMG recordings, the sensors are placed on the skin over areas rich in muscular activity. For EEG, the sensors are placed on the head, close to the brain.

Our body uses small electrical signals to carry commands from the brain to every part of the body. The sensors pick up these signals and amplify them. Various filtering methods are then employed to remove noise. Finally, the filtered signals are sampled and sent to the computer. The bad news is that while the signal is rich in information, the relationship between specific signals and specific body actions is still largely unknown. This hidden relationship is a hotspot of neuromuscular research. Furthermore, the biosignal data stream looks random and is non-stationary. Yet, messy as it looks, it commands our body precisely and simultaneously, like moving my fingers while typing this paragraph. Therefore, to uncover the relationship between biosignals and body actions, we need to extract information from the signal. If we can “translate” the signal into another form that a machine learning algorithm can differentiate well, then we are done.
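As a rough illustration of the filtering step only, here is a minimal NumPy sketch of a crude band-pass filter that simply zeroes every FFT bin outside a pass band. The function name `fft_bandpass`, the 10 to 90 Hz band, and the synthetic signal (a 40 Hz component plus a DC offset and slow baseline drift) are assumptions for illustration. This is not how any particular device filters its data, and real pipelines usually rely on proper IIR or FIR filters.

```python
import numpy as np

def fft_bandpass(signal, fs, low_hz, high_hz):
    """Crude band-pass: zero every FFT bin outside [low_hz, high_hz] and transform back."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    spectrum[(freqs < low_hz) | (freqs > high_hz)] = 0
    return np.fft.irfft(spectrum, n=len(signal))

fs = 200                            # samples per second, as in the Myo data
t = np.arange(fs) / fs              # one second of samples
clean = np.sin(2 * np.pi * 40 * t)  # assumed "useful" 40 Hz component
# Add a DC offset and a slow 1 Hz baseline drift as the "noise"
noisy = clean + 0.5 + 0.8 * np.sin(2 * np.pi * 1 * t)

filtered = fft_bandpass(noisy, fs, low_hz=10, high_hz=90)
print(np.max(np.abs(filtered - clean)))  # prints a value very close to 0
```

Because every component here sits exactly on an FFT bin, the offset and drift are removed cleanly; with real recordings, this kind of brick-wall masking can introduce ringing, which is one reason dedicated filters are preferred.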

Below is an example of what a biosignal looks like in real time (Fig. 1). The data were sampled at 200 Hz with the now-discontinued Myo Armbands and are publicly available [1]. Fig. 1 shows only one channel (data stream), while the actual data contain 16 channels.

Fig. 1. One EMG channel sampled from the Myo Armbands.

So there are two problems to tackle here: 1) extracting information from the signal, and 2) using that information as input to a machine learning algorithm. Today, I want to discuss only the former and save the latter for another day.

The process of extracting information from raw data is usually called “feature extraction,” and below we discuss several popular feature extraction methods for biosignals.

Feature Extraction

Feature extraction is an unavoidable preprocessing step in most inference systems. It is much like how we humans recognize a person. As we gaze at someone’s face, we instantly “examine” their facial features, such as eyebrows, hair, nose, and even freckles. Then we unconsciously compare those features with something like a face “database” inside our head and decide whether the person is a friend or foe.

A machine learning model works the same way. The difference is that the features should be carefully hand-engineered; otherwise, they may fail to create a distinct structure that separates the input data into categories. It would be as useless as counting nose hairs to tell people apart.

Fig. 2. Police lineup cartoon by Web Donuts

Like other signals, a biosignal can usually be represented as a waveform (time domain) or as a spectrum (frequency domain), so features can be derived from either form. But before extracting features, we have to deal with the non-stationarity of the signal: its statistical properties change over time, which can cause problems.
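To see the two representations side by side, here is a minimal sketch that takes one synthetic one-second window and computes its single-sided amplitude spectrum with NumPy’s FFT. The window is an assumption (a 25 Hz and a 60 Hz sine, not real EMG), chosen so the peaks are easy to check; with the armband data you would pass a real window instead.

```python
import numpy as np

fs = 200                      # Myo sample rate (Hz)
t = np.arange(fs) / fs        # one-second window, 200 samples
# Time-domain view: assumed synthetic window with 25 Hz and 60 Hz components
x = 1.0 * np.sin(2 * np.pi * 25 * t) + 0.5 * np.sin(2 * np.pi * 60 * t)

# Frequency-domain view of the same window
amplitude = np.abs(np.fft.rfft(x)) * 2 / len(x)  # single-sided amplitude (DC/Nyquist bins would need 1/N)
freqs = np.fft.rfftfreq(len(x), d=1 / fs)        # 0, 1, ..., 100 Hz

top_two = freqs[np.argsort(amplitude)[-2:]]      # the two strongest frequencies
print(sorted(top_two.tolist()))                  # [25.0, 60.0]
```

The waveform of `x` looks like an irregular wiggle, while the spectrum shows two clean peaks; frequency-domain features are computed from arrays like `amplitude` and `freqs`.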

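Windowing (or epoching) is a procedure for extracting specific time windows from continuous data. Within a short window, the signal can be treated as approximately stationary, so features are computed window by window instead of over the whole recording.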
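A minimal NumPy sketch of sliding-window segmentation, assuming one-second windows (200 samples at 200 Hz) that start every 100 samples, i.e. 50% overlap. The helper name `sliding_windows` and the stand-in signal are hypothetical; window size and overlap are design choices you would tune for your own data.

```python
import numpy as np

def sliding_windows(signal, window_size, step):
    """Split a 1-D signal into equal-length windows that start every `step` samples."""
    if len(signal) < window_size:
        raise ValueError('signal is shorter than one window')
    n_windows = 1 + (len(signal) - window_size) // step
    return np.stack([signal[i * step:i * step + window_size] for i in range(n_windows)])

fs = 200                    # Myo sample rate
signal = np.arange(1000)    # stand-in for 5 s of one EMG channel
windows = sliding_windows(signal, window_size=fs, step=fs // 2)  # 1 s windows, 50% overlap
print(windows.shape)        # (9, 200)
```

In NumPy 1.20 or later, `np.lib.stride_tricks.sliding_window_view(signal, 200)[::100]` yields the same windows as a read-only view without copying.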

I don’t really understand the difference between a window and an epoch! But when I was using the MNE package, its documentation and code made me think that an epoch is an equal-length segment of data defined relative to an event. Epochs are extracted around stimulus events or subject responses; for example, one epoch is extracted at every event onset.
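To make the epoch idea concrete, here is a minimal NumPy sketch (not MNE’s API) that cuts an equal-length segment from `tmin` to `tmax` seconds around each event onset and skips events too close to the edges of the recording. The function `extract_epochs`, the onset indices, and the random signal are made up for illustration; the `tmin`/`tmax` names only echo the parameters MNE uses.

```python
import numpy as np

def extract_epochs(signal, onsets, fs, tmin, tmax):
    """Cut equal-length segments from tmin to tmax seconds around each onset (sample index)."""
    start_offset = int(round(tmin * fs))
    stop_offset = int(round(tmax * fs))
    epochs = []
    for onset in onsets:
        start, stop = onset + start_offset, onset + stop_offset
        if start >= 0 and stop <= len(signal):  # skip events too close to the edges
            epochs.append(signal[start:stop])
    return np.array(epochs)

fs = 200
signal = np.random.default_rng(0).normal(size=10 * fs)  # 10 s of synthetic data
onsets = [300, 900, 1500, 1990]                          # hypothetical event onsets
epochs = extract_epochs(signal, onsets, fs, tmin=-0.2, tmax=0.8)
print(epochs.shape)  # (3, 200): the last event runs past the end and is skipped
```

Unlike sliding windows, which tile the whole recording, epochs exist only where events happen, and every epoch is aligned to its event.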

Biosignal features in the time domain

The CSV file used below can be downloaded from the URL in the code.

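import pandas as pd
import plotly.express as px

# Load the recording and keep a single EMG channel
df = pd.read_csv('https://antardata.com/posts/2021/12/how-signal-feature-looks-like/Index_DIP.csv')
df = df['emg.7']
# Take 200 samples, i.e. one second at the 200 Hz sample rate
df = df.iloc[5000:5200]
fig = px.line(x=df.index, y=df.values, labels={'x': 'sample', 'y': 'emg.7'},
              title='Sensor reading of the armband')
fig.show()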
Fig. 3. A one-second window of the signal. Notice that the window holds 200 samples to match the 200 Hz sample rate.

Root Mean Square (RMS)

$$ \begin{equation} \label{eq:rms} RMS = \sqrt{\frac{1}{N} \sum_{k=1}^N x_k^2} \end{equation} $$

import numpy as np
x = df.to_numpy(dtype=float)  # the 200 samples of the window as a float array
result = np.sqrt(np.mean(np.power(x,2)))
print('RMS: ', result)

Mean Absolute Value (MAV)

$$ \begin{equation} MAV = \frac{1}{N} \sum_{k=1}^N |x_k| \end{equation} $$

result = np.mean(np.abs(x))
print('MAV: ', result)
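RMS and MAV both measure signal amplitude but weight it differently. A quick sanity check, assuming a synthetic sine wave rather than the armband data: for a sinusoid of amplitude A, RMS should come out close to A/√2 ≈ 0.707A and MAV close to 2A/π ≈ 0.637A, so RMS is always the larger of the two.

```python
import numpy as np

fs = 200                          # samples per second, as in the Myo data
t = np.arange(fs) / fs            # one-second window
A = 2.0                           # assumed amplitude of the synthetic sine
x = A * np.sin(2 * np.pi * 5 * t) # 5 Hz sine: exactly 5 full periods in the window

rms = np.sqrt(np.mean(np.power(x, 2)))
mav = np.mean(np.abs(x))
print('RMS:', rms, 'expected about', A / np.sqrt(2))
print('MAV:', mav, 'expected about', 2 * A / np.pi)
```

The MAV result differs slightly from 2A/π because the sine is only sampled 40 times per period; the gap shrinks as the sample rate grows.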

Integrated Absolute Value (IAV)

$$ \begin{equation} IAV = \sum_{k=1}^N |x_k| \end{equation} $$

result = np.sum(np.abs(x))
print('IAV:', result)
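Comparing the two formulas shows that IAV is simply N times MAV, so it grows with the window length; IAV values from windows of different lengths are therefore not directly comparable. A tiny worked example with a made-up four-sample window:

```python
import numpy as np

x = np.array([3.0, -4.0, 0.0, 1.0])  # made-up window, N = 4
iav = np.sum(np.abs(x))              # 3 + 4 + 0 + 1 = 8
mav = np.mean(np.abs(x))             # 8 / 4 = 2
print('IAV:', iav, 'N * MAV:', len(x) * mav)  # IAV: 8.0 N * MAV: 8.0
```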

Waveform Length (WFL)

$$ \begin{equation} WFL = \sum_{k=1}^{N-1} |x_k-x_{k+1}| \end{equation} $$

result = np.sum(np.abs(np.diff(x)))
print('WFL:', result)
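A toy comparison, assuming synthetic sines instead of real EMG, hints at why WFL complements the amplitude features: two sines with the same amplitude have practically the same RMS, but the faster one travels a much longer path in the same window, so its WFL is several times larger. The helpers `wfl` and `rms` are hypothetical names wrapping the formulas above.

```python
import numpy as np

def wfl(x):
    """Waveform length: sum of absolute sample-to-sample differences."""
    return np.sum(np.abs(np.diff(x)))

def rms(x):
    """Root mean square of the window."""
    return np.sqrt(np.mean(np.power(x, 2)))

fs = 200
t = np.arange(fs) / fs               # one-second window
slow = np.sin(2 * np.pi * 5 * t)     # synthetic 5 Hz sine
fast = np.sin(2 * np.pi * 20 * t)    # synthetic 20 Hz sine, same amplitude

print('RMS slow/fast:', rms(slow), rms(fast))  # practically identical
print('WFL slow/fast:', wfl(slow), wfl(fast))  # the faster sine has a much longer waveform
```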

Zero-Crossing (ZC)

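As far as I can remember, the first paper to mention Zero Crossing was written by Hudgins et al. [2]. Below is the thresholded version of ZC following Phinyomark et al. [3]: a crossing is counted when two consecutive samples have opposite signs (their product is negative) and the jump between them is at least a threshold ε, which keeps low-level noise around zero from being counted.

$$ \begin{align} \label{eq:zc} \begin{aligned} ZC &= \sum_{k=1}^{N-1} \left[ f(x_k \times x_{k+1}) \land |x_k - x_{k+1}| \geq \epsilon \right] \newline \text{where:} \newline f(x) &=\begin{cases}1,& \text{if } x < 0 \newline 0, &\text{otherwise}\end{cases} \newline \epsilon &= \text{threshold} \end{aligned} \end{align} $$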

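e = 0  # threshold ε; raise it to ignore crossings caused by small noise
r1 = (x[:-1] * x[1:]) < 0  # consecutive samples have opposite signs
r2 = np.abs(x[:-1] - x[1:]) >= e  # the jump between them reaches the threshold
result = (r1 & r2).sum()  # a crossing counts only if both conditions hold
print('ZC:', result)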
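To see what the threshold does, here is a small sketch on a made-up window in which the signal crosses zero twice but jitters around zero in between. The function name `zero_crossings` is hypothetical; it wraps the same two conditions as the snippet above. Note that a sample that is exactly zero gives a product of zero and is not counted as a crossing.

```python
import numpy as np

def zero_crossings(x, threshold=0.0):
    """Count sign changes between consecutive samples whose jump is at least `threshold`."""
    x = np.asarray(x, dtype=float)
    opposite_signs = (x[:-1] * x[1:]) < 0
    large_jump = np.abs(x[:-1] - x[1:]) >= threshold
    return int(np.sum(opposite_signs & large_jump))

# Made-up window: + to - and back to +, with small jitter around zero in between
x = [1.0, -0.05, 0.03, -0.02, -1.0, 1.0]
print(zero_crossings(x, threshold=0.0))  # 4: the jitter adds two spurious crossings
print(zero_crossings(x, threshold=0.1))  # 2: only the large jumps survive
```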

Slope Sign Change (SSC)

$$ \begin{align} \label{eq:ssc} \begin{aligned} SSC &= \sum_{i=2}^{N-1}[f[(x_i-x_{i-1})\times(x_i-x_{i+1})]] \newline \text{where:} \newline f(x) &=\begin{cases}1, &\text{if }x \geq \text{threshold}\newline0, &\text{otherwise}\end{cases} \end{aligned} \end{align} $$

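# (x_i - x_{i-1}) * (x_i - x_{i+1}) for every interior sample i; e is the threshold defined above
result = (((x[1:-1] - x[:-2]) * (x[1:-1] - x[2:])) >= e).sum()
print('SSC:', result)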
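Putting the pieces together: a minimal sketch, assuming 200-sample windows with a 100-sample step (my own choices here, not values from the dataset), that slides over one channel and stacks the six features above into one row per window. A matrix like this is the kind of input the machine learning step would consume. `td_features` and `feature_matrix` are hypothetical helper names, and the random signal is only a stand-in for a real channel such as `emg.7`.

```python
import numpy as np

def td_features(x, threshold=0.0):
    """Six time-domain features of one window, following the formulas above."""
    x = np.asarray(x, dtype=float)
    d = np.diff(x)  # x_{k+1} - x_k
    return {
        'RMS': np.sqrt(np.mean(x ** 2)),
        'MAV': np.mean(np.abs(x)),
        'IAV': np.sum(np.abs(x)),
        'WFL': np.sum(np.abs(d)),
        'ZC': int(np.sum(((x[:-1] * x[1:]) < 0) & (np.abs(d) >= threshold))),
        'SSC': int(np.sum(((x[1:-1] - x[:-2]) * (x[1:-1] - x[2:])) >= threshold)),
    }

def feature_matrix(signal, window_size=200, step=100, threshold=0.0):
    """One row per window; columns follow the key order of td_features."""
    rows = []
    for start in range(0, len(signal) - window_size + 1, step):
        feats = td_features(signal[start:start + window_size], threshold)
        rows.append(list(feats.values()))
    return np.array(rows)

rng = np.random.default_rng(0)
signal = rng.normal(size=1000)  # synthetic stand-in for 5 s of one channel at 200 Hz
X = feature_matrix(signal)
print(X.shape)  # (9, 6)
```

With the real data, each of the 16 channels would contribute its own six columns, giving a wider feature vector per window.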

1. S. Pizzolato, L. Tagliapietra, M. Cognolato, M. Reggiani, H. Müller, and M. Atzori, “Comparison of six electromyography acquisition setups on hand movement classification tasks,” PLoS One, vol. 12, no. 10, p. e0186132, Oct. 2017, doi: 10.1371/journal.pone.0186132.

2. B. Hudgins, P. Parker, and R. N. Scott, “A new strategy for multifunction myoelectric control,” IEEE Trans. Biomed. Eng., vol. 40, no. 1, pp. 82–94, 1993, doi: 10.1109/10.204774.

3. A. Phinyomark, P. Phukpattaranont, and C. Limsakul, “Feature reduction and selection for EMG signal classification,” Expert Syst. Appl., vol. 39, no. 8, pp. 7420–7431, Jun. 2012, doi: 10.1016/j.eswa.2012.01.102.