Research Paper by Associate Professor Ma Huanfei published in PNAS

On October 8, PNAS, an internationally renowned multidisciplinary scientific journal, published online a paper titled “Randomly Distributed Embedding Making Short-term High-dimensional Data Predictable”. It is the latest paper (submitted through the free submission channel) from the research team led by Associate Professor Ma Huanfei of the Research Institute of Systematic Biology of Soochow University (mathematics), with Ma as first author. PNAS, Cell, Nature, and Science are generally regarded as the world’s four leading scientific journals. This is the first time that faculty members of Soochow University have published a mathematics paper in PNAS.

Their research proposes a new random embedding theory and method grounded in nonlinear dynamics: Randomly Distributed Embedding (RDE). By constructing a large number of random low-dimensional embedding maps, the method obtains a distribution of predicted values for the target variable, which makes it possible to predict high-dimensional, short time series. The team has thus established a new theory and method for forecasting the dynamics of a target variable from high-dimensional data observed over a short time span.

In time series analysis, it has generally been accepted that reconstructing or predicting a system is feasible when a large number of time samples are available for a low-dimensional system, but not when the time series is short. In the big data era, however, studies of complex systems usually yield many variables but only a limited number of time samples (for example, image data or omics data). On the one hand, the number of parameters needed to fit the system grows rapidly with the number of variables; on the other hand, the statistical regularities of the system's dynamics usually cannot be estimated from samples covering a relatively short time window. Together, these pose new challenges for data analysis methods.

Picture 1: Although the training data cover only part of the attractor, RDE can predict dynamics that were not seen during training.

Enlarged picture: the process of distribution-based prediction.

To address this problem, Ma and his team drew on the embedding theory of nonlinear dynamical systems to design a new prediction framework for complex systems. They build weak predictors from many low-dimensional embedding maps and then combine these weak predictors into a strong predictor, thereby avoiding the curse of dimensionality. The framework also recovers dynamical information about the target variable from the interactions among the variables of the high-dimensional system, which compensates for the shortage of time samples. Their research provides a theoretical analysis supporting the feasibility of the framework, and they further demonstrate the method's effectiveness and superiority by predicting real-world data such as air pollution and disease data.
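The idea of combining many weak predictors built on random low-dimensional embeddings can be illustrated with a minimal sketch. This is not the team's implementation: the toy system (a ring of coupled logistic maps), the k-nearest-neighbour weak predictor, the median aggregation, and all function names and parameter values are assumptions chosen only to keep the example short and self-contained.

```python
import numpy as np


def simulate_coupled_logistic(n_vars, n_steps, rng, coupling=0.1):
    """Toy high-dimensional system (an assumption): a ring of coupled logistic maps."""
    x = rng.uniform(0.2, 0.8, size=n_vars)
    out = np.empty((n_steps, n_vars))
    for t in range(n_steps):
        out[t] = x
        f = 3.9 * x * (1.0 - x)
        # Each variable is nudged by its two neighbours on the ring.
        x = (1.0 - coupling) * f + 0.5 * coupling * (np.roll(f, 1) + np.roll(f, -1))
    return out


def knn_predict(train_x, train_y, query, k=3):
    """Weak predictor (an assumption): average the targets of the k nearest past states."""
    dist = np.linalg.norm(train_x - query, axis=1)
    nearest = np.argsort(dist)[:k]
    return train_y[nearest].mean()


def rde_sketch(data, target, embed_dim=3, n_embeddings=200, rng=None):
    """One-step-ahead prediction of data[:, target] from many random embeddings."""
    rng = np.random.default_rng() if rng is None else rng
    n_steps, n_vars = data.shape
    future = data[1:, target]  # value of the target variable one step later
    weak = np.empty(n_embeddings)
    for i in range(n_embeddings):
        # Random low-dimensional embedding: a random subset of the variables.
        cols = rng.choice(n_vars, size=embed_dim, replace=False)
        weak[i] = knn_predict(data[:-1, cols], future, data[-1, cols])
    # Aggregate the distribution of weak predictions into one estimate.
    return np.median(weak), weak


rng = np.random.default_rng(0)
series = simulate_coupled_logistic(n_vars=30, n_steps=51, rng=rng)
observed, truth = series[:50], series[50, 0]  # 30 variables, only 50 time samples
prediction, weak = rde_sketch(observed, target=0, rng=rng)
print(f"prediction={prediction:.3f}, truth={truth:.3f}")
```

Each weak predictor works in only three dimensions, so none of them has to fit a model over all 30 variables at once. The array `weak` holds the distribution of predicted values, which is summarised here by its median.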

Their research offers a new concept and theory for big data analysis, especially for high-dimensional, short time series. Beyond time series prediction, it can also help to construct large-sample data for artificial intelligence and brain science.

Associate Professor Ma received his doctorate from Fudan University and works at the School of Mathematical Sciences of Soochow University. In 2012, he went to the University of Tokyo for postdoctoral research. In recent years, his research has focused on nonlinear science and systems biology, and he has made notable contributions in areas such as causality detection and data prediction. This research was funded by the National Natural Science Foundation of China.

Paper information: “Randomly distributed embedding making short-term high-dimensional data predictable.” Huanfei Ma, Siyang Leng, Kazuyuki Aihara, Wei Lin, Luonan Chen. Proceedings of the National Academy of Sciences, Oct 2018, 201802987; DOI: 10.1073/pnas.1802987115
