Automatic Hypothesis Generation for Network Growth Models - Computational social science at Centre Marc Bloch

Computer Science: Two possible internships

Context

Networks have become a fundamental abstraction for modeling systems across many scientific fields. Plausible hypothesis describing their growth processes can help us understand a wide range of phenomena, but formulating such hypothesis is often challenging and requires insights that may be counter-intuitive. In the last years, we have developed an approach to automatically discover realistic network growth models from empirical data, employing a machine learning technique inspired by natural selection, and defining a unified formalism to describe such models as a mathematical function of arbitrary complexity [1]. As the proposed method is completely general and does not assume any pre-existing models, it can be applied “out of the box” to any given network. By automating hypothesis generation and validation, this research is aligned with the ambitious idea of creating Artificial Scientists. We have released an open source tool [2], recently ported to Python, to make this method easily accessible to the scientific community.

Goals

There are two main areas of improvement at the moment in this context:

The first one is related to the efficiency and speed of the search that this tool achieves, and which is probably one of the main barriers for a more general adoption of this scientific instrument, especially for larger networks. There are a number of opportunities for improving performance and scalability. In this respect the internship would focus on proposing algorithmic improvements targeting speed and scalability; performing rigorous tests of these proposals, both in terms of performance and correctness; applying viable improvements to the open source tool.
The other one corresponds to a categorization issue: similar or equivalent generators can be described by different mathematical functions, which are expressed as formal trees combining mathematical operators and constants. Automatically detecting groups of relatively similar functions is a highly desirable improvement to the tool which would demonstrate the existence of families of fundamental network generative processes. This issue is at the interface between computer science, network science and applied mathematics.

Both topics put much more importance on the improvement of the scientific concepts underlying each task rather than the more low-level issues regarding the pure optimization of the code itself.

Intended audience

We open up to two internships. Candidates should be achieving a Masters Degree in Computer Science and related fields. Beyond a sufficient knowledge in Computer Science, desirable skills include in particular : algorithmic complexity and graph theory, network science, machine learning (especially evolutionary computation / genetic programming), as well as python and its common scientific libraries. Ability to innovate autonomously is expected.

References

[1] Menezes, T. and Roth, C., 2014. Symbolic regression of generative network models. Scientific reports, 4, p.6284. https://www.nature.com/articles/srep06284

[2] https://github.com/telmomenezes/synthetic

Practical Details

The internship could last between three and six months;
The internship is based in Berlin at the Centre Marc Bloch;
The intern should have working proficiency in either English, French or German;
The internship allowance is fixed by law at an amount of about 500 euros on a 38 hours basis.
Students registered at non-EU universities should also inquire first about the administrative issues related to the possibility of being hosted at the Centre, at a Germany-based institution.

Apply

To apply, please send an e-mail along with your resume to Camille Roth (roth[@]cmb.hu-berlin.de) and Jérémie Poiroux (poiroux[@]cmb.hu-berlin.de).