Scientists from ITMO’s Faculty of Physics have developed MetaDiT, a generative model for the automated design of metasurfaces – ultrathin optical elements that control light at the nanoscale. The model selects not only the pattern of the nanostructure within an elementary cell but also its key numeric parameters: the thickness of the meta-atom, its refractive index, and the lattice period – the distance between neighboring elements. This approach makes it possible to search for more accurate and versatile flat-optics designs for VR devices, sensors, medical equipment, and photonic chips. The project was carried out at the joint research center in Qingdao, and the corresponding paper has been accepted to AAAI, one of the leading conferences on AI.

Credit: spopov / photogenica.ru
Metasurfaces are flat optical elements whose surface is made up of many nanoscale structures. These structures determine how light will pass through or reflect from the material: they can focus a beam and change its direction, polarization, or spectral composition. In the future, such elements could replace or complement bulky assemblies of lenses, mirrors, and filters in cameras, VR headsets, and medical equipment, making those devices lighter, more compact, and more functional. However, designing metasurfaces manually is extremely difficult. Their optical properties depend on many parameters at once: the shape of the nanoelement, its thickness, the optical properties of the material, and the spacing between neighboring elements. At the same time, an engineer needs to achieve a precisely specified light response – for example, to transmit only a particular wavelength or to focus a beam without distortion. Even a small change in geometry can cause the light to behave differently than intended.
Until recently, scientists relied largely on physical intuition, experience, and a general understanding of how light interacts with nanostructures. A researcher would propose a possible metasurface geometry based on known physical principles, run computer simulations to calculate how that structure would interact with light, and compare the result with the target. If the calculated spectrum did not match the desired one, the cycle was repeated. This process took a long time and depended heavily on the researcher’s skills. Early AI assistants sped up the selection process, but at the cost of quality: some parameters (thickness, material) were set manually, and the spectrum – the response curve across different light frequencies – was intentionally coarsened to make calculations faster. As a result, the model could miss the finer details of the spectrum.
“Such details are crucial for a device to function properly. A lens has to focus light of the target color correctly, a filter has to let some wavelengths through and suppress others, and a sensor may respond to a narrow resonance peak – a sharp increase in response at a specific frequency. If the spectrum is oversimplified during the training of an AI model, the model can miss such details and suggest a structure that looks correct to a rough approximation but works worse in practice. That’s why these ‘fine details’ in the spectrum are so important,” says Andrey Bogdanov, the head of the MetaDiT development team and a senior researcher at ITMO’s Faculty of Physics.
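The effect of coarsening can be illustrated with a small numerical sketch. It is not taken from the MetaDiT paper: the Lorentzian dip, the grid sizes, and the function names lorentzian_dip and coarsen are assumptions chosen only to show how averaging a spectrum into wide bins can almost erase a narrow resonance.

```python
def lorentzian_dip(freq, center, width, depth):
    """Toy transmission curve with one narrow resonance dip (illustrative only)."""
    return 1.0 - depth / (1.0 + ((freq - center) / width) ** 2)

def coarsen(values, factor):
    """Average consecutive blocks of `factor` samples (simple binning)."""
    bins = []
    for start in range(0, len(values), factor):
        block = values[start:start + factor]
        bins.append(sum(block) / len(block))
    return bins

# Fine grid: 200 samples on a normalized frequency axis [0, 1).
freqs = [i / 200 for i in range(200)]
fine = [lorentzian_dip(f, center=0.5, width=0.005, depth=0.9) for f in freqs]

# Coarse grid: the same curve averaged into 10 bins of 20 samples each.
coarse = coarsen(fine, factor=20)

fine_min = min(fine)      # deepest point of the dip on the fine grid
coarse_min = min(coarse)  # deepest point after binning
print(f"fine minimum:   {fine_min:.3f}")
print(f"coarse minimum: {coarse_min:.3f}")
```

On the fine grid the transmission drops to about 0.1 at the resonance, while after binning no point falls below 0.85: a model trained on the coarse curve would see only a shallow dip.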

Andrey Bogdanov. Photo courtesy of the subject
The new model from ITMO researchers addresses this issue. MetaDiT (Metasurface Diffusion Transformer) generates a 2D image of a nanostructure together with its physical parameters – thickness, period, and refractive index. To do so, the model uses a detailed description of the target spectrum. The process has two stages: first, MetaDiT “reads” a plot illustrating how the future surface needs to interact with light, including all the peaks and valleys in the reflection and transmission spectra; next, using this description, the model assembles the nanostructure. The assembly starts from a chaotic initial template – a random set of pixels – that gradually transforms into a precise pattern that produces the desired optical response.

A visualization of MetaDiT's work: the AI model processes the target spectrum and generates a metasurface's structure, and the resulting optical element controls the light in the needed way. Image courtesy of the researchers
“For metasurfaces, both the overall picture of the spectrum and its fine details are equally important. That’s why we introduced two control levels into the architecture. The first, approximate one, sets the model’s overall task: where in the spectrum light should be transmitted well and where it should not. The second, fine one, helps keep track of the local features – narrow resonances, peaks, and dips that often determine whether the device will work. Taken separately, neither the overall picture nor the details provide full accuracy: in the first case, the model risks missing important features; in the second, it can lose the overall context. By combining both levels, we help the model retain all information. That’s why MetaDiT is not an ‘artist’ filling in the blanks in a template, but an ‘engineer’ designing a structure for a specific physical task,” adds Andrey Bogdanov.
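The idea of describing one spectrum at two levels can be sketched as follows. This is not MetaDiT's architecture, which the article does not detail; the functions coarse_view, fine_view, and combine, and the sample values, are hypothetical. The sketch only shows that a coarse summary plus a local residual together carry the full curve, while either one alone does not.

```python
def coarse_view(spectrum, bin_size):
    """Global descriptor: mean of each block of `bin_size` samples."""
    means = []
    for start in range(0, len(spectrum), bin_size):
        block = spectrum[start:start + bin_size]
        means.append(sum(block) / len(block))
    return means

def fine_view(spectrum, coarse, bin_size):
    """Local descriptor: deviation of each sample from its block mean."""
    return [s - coarse[i // bin_size] for i, s in enumerate(spectrum)]

def combine(coarse, fine, bin_size):
    """Rebuild the full-resolution curve from both descriptors."""
    return [coarse[i // bin_size] + r for i, r in enumerate(fine)]

# Toy transmission spectrum: a pass band with one narrow dip, then a stop band.
spectrum = [0.9, 0.9, 0.9, 0.2, 0.9, 0.9, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]

coarse = coarse_view(spectrum, bin_size=6)       # "where light passes, where it does not"
fine = fine_view(spectrum, coarse, bin_size=6)   # narrow dip shows up as a large residual
restored = combine(coarse, fine, bin_size=6)

print("coarse:", [round(c, 3) for c in coarse])
print("largest local deviation at index", fine.index(min(fine)))
```

Here the coarse view records only a high band followed by a low band, and the narrow dip at index 3 survives solely in the fine view; adding the two back together recovers the original samples.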
In comparative tests against earlier models, MetaDiT reproduced target spectra with lower error. The researchers evaluated not only the mean deviation across the entire curve but also the accumulated error along the frequency axis – that is, how accurately the model hits each point of the spectrum. MetaDiT is also more stable: when the same task is run repeatedly, the results differ only minimally from one generation to the next.
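The article does not give the exact formulas, so the following is only a plausible reading of the three quantities mentioned: a mean absolute deviation over the curve, a running (accumulated) absolute error along the frequency axis, and the spread of the error across repeated generations. All function names and sample values are assumptions made for illustration.

```python
import statistics

def mean_abs_error(target, predicted):
    """Mean absolute deviation between two spectra sampled on the same grid."""
    return sum(abs(t - p) for t, p in zip(target, predicted)) / len(target)

def accumulated_error(target, predicted):
    """Running sum of absolute error along the frequency axis."""
    total = 0.0
    curve = []
    for t, p in zip(target, predicted):
        total += abs(t - p)
        curve.append(total)
    return curve

# Toy target spectrum and three hypothetical generations for the same task.
target = [1.0, 0.5, 0.0, 0.5, 1.0]
runs = [
    [1.0, 0.5, 0.25, 0.5, 1.0],
    [1.0, 0.5, 0.0, 0.5, 0.5],
    [1.0, 0.25, 0.0, 0.5, 1.0],
]

maes = [mean_abs_error(target, run) for run in runs]
acc = accumulated_error(target, runs[0])
spread = statistics.pstdev(maes)  # smaller spread means more repeatable results

print("per-run mean error:", maes)
print("accumulated error, run 0:", acc)
print("spread across runs:", round(spread, 4))
```

The accumulated curve shows where along the frequency axis the error is picked up, which a single averaged number hides.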
The model generates a single elementary metasurface cell – a microscopic building block smaller than the wavelength of light. To create a working device, such as a flat lens or a filter, these cells are assembled into a two-dimensional array, like a puzzle. Different points on the device require cells with different optical properties, and the model can generate them to meet specific requirements.
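Assembling cells into a device layout can be sketched like this. The two 2×2 pixel patterns, the pick_cell rule, and the grid size are hypothetical placeholders: in practice each position would get a cell generated for the optical response required there, which this toy rule merely imitates by distinguishing the center from the edge.

```python
# Hypothetical library of elementary cells, each a tiny binary pixel pattern.
CELLS = {
    "A": [[1, 1], [1, 1]],
    "B": [[1, 0], [0, 1]],
}
CELL_PX = 2  # pixels per cell side

def pick_cell(row, col, size):
    """Toy selection rule: cell 'A' near the center, 'B' toward the edge."""
    center = (size - 1) / 2
    dist = max(abs(row - center), abs(col - center))
    return "A" if dist < size / 4 else "B"

def assemble(size):
    """Tile a size x size grid of cells into one pixel layout."""
    side = size * CELL_PX
    layout = [[0] * side for _ in range(side)]
    for r in range(size):
        for c in range(size):
            pattern = CELLS[pick_cell(r, c, size)]
            for dr in range(CELL_PX):
                for dc in range(CELL_PX):
                    layout[r * CELL_PX + dr][c * CELL_PX + dc] = pattern[dr][dc]
    return layout

device = assemble(4)  # 4 x 4 cells -> 8 x 8 pixels
for line in device:
    print("".join("#" if px else "." for px in line))
```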
Developed over the course of 18 months at the joint research center at Harbin Engineering University in Qingdao (China) as part of ITMO’s initiative in integrated photonics, the project brought together specialists in photonics, metamaterials, and machine learning. The corresponding paper has been accepted to the upcoming AAAI conference in Singapore, one of the most prestigious global conferences in AI.
Next on the research team’s agenda is teaching the model to take real fabrication constraints into account: the minimum feasible element sizes and sensitivity to manufacturing defects. After that, the researchers plan to move from generating individual cells to designing complete devices – flat lenses for cameras, optical sensors, and chips for integrated photonics, where light is used to transmit and process information inside microchips. The final step will be to fabricate the structures in the lab and verify their real optical properties.