Time | Network Management | Legislation and Regulation |
---|---|---|
09:30 | Registration | |
10:00 | WiFi at the Physical Layer - How do 802.11 Protocols Work? - Tomáš Kirnak | NetCore (Unimus). This lecture covers the basics of WiFi at the physical layer. It will delve into physics, modulation schemes, the development of 802.11 family protocols, and their historical evolution. This session provides an essential overview of WiFi operation at Layer 1, preparing attendees for deeper exploration. PDF | Video | |
11:30 | Analyzing network reliability up to 800G - Thomas Weible, Gerhard Stein | Flexoptix. This presentation investigates how close to the low Signal-to-Noise Ratio (SNR) threshold a 100G / 400G / 800G link can operate while still maintaining a tolerable Bit Error Rate (BER). Additionally, we account for factors such as temperature and cable length to predict how long a reliable connection can be sustained between transceivers. The analysis, based on data retrieved using a Flexbox, compares the reliability of coherent (16QAM) and non-coherent (PAM4) transceivers, with a detailed discussion of the implications of these technologies for network performance. PDF | Video | |
12:00 | Lunch | |
13:20 | Welcome | |
13:30 | Protecting BGP with TCP-AO - Kateřina Kubecová | CZ.NIC. Securing BGP with the TCP-AO option: how TCP-AO works, how it differs from MD5, and how to set it up. PDF | Video | Measurement Tools and Procedures for Monitoring the 5 GHz Band - Miroslav Krýza | Český telekomunikační úřad. The 5 GHz band is widely used for wireless communication, such as WiFi networks, access points, SRD, and other technologies. The Czech Telecommunications Office therefore focuses intensively not only on monitoring but also on locating interference sources in this band, using modern technologies and proprietary software tools. PDF | Video |
13:50 | Pushing the Limits III - Utilizing eBPF/XDP to Optimize the Performance of the Linux Kernel Networking Subsystem - Jan Kučera, Jan Viktorin | CESNET. This lecture follows up on previous parts of the series with the same name. We will focus on the use of XDP to enhance the resilience of web servers against DDoS attacks and will explain how the real limits of the networking subsystem change when applying an XDP program to accelerate the native SYN cookies mechanism available in the kernel. PDF | Video | Detecting Child Sexual Abuse - David Kovář | ÚSKPV. The lecture will present selected aspects of online child sexual abuse from the perspective of the Criminal Police and Investigation Service of the Czech Republic. It will focus on detection methods and collaboration with other state authorities and private sector entities. PDF | Video |
14:10 | Deploying XDP in Knot DNS - Lukáš Vacek | CZ.NIC. We have been discussing XDP technology in Knot DNS for a while, and how we use it in our anycast. Today, we’ll cover how you can deploy XDP in your setup: what to watch out for, prerequisites, configuration, and optimal traffic monitoring when packet inspection through the kernel is not an option. PDF | Video | Securing Email Communication - Jakub Onderka | NÚKIB. In 2021, NÚKIB issued protective measures requiring email system administrators to implement security technologies such as SPF, DKIM, DMARC, and DANE. With the new Cybersecurity Act, the scope of entities subject to these regulations will expand. What does this regulation mean for internet service providers? PDF | Video |
14:30 | Updates on DNS Anycast for the National .CZ Domain - Tomáš Hála | CZ.NIC. In 2024, the anycast infrastructure for the .CZ domain was significantly strengthened with the deployment of a 400GE link to NIX.CZ and new locations in the Czech Republic and abroad. How did the process unfold? What challenges are we facing? What is its capacity, and where is it heading in the future? Why did we start using catalog zones? And who else has begun utilizing the anycast network? PDF | Video | NÚKIB Portal - Tomáš Pekař | NÚKIB. In connection with the new cybersecurity law, NÚKIB is developing its own portal, which will serve as the main contact point for cybersecurity. Reporting of regulated services, incident notifications, and information on current threats are just some of the features of the new portal. PDF | Video |
14:50 | Measuring the Performance of DNS Zone Transfers - Petr Špaček | Internet Systems Consortium. How can we measure the performance of DNS zone transfers? What are the differences between cases involving a single small zone, a large zone (TLD), or numerous small zones? What is the impact of data transfer security on performance? How does DNS-over-TLS scale? PDF | Video | New Cybersecurity Law - Petr Kopřiva | NÚKIB. The presentation summarizes the current state of the proposed cybersecurity law, which builds on the NIS2 directive and affects thousands of Czech companies and organizations. We will also highlight the most essential parts of the proposed legislation. PDF | Video |
15:10 | Coffeebreak | |
15:40 | A Quarter Million Prefixes - Maria Matějka | CZ.NIC | BIRD. The size of the IPv6 table is slowly approaching a quarter-million entries, and IPv4 is nearing the magical one-million mark. Can we improve hardware performance by aggregating prefixes with the same nexthops? PDF | Video | CTU Activities in 2025 - Marek Ebert | Český telekomunikační úřad. How does the Chairman of the CTU Council evaluate the previous year (2024), and what activities does the national regulator plan for 2025? The presentation will focus on key tasks that the CTU has planned, both in the electronic communications market and within its new competencies as the digital coordinator under the DSA regulation. PDF | Video |
16:00 | Documenting the CESNET3 Network with NetBox - Ladislav Loub | CESNET. High-quality documentation is crucial for the efficient operation of large-scale networks today. This presentation will showcase the approach we chose for the new CESNET3 network. We will demonstrate how we use NetBox, how we enhanced it with custom extensions, and how it is becoming a "source of truth" for the gradual implementation of automation. PDF | Video | Panel Discussion: Vision for 2030 - Jan Kolouch | CESNET. This panel discussion will take an unconventional look at the outlook for the Czech Republic in the digital domain by 2030, reflecting various perspectives from the participating panelists, including representatives from public administration (ČTÚ, NÚKIB) and the private sector. Video |
16:20 | How to Implement Central Log Management - Lukáš Macura | CESNET. The lecture will describe how to set up central log management in a network. It will not focus on a specific solution but rather on the journey and the challenges that may arise along the way. Additionally, it will include practical advice on potential problems and what to avoid. PDF | Video | |
16:40 | Root cause analysis - benefits of having Flow data right beside SNMP, OTel, and other logs - Matěj Pavelka | Flowcutter. The presentation discusses why it is beneficial to have multiple data sources at one's disposal when dealing with root cause analysis. The main use case focuses on analysing flow data right beside SNMP, OTel, and other logs in an open-source Grafana stack. The presentation is product agnostic. PDF | Video | |
17:00 | Code of Conduct: Yesterday, Today, and Tomorrow - Maria Matějka. The CSNOG website contains a paragraph about how participants should treat each other. This paragraph has been in place since CSNOG's inception, and now it is time to look back and assess whether we are satisfied with this setting. PDF | Video | |
17:10 | Is network engineering at a standstill? - Tomáš Hlaváček. PDF | Video | |
17:15 | Implementation of RFC 8950 - Marian Rychtecký | NIX.CZ. Video |
17:20 | End of Day 1 | |
18:00 | Baťa Principle | |
19:00 | Social Event |
Time | Network Management | Network Management |
---|---|---|
09:30 | Registration | |
10:00 | Updates and Plans for Network Monitoring with ipfixprobe - Karel Hynek | CESNET. The ipfixprobe tool, developed by CESNET, enables monitoring of network traffic on various devices, from home routers to high-performance servers monitoring 100GE links. The lecture will present the latest features, including DPDK support for commodity network cards and monitoring support for 400GE links. PDF | Video | Innovations in Practical Teaching – Virtual Labs at NetLAB FEL CTU - Marcel Poláček, Jaroslav Burčík | Fakulta elektrotechnická ČVUT. NetLAB represents a revolutionary approach to practical teaching and research in information technologies. Thanks to remote access, it provides students and researchers with easy access and space for learning, simulations, design, and testing of modern scenarios in networking, cybersecurity, and operating systems. PDF | Video |
10:20 | How we built an sFlow visualization tool (open source) - Blažej Krajňák | Energotel. When it comes to parsing, storing, and visualizing network telemetry data for hundred-gigabit networks, many open source tools cease to be sufficient. This presentation describes how we built a lightweight but powerful internal tool using the GoFlow2 - ClickHouse - Grafana stack. PDF | Video | |
10:40 | Rise of the Merchant Silicon - Patrick Prangl | Arista Networks. Merchant silicon has become more popular in recent years as its capabilities and use cases have expanded significantly. This talk will show the evolution of merchant silicon and the differences between its variants. PDF | Video | |
11:00 | Stepping out of the IDS Stereotype: Applying Suricata’s Full Potential - Lukáš Šišmiš | CESNET. Suricata is known for its role as an IDS/IPS, but its capabilities go much further. This session will explore how Suricata can be used for network troubleshooting, as a cybersecurity library, and even as a web application firewall in AWS, unlocking its full potential for various network operations. PDF | Video | |
11:20 | Coffeebreak | |
11:40 | SDN at L0 with Open Hardware - Michal Hažlinský | CESNET. Learn how an SDN-based optical transmission system allows network operators to use a familiar, DevOps-focused control plane to operate a DWDM network and deliver an expanded service portfolio over the existing fiber footprint. PDF | Video | |
12:00 | Evolution to SRv6 – Theory and Application - Vladimír Bureš | ALEF NULA. The development of transport technology from MPLS LDP and RSVP-TE, through Segment Routing MPLS and SR-TE, to SRv6: basic principles of operation, comparisons, advantages, limitations, and a configuration example. PDF | Video | |
12:20 | Automation of Data Center Configuration at ČRA - Ansible, Git, CI/CD, ARISTA - Vojtěch Setina, Radim Roška | ALTEPRO solutions. This lecture introduces the automation of ČRA data center configurations using Ansible AVD, Git/GitLab, and CI/CD pipelines. We will demonstrate how scripts manage networks, migrate services, and edit configurations based on the Source of Truth model, deployed via the ARISTA CloudVision Portal. PDF | Video | |
12:40 | Timeseries Troubles: How (Not) to Calculate Statistics - Marian Rychtecký | NIX.CZ. "Timeseries Troubles: How (Not) to Calculate Statistics" reveals the most common mistakes when working with timeseries databases. You will learn how to avoid errors in calculating operational statistics and receive tips for proper analysis of time series. PDF | Video | |
13:00 | Automated DNSSEC Management – Enhance the Security of the Czech Internet! - Zdeněk Brůna | CZ.NIC. The CZ.NIC Association has supported DNSSEC in the .CZ domain registry since 2008 and enabled its deployment via CDNSKEY records since 2017. Support for simplified management of higher DNS security is also available in Knot DNS. PDF | Video | |
13:20 | Closing | |
13:30 | Lunch |
Photogallery
Meeting Report
The seventh meeting of the community of Czech and Slovak network administrators, CSNOG, took place on 21 and 22 January 2025. The CSNOG event is organized by CZ.NIC, NIX.CZ and CESNET. The program of this event is managed by the program committee.
Presentations and videos from this year's CSNOG are available on the event website under the program section.
CSNOG 2025 in numbers:
219 participants, mainly from the Czech Republic and Slovakia
28 talks (divided into three tracks)
2 lightning talks
1 panel discussion
7 partners:
GOLD: Altepro, Unimus
SILVER: Alef Nula, Seznam, RIPE NCC, ICANN
COFFEE: Flexoptix
This summary was written by Petr Krčmář, who is a member of the Program Committee.
Summary
Kateřina Kubecová: BGP Protection with TCP-AO
TCP-AO is a method of securing BGP that does not require costly encryption of the entire traffic but allows verification of the authenticity of received data. Historically, the MD5 hashing algorithm has been used, but it is now considered outdated, and TCP-AO serves as a replacement. It introduces several improvements, such as supporting multiple keys for each connection.
Keys can be changed without interrupting the connection. The input is a key with an ID, and we also send the ID of the key that should be used for response generation. The other side verifies the signature, and if it does not match, the packet is ignored.
Information crucial for TCP-AO is stored in a structure called MKT (Master Key Tuple), which includes the TCP connection identifier, TCP options, SendID and RecvID, the key for password derivation, key derivation functions, and the MAC (Message Authentication Code) algorithm. When changing a key, the new key ID is sent using the old key, allowing the peer to learn it. To remove an old key that is still valid, it must be manually deleted.
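The key-ID lookup and verification logic can be pictured with a small sketch. The following Python fragment is purely illustrative: it skips the RFC 5925 key derivation and only shows the idea of selecting a key by ID and ignoring segments whose MAC does not verify; the key IDs and secrets are invented.

```python
# Conceptual sketch only: real TCP-AO derives per-connection traffic keys with a
# KDF and computes the MAC over TCP header fields; here we merely illustrate the
# SendID/RecvID lookup and MAC verification with plain HMAC-SHA1.
import hmac
import hashlib

# A simplified "Master Key Tuple" store: key ID -> master secret (made up)
mkt = {
    1: b"old-shared-secret",
    2: b"new-shared-secret",
}

def sign_segment(key_id: int, segment: bytes) -> bytes:
    """Compute a MAC over the segment using the key identified by key_id."""
    return hmac.new(mkt[key_id], segment, hashlib.sha1).digest()

def verify_segment(key_id: int, segment: bytes, mac: bytes) -> bool:
    """Verify the received MAC; segments with a bad MAC are silently dropped."""
    if key_id not in mkt:
        return False
    return hmac.compare_digest(sign_segment(key_id, segment), mac)

segment = b"BGP UPDATE ..."
mac = sign_segment(2, segment)          # sender uses the new key (SendID = 2)
print(verify_segment(2, segment, mac))  # receiver looks up RecvID = 2 -> True
print(verify_segment(1, segment, mac))  # wrong key -> False, packet ignored
```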
In BIRD, the authentication setting can be configured as "ao," allowing multiple keys to be set with different preferences. In JunOS, there is an additional "start-time" setting, which allows a new valid key to be added at a specific time.
Jan Kučera: eBPF/XDP for Network Subsystem Performance Optimization
Our goal is to protect an endpoint server from DDoS attacks, such as an HTTP(S) service running on ports 80 and 443. The aim is to mitigate as much as possible using the available hardware. There are multiple ways to achieve this: deploying a protective device in front of the server or configuring the system properly.
We focus on traffic that is not easily identified by its source. We cannot simply determine whether a source IP address is legitimate. This makes IP blacklisting, rate limiting, and similar measures ineffective. A typical attack in this scenario is TCP SYN Flood, where an attacker sends connection initiation requests, consuming server resources. The standard solution involves SYN cookies, which encode session information into the response.
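To make the SYN cookie principle concrete, here is a hedged Python sketch: connection parameters and a coarse timestamp are hashed with a server secret into the initial sequence number, so the server keeps no per-connection state until a valid ACK returns. The layout and secret are invented for illustration and differ from the kernel's actual algorithm, which also encodes the MSS.

```python
# A minimal, illustrative SYN cookie: encode connection parameters and a coarse
# timestamp into the ISN so the server can stay stateless until the ACK arrives.
import hashlib
import time

SECRET = b"server-secret"  # placeholder

def make_cookie(src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> int:
    t = int(time.time()) >> 6  # coarse timestamp (64-second slots)
    data = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}-{t}".encode()
    digest = hashlib.sha256(SECRET + data).digest()
    # 24 bits of hash + 8 bits of timestamp slot packed into a 32-bit ISN
    return (int.from_bytes(digest[:3], "big") << 8) | (t & 0xFF)

def check_cookie(cookie: int, src_ip: str, src_port: int,
                 dst_ip: str, dst_port: int, max_age_slots: int = 2) -> bool:
    t_now = int(time.time()) >> 6
    # Accept cookies generated in the current or recent time slots
    for t in range(t_now, t_now - max_age_slots - 1, -1):
        data = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}-{t}".encode()
        digest = hashlib.sha256(SECRET + data).digest()
        expected = (int.from_bytes(digest[:3], "big") << 8) | (t & 0xFF)
        if expected == cookie:
            return True
    return False

c = make_cookie("192.0.2.1", 50000, "198.51.100.7", 443)
print(check_cookie(c, "192.0.2.1", 50000, "198.51.100.7", 443))  # True
```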
SYN cookies are implemented in the Linux kernel, but the default implementation is not highly efficient. However, there is a module for iptables called "rawcookie" that moves response processing to the raw table. This bypasses both conntrack and the routing subsystem, allowing direct response generation to the MAC address of the nearest router. The result is approximately twice the performance of the original implementation, handling millions of packets per second on standard hardware.
A further step involves pushing response generation even closer to the network card using eBPF/XDP, which allows injecting custom user code into specific parts of the Linux kernel. This code can capture packets, forward them to a user application, send them to standard processing, or generate a response directly. Our goal is to avoid allocating socket buffers and generate responses immediately.
The implementation, named "xdpcookie," inspects TCP headers and destination ports, generates SYN cookies, and sends them. It also supports validating the subsequent ACK, allowing conntrack verification before passing the packet through or validating the cookie. Additional features include VLAN support and the ability to enable/disable L3/L4 checksum calculations.
The original "rawcookie" handled 15.2 million packets per second (Mp/s) in tests, while "xdpcookie" more than doubled the performance to 34.7 Mp/s. With checksum offloading to the network card, it exceeded 40 Mp/s. Performance can be further improved by deploying an additional machine in front of the protected server, but this requires handling sequence number translation. A Linux kernel patch exists to enforce custom sequence numbers.
Lukáš Vacek: XDP Deployment in Knot DNS
The performance of authoritative DNS servers can be increased by adding more nodes in different locations, increasing link capacities, and adding new servers. However, these measures are financially demanding. Instead, a more efficient DNS daemon can be used. XDP technology allows bypassing the entire network stack and delivering data directly to the daemon. It is recommended to use at least a 5.x Linux kernel and a high-performance network card capable of handling multiple queues, which enables parallel packet processing across CPU cores. The Knot DNS developers have tested Nvidia ConnectX-6 Dx, Intel 700 series, and Intel E810 series network cards.
In a Linux environment, the service configuration must be adjusted, granting the necessary capabilities to Knot. Additionally, network card settings should be optimized, including queue count, memory allocation, interrupts, and CPU core assignment. Using only physical cores and disabling Hyper-Threading is preferable. These adjustments can be made using ethtool.
Knot’s configuration is straightforward: specify the listening IP addresses and choose the network interface for XDP deployment. A route-check option determines whether a packet should be processed via XDP or the standard network stack, ensuring no packet is lost.
One challenge with XDP is that traffic bypasses the network stack, meaning ACL rules are not applied, tcpdump provides limited visibility, and standard packet capture tools cannot collect traffic. Monitoring can be done by mirroring ports on an upstream switch. Knot also provides built-in statistics collection to track counter increments and verify functionality.
Tomáš Hála: DNS Anycast Developments for .CZ
CZ.NIC operates two critical systems: the domain registry and authoritative servers. These must remain operational at all times, as an outage would render the .CZ domain inaccessible, disrupting many services.
The system is scaled not only for regular traffic but also for handling critical situations. Last year, a new anycast location was added in Kyiv, connected at 10 Gbps, currently handling 600 queries per second. The server was procured in the Czech Republic and transported under complex bureaucratic conditions. It serves not only Ukrainian traffic but also queries from Poland, the Baltic countries, and other regions.
A data-driven analysis identified priority countries for optimization, reducing query response times. The United States, hosting many major companies, was one such priority, leading to the addition of a fifth U.S. anycast location in Los Angeles. With XDP technology, the high-capacity DNS stacks in the Czech Republic were streamlined. Previously requiring 40 servers, only 10 are now needed. A third site was also introduced, featuring the first 100GE production DNS servers. For robustness, different software and hardware are used across locations. However, XDP-based DNS operation is currently possible only in Knot DNS. The goal is to develop a second implementation, likely in NSD. Once stable, it will be deployed.
The average traffic across anycast is 24,000 queries per second, while the updated DC Tower ČRa location can handle 240 million queries per second, ensuring resilience against large-scale attacks.
Petr Špaček: Measuring Performance When Transferring DNS Zones
When measuring, it's possible to start with the smallest zone containing two records, but such a small transfer is very difficult to measure, and the result is very sensitive to various measurement errors. It is therefore better to use a real TLD zone. The Czech zone is confidential, so the Swiss zone can be used instead, for example. During measurements, the developers observed that in roughly one out of ten runs the server started more slowly.
The system's I/O pressure interface can tell what a process was waiting for, and it turned out that this process was waiting for data to load from disk. With data this large, you run into disk cache size limitations, which need to be taken into account. Data transfer over a secure TLS channel is just as fast as transfer over an unsecured channel and only costs a little more CPU load, which was very surprising. On the other hand, securing with TSIG increases the transfer time by about ten percent. We usually don't transfer such large zones, but typically have a large number of small zones. The average transfer time for a zone should not change with the number of zones, but in practice it does, and the times fluctuate a lot. Transfer times for 10,000 and 20,000 zones are stable, but with 30,000 zones, the times spike sharply. After the server starts, there's a gradual decrease in CPU load down to zero.
Around 38 seconds after startup, a TCP socket error related to the TIME_WAIT state appears. Every TCP connection is defined by a combination: source IP, source port, destination IP, and destination port. When establishing a connection, it’s necessary to choose a source port number, and we had 28,000 configured. However, after the connection is closed, the port enters the TIME_WAIT state, after which it can be reused once the waiting period expires. The default waiting time was set to 60 seconds, which is a very long time. When the zone transfer fails due to this, it tries again in 30 seconds.
It's possible to limit the number of connections kept in the TIME_WAIT state so that ports never run out. This knob supposedly should never be changed (it eats kittens), but if you know what you're doing, it solves the problem. When the value was adjusted to 1,000, the zone transfers became very fast, and everything behaved better and more stably. This confirmed that the limited number of TCP ports really was the bottleneck. It's important not to trust anything while testing and to check the testing environment. Transferring many zones means many TCP connections and a lot of tuning. Forget about linear behavior.
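A rough back-of-the-envelope calculation shows why 30,000 zones misbehave while 20,000 still fit, assuming one fresh TCP connection per transfer to the same primary and the 28,000-port range and 60-second TIME_WAIT mentioned above:

```python
# Back-of-the-envelope check of the port exhaustion described above.
ephemeral_ports = 28_000   # configured local port range
time_wait_s = 60           # default TIME_WAIT duration

# New connections per second sustainable before the port pool runs dry:
sustainable_rate = ephemeral_ports / time_wait_s
print(f"~{sustainable_rate:.0f} new connections/s sustainable")   # ~467

# A burst of 30,000 transfers cannot all get a port, which matches the observed
# spikes; keeping fewer sockets in TIME_WAIT raises the sustainable rate.
burst = 30_000
print(f"a burst of {burst} transfers exceeds the pool by {burst - ephemeral_ports} connections")
```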
Maria Matějka: A Quarter Million Prefixes
The number of IPv4 prefixes is growing, and we are approaching one million, which will no longer fit into hardware and will become a problem for many devices. We can try lossless aggregation of the routing table so that it doesn’t cause any operational limitations. The goal is to ensure that traffic still behaves the same way but takes up less expensive memory. If memory is still insufficient, we can select only the routes that are truly needed in hardware, while routing the rest via the processor. But how do we choose them? We can do it statically, based on configuration, or dynamically, based on traffic statistics. This is in a very early stage in BIRD. The idea is that 80% of traffic is handled by just 20% of routes.
For IPv6, the table contains about 217,000 prefixes, but it can be aggregated down to 70,000. Around 50,000 prefixes remain unchanged as received. Additionally, about 15% of the final result varies depending on location. Routes from America will tend to aggregate more in Europe, and vice versa. Similarly, 977,000 IPv4 prefixes can be aggregated down to about 220,000, with around 70,000 prefixes remaining unchanged. Roughly 30% of the prefixes will look completely different depending on the location.
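The kind of lossless aggregation discussed here can be illustrated with a short Python sketch. This is a toy example built on the standard ipaddress module, not BIRD's algorithm; the prefixes and next-hop names are made up. Prefixes sharing a next hop are collapsed, while more-specific prefixes pointing elsewhere still win via longest-prefix match, so forwarding stays unchanged.

```python
# Conceptual illustration of lossless aggregation per next hop.
import ipaddress
from collections import defaultdict

rib = [
    ("192.0.2.0/25", "nh-A"),
    ("192.0.2.128/25", "nh-A"),     # adjacent to the previous one -> merges into a /24
    ("198.51.100.0/24", "nh-A"),
    ("198.51.100.128/25", "nh-B"),  # more specific, different next hop: kept as-is
]

by_nexthop = defaultdict(list)
for prefix, nexthop in rib:
    by_nexthop[nexthop].append(ipaddress.ip_network(prefix))

fib = []
for nexthop, prefixes in by_nexthop.items():
    # collapse_addresses merges adjacent siblings and drops covered prefixes
    for aggregated in ipaddress.collapse_addresses(prefixes):
        fib.append((str(aggregated), nexthop))

for prefix, nexthop in sorted(fib):
    print(prefix, "->", nexthop)
# 192.0.2.0/24 -> nh-A
# 198.51.100.0/24 -> nh-A
# 198.51.100.128/25 -> nh-B
```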
BIRD can aggregate routes based on virtually anything, as long as prefix aggregation is included. However, cross-prefix aggregation is not yet possible. We still need to measure, analyze, and publish results properly. We will likely discover some bugs that need fixing. Hopefully, this will even result in at least one bachelor’s thesis.
Ladislav Loub: Documenting the CESNET3 Network Using Netbox
Netbox is a system for documenting and recording network infrastructure and data centers. Recently, it has started being used as a source of truth for closed-loop automation. In the past, most network documentation existed, but it was scattered across different systems. There are various alternatives, including RackTables, IP Fabric, and finally, Netbox.
Netbox offers a more complex structure than RackTables, has a well-developed API, and is extensible. However, it does not fully support direction-less technologies. We have accepted a compromise where DWDM technology is treated as a black box, and we document only individual interconnections. When deploying new network components, technicians install and configure devices, run scripts that prepare basic configurations, and then Netbox takes over. It now supports modular functionality and MxN port mapping, which is especially useful for patch panel management. A public device type library also speeds up the onboarding of new devices.
The network can now be fully modeled in Netbox, but we are not yet ready to fully automate its configuration. However, there is supporting software called Rundeck, which can import data from the network, generate unique Port IDs, and retrieve configurations from Netbox. The Netbox data model does not cover all necessary functionalities, so some tools need to be extended. For example, we had to migrate an old Inventory Monitor, so we created a model based on the old database. This allows us to track historical transceiver usage in devices. Another extension is Netbox Attachments, which allows attaching files to objects within Netbox. This plugin is publicly available on GitHub and can be used by anyone.
Synchronization with different systems like CRM, VMware, Graylog, Grafana, and IPAM is crucial. For example, Netbox links organizations with their IP prefixes, creating relationships like port – prefix – organization. Grafana can then generate reports based on assigned tags. To simplify network admin management, we needed a way to track SSH keys per device. As a result, we developed a new Netbox plugin for SSH key management. When someone leaves, we can easily find and revoke their keys.
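Because NetBox exposes everything over its REST API, integrations like the ones described above usually come down to simple HTTP calls. A minimal sketch follows; the URL, token, and site slug are placeholders, while the devices endpoint and token authentication follow NetBox's standard API.

```python
# List devices at a site via the NetBox REST API (illustrative values).
import requests

NETBOX_URL = "https://netbox.example.org"
TOKEN = "0123456789abcdef"        # placeholder API token

session = requests.Session()
session.headers.update({
    "Authorization": f"Token {TOKEN}",
    "Accept": "application/json",
})

resp = session.get(
    f"{NETBOX_URL}/api/dcim/devices/",
    params={"site": "example-site", "limit": 50},
    timeout=10,
)
resp.raise_for_status()

for device in resp.json()["results"]:
    print(device["name"], device["device_type"]["model"])
```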
Lukáš Macura: How to Manage Centralized Logging
Why do we even need to log? First, a lot of regulations require it, but most importantly, we should want to do it ourselves. Without logging, you're blind and have no insight into what's happening. Every system administrator is responsible for this – you have to know what’s going on in every system. There are no excuses. If an admin doesn’t want to log, then it’s the manager’s responsibility.
What should we log? Simply everything, so we always have all the necessary data. However, we must ensure that application logs do not contain sensitive information, such as user passwords. Most applications log correctly, but we need to be careful about custom applications and their output. Where should we log from? Primarily from production systems, devices containing production data, network devices, and any components that have access to the internet. Various tools can be used, such as Rsyslog, Syslog-ng, NXLog for Windows, or Beats. Plain syslog files have the advantage that they can be easily searched with grep and archived. However, analyzing them is very difficult because they are just raw text with no structured format.
For more detailed analysis, Beats is a better option, but it requires orchestration on the clients. Deploying and configuring agents across hundreds of devices at once is a challenge. The benefit, however, is that parsing happens on the client side. For log analysis and correlation, Graylog can be deployed, but it requires significant system resources. Until you start logging, you won't see what's happening. And if you don't log centrally, you can't correlate events effectively. To fully utilize logs, automation should be applied on top of them, allowing you to react to important events.
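For custom applications, even the Python standard library can forward messages to a central syslog collector. A small illustrative sketch, with the collector hostname as a placeholder:

```python
# Ship application logs to a central syslog collector using only the stdlib.
import logging
import logging.handlers

logger = logging.getLogger("myapp")
logger.setLevel(logging.INFO)

# Forward over UDP/514 to the central server (TCP is possible via socktype).
handler = logging.handlers.SysLogHandler(address=("logs.example.org", 514))
handler.setFormatter(logging.Formatter("myapp: %(levelname)s %(message)s"))
logger.addHandler(handler)

# Never log secrets such as passwords; log events, identifiers, and outcomes.
logger.info("user login succeeded user_id=42 src_ip=192.0.2.10")
```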
Karel Hynek: Network Monitoring with ipfixprobe
The flow monitoring infrastructure collects data aggregated into records about individual flows, on which network traffic analytics can then be performed. The ipfixprobe tool is a modular flow exporter that can collect data from various sources such as PCAP, NDP, DPDK, or a raw socket. NDP is used with FPGA acceleration cards, and the raw socket allows data extraction from the Linux kernel.
The output of ipfixprobe is bidirectional flow records, but it can be configured to export unidirectional flows. Classic NetFlow v5 information is available, such as IP addresses, MAC addresses, ports, and byte and packet counts. We started exporting data for machine learning as well, such as packet length and time sequences for the first thirty packets, and histograms of packet lengths and times. Documentation is available in the Git repository, and the source code is on GitHub. Packages for EPEL8 and EPEL9 are available, and the tool is also packaged in many other distributions. In practice, the tool is deployed on eight peering links of the CESNET network, which have a throughput of more than 100 Gbps. We use standard Dell servers with two processors and custom SmartNIC cards. Each server can monitor multiple links, with the probe's lossless throughput reaching 175 Gbps at worst. The probe is also deployed at the University of Dresden, in the backbone network of the Vysočina region, and testing is also ongoing in the Seznam.cz network.
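The machine-learning features mentioned here can be pictured with a simple sketch. The exact ipfixprobe export format differs and the bin edges below are arbitrary; this is only to show what "packet length and time sequences for the first thirty packets" means in practice.

```python
# Illustrative per-flow feature extraction: length/timing sequence of the first
# 30 packets plus a packet-length histogram (field names and bins are made up).
from collections import Counter

MAX_PKTS = 30
LENGTH_BINS = [64, 128, 256, 512, 1024, 1518]

def flow_features(packets):
    """packets: list of (timestamp_seconds, length_bytes) for one flow."""
    head = packets[:MAX_PKTS]
    lengths = [length for _, length in head]
    # inter-packet gaps in milliseconds
    gaps = [round((t2 - t1) * 1000, 3)
            for (t1, _), (t2, _) in zip(head, head[1:])]
    hist = Counter()
    for length in lengths:
        bucket = next((b for b in LENGTH_BINS if length <= b), LENGTH_BINS[-1])
        hist[bucket] += 1
    return {"pkt_lengths": lengths, "pkt_gaps_ms": gaps, "len_hist": dict(hist)}

flow = [(0.000, 74), (0.012, 590), (0.013, 1514), (0.050, 66)]
print(flow_features(flow))
```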
In the past year, significant improvements have been made to the DPDK plugin, which can run as a secondary data extractor. Support for common SmartNIC cards has also been expanded; tested cards include Nvidia Mellanox ConnectX-6 and Broadcom N1400GD with a 400GE port. We cannot yet monitor full 400 gigabits due to PCI Express limitations. Thanks to buffer adjustments, we’ve achieved roughly a thousandfold improvement in packet loss at full saturation. The probe can also process packets intended for connection establishment using QUIC or HTTP/3. Initial packets are obfuscated using a known key, but we need to extract information from them, which slows us down. The current version can handle losslessly a 40Gbps peering with Google. Work is underway to support monitoring 400GE networks, but the challenge lies in transferring data from the card to the CPU. PCI Express can transfer up to 500 Gbps in its fifth generation, but that’s not enough for bidirectional transfer. The sixth generation can transfer up to 900 Gbps, but there are still very few servers available. Therefore, it is necessary to ensure the offload of flow records in NIC FPGA firmware. More than 70% of data is transferred in 10% of flows. The probe can then either receive everything, just trimmed headers, or only aggregated flow info. Everything is in the internal testing phase, and support should be available during the summer.
To simplify the creation and orchestration of the monitoring infrastructure, the web interface Panda is used, which guides you through everything. Infrastructures can be very complex, with several probes, collectors, and servers for analytics available, and Panda allows you to set everything up and manage it. Panda will be available under an open license during the summer.
Blažej Krajňák: sFlow Visualization Tool
The sFlow protocol collects packet headers and adds metadata such as port or routing information. This data flows into GoFlow2 and then into ClickHouse, a column-oriented database. Usually, you don't need to list all the information about every packet; you can select only specific columns. Additionally, the data is compressed, allowing you to store fifteen to twenty times more of it.
ClickHouse can dynamically enrich data from other sources and can keep small tables in RAM to increase performance. The ALIAS data type is useful, as it can calculate a column using a specified logic, which isn’t actually stored in the database. For example, the DSCP value is calculated by shifting the ToS by two bits, and no additional data needs to be stored for that. When analyzing, it’s important to have real-time data access, which would take a long time if processing large datasets. We can create a reasonable sampling over the tables at the beginning, which selects just a sample of data and thus saves time during processing.
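Two of the derivations mentioned here are simple enough to write out directly. The lines below mirror what the ALIAS-style derived column and the sampling scale-up compute; the values are examples only.

```python
# Derived DSCP and sampling scale-up, spelled out explicitly.

def dscp_from_tos(tos: int) -> int:
    """DSCP is the upper six bits of the ToS / Traffic Class byte."""
    return tos >> 2

print(dscp_from_tos(0xB8))   # 46 -> Expedited Forwarding

def estimate_total_bytes(sampled_bytes: int, sampling_rate: int) -> int:
    """sFlow samples 1-in-N packets, so observed volumes are scaled back up."""
    return sampled_bytes * sampling_rate

print(estimate_total_bytes(1_250_000, 1024))  # roughly 1.28 GB of real traffic
```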
The output is then provided through a dashboard, where user-friendly filtering with an autocomplete function is available, and individual packets can be selected or an SQL query can be entered manually. The longer the time frame you choose for the output, the larger the sampling will be, but you can adjust it manually or turn it off completely. Visualization is then handled by Grafana.
Patrick Prangl: The Rise of Merchant Silicon
Merchant silicon refers to chips produced in bulk by wholesale suppliers and sold to network device manufacturers. These chips can be either FPGA or ASIC. The latter is more static, simpler, and has lower power consumption, which is enough for many deployments. They process packets so that the universal CPU doesn't have to. We offer 100GE and 400GE variants, have 800GE available, and are preparing for 1600GE.
The performance of these solutions is limited by the speed at which control chips are connected to switch chips. The speed has been gradually increasing from 10GE to 50GE, and up to 100GE, with possible further growth. Buffers are crucial for scenarios with many sources communicating with few targets or when the link speed needs to change along the route.
Merchant chips have been around since 2008, initially used for edge switching. Since about 2016, chips suitable for all parts of large networks have been available. Technological advancements have enabled the use of 3nm technology, improving chip performance and reducing power consumption. All chip manufacturers have access to the same technologies and production processes. The difference lies in the design for different deployments: from chips like Tofino to the more powerful Tomahawk, Trident, and Jericho, designed for high-performance routing. Tofino is now a dead project, though it is still available for purchase, and no further development is expected. Tomahawk 5 was originally designed for AI networks, with a throughput of up to 51.2 Tbps. Trident, used for campus networks, offers larger buffers and more features. The latest generation consumes 40% less power per bit and comes in a smaller package. Jericho focuses on large buffer capacities, ideal for service providers that need complex features.
Lukáš Šišmiš: Overcoming the IDS Stereotype
Suricata generates security alerts related to network traffic. Typically, a line is diverted from a router or switch to Suricata, which handles the preparation of security data. It can be deployed as an IDS or IPS, where it uses rules to drop certain traffic. The option of using Suricata simply for exporting traffic metadata in JSON format is often overlooked.
Suricata can also serve as a flow probe and, with the right hardware, can handle up to 400 Gbps. However, it should be noted that data is only exported after the flow ends. The upcoming Suricata 8 will support continuous export, and is expected to be released in the fall. Data export can be taken further by using Suricata as a Network Security Monitor to export records for individual application protocols such as DNS, HTTP, TLS, SMTP, and others. This allows obtaining detailed information on individual files moving across the network, which can be useful for audits. Suricata can also be used to detect misconfigured devices in the network. While switches can't view the content of specific communications, Suricata can. With basic rules installed, it can monitor events like corrupted HTTP traffic, DNS errors, asymmetric routing, TTL expiry, and more. These rules should be enabled with some thought when deploying Suricata, because they can generate a lot of data, and security events could get lost among them.
Suricata can also function as a firewall, which Amazon uses in its cloud services. Unlike traditional firewalls, Suricata defaults to allowing traffic and only blocks specific traffic at higher layers based on rules. This can be marketed as a next-generation firewall. With Suricata 8, a library will be available that allows integration into custom applications. This means your applications won't have to parse traffic and send it to Suricata manually; the library will make the process more efficient.
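When Suricata is used purely as a metadata exporter, its JSON (EVE) output can be processed with ordinary tooling. A small hedged sketch follows; the log path is assumed to be the usual default, and the event_type and dns fields are part of the standard EVE format.

```python
# Count event types and list DNS query names from Suricata's EVE JSON log.
import json
from collections import Counter

event_types = Counter()
dns_queries = []

with open("/var/log/suricata/eve.json") as eve:
    for line in eve:
        event = json.loads(line)
        event_types[event.get("event_type", "unknown")] += 1
        if event.get("event_type") == "dns" and "dns" in event:
            rrname = event["dns"].get("rrname")
            if rrname:
                dns_queries.append(rrname)

print(event_types.most_common(5))
print(dns_queries[:10])
```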
Michal Hažlinský: SDN on L0 with Open Hardware
In optical networks, communication requires several devices, especially in DWDM networks. These elements are called the zero network layer, typically including multiplexers/demultiplexers and optical amplifiers. These devices process the optical signal without converting it to electrical form. CESNET develops and operates its own optical devices called CzechLight.
The optical network can be controlled openly using SDN (Software Defined Networking). At its core is ROADM (Reconfigurable Optical Add-Drop Multiplexer), which allows switching individual optical channels and terminating them locally. It's similar to VLAN in regular switches but involves optical signals. The goal of configuration is to set all devices along the path so the signal reaches the destination from the input device through the infrastructure. In the rack, 1U boxes are placed, offering up to eight ports, along with optical amplifiers. These can be configured via an API. Standard technologies such as YANG data models and the NETCONF protocol are used for configuration. By using these common interfaces, we can leverage modern tools and DevOps practices, such as using Ansible and storing configurations in Git. Switching a channel then amounts to adding and removing port configurations on individual devices.
The openness of the optical layer can also support other services, not just data transmission. For example, CESNET works with the distribution of precise time, and time sources are available at certain points. For high accuracy, the same fiber must be used for both directions to avoid errors from asymmetry. Amplifiers need to be added to the network to filter, amplify, and return specific signals. This system can be managed using the software architecture.
Vladimír Bureš: Evolution to SRv6
Before 2000, the MPLS protocol was created, adding labels to traffic. Information about these labels is distributed using the LDP protocol, which works alongside the local routing protocol. It seemed like a good idea because any routing protocol could be used. Between the Layer 2 and Layer 3 headers, a label stack can be stored, containing a list of labels used for forwarding.
This solution was good at the time but showed its drawbacks over time. Notably, we need to maintain an additional protocol running parallel to the regular routing protocol. Synchronization between LDP and IGP must be addressed, and when issues occur, customer outages can be longer. As network elements converge at different times, loops can form. This can take a few tens of milliseconds, but today, we want recovery to be guaranteed within fifty milliseconds. The next step is segment routing, which aims to simplify the process and eliminate protocols with redundant functions. It’s also more scalable, as elements don’t need to store labels. The network doesn’t hold any state information and uses source routing, meaning the network behaves independently for each packet without a pre-established path.
For segment routing, two different data planes can be used: MPLS or native IPv6. In IPv6, a segment is an IPv6 address, providing 128 bits, and the segment list is stored in an extension header. In MPLS, labels are placed at the start of the packet and are consumed and removed along the path. In SRv6, the entries are not physically removed from the header; instead, a pointer to the currently active segment is maintained. The header includes the destination address and a next-header field that specifies what follows. It could be TCP, UDP, another IPv4 packet, or an Ethernet frame. Essentially, anything can be encapsulated. Forwarding is then done through standard IPv6 routing, so not all nodes need to understand SRv6.
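The "pointer instead of removal" behaviour can be sketched in a few lines. This is a toy model, not a packet parser; the addresses are from the IPv6 documentation range.

```python
# Toy model of the SRv6 segment routing header: at each segment endpoint the
# Segments Left pointer is decremented and the destination address is replaced,
# but the segment list itself is never removed.
from dataclasses import dataclass

@dataclass
class SRH:
    segments: list            # segment list, last hop first, as on the wire
    segments_left: int        # index of the currently active segment

@dataclass
class Packet:
    dst: str
    srh: SRH
    payload: str = "inner packet (TCP, UDP, IPv4, Ethernet, ...)"

def process_at_segment_endpoint(pkt: Packet) -> Packet:
    if pkt.srh.segments_left == 0:
        return pkt                      # final segment reached, forward as usual
    pkt.srh.segments_left -= 1
    pkt.dst = pkt.srh.segments[pkt.srh.segments_left]
    return pkt

segments = ["2001:db8:3::1", "2001:db8:2::1", "2001:db8:1::1"]  # last hop first
pkt = Packet(dst=segments[-1], srh=SRH(segments=segments, segments_left=2))
while pkt.srh.segments_left > 0:
    pkt = process_at_segment_endpoint(pkt)
    print("forwarding to", pkt.dst, "| segments_left =", pkt.srh.segments_left)
```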
Radim Roška, Vojtěch Šetina: Automation of Data Center Configuration at ČRA
For network automation, the key is the SOT (Source of Truth), which contains information about how network elements should behave. The network can be virtualized using tools like Containerlab, which allows simulation and testing before deployment. This forms the first phase, which can gradually replace the original manual solution.
For automation, AVD templates from Arista can be used, containing pre-configured settings for common situations. Configuration is defined using variables; adjusting two lines changes many settings across devices. The output is fed into the CloudVision configuration tool, which applies the changes. In the second phase, additional Python tools were added to simplify management. For example, the Editor script allows users to prepare templates and version them in Git. When creating a new service, the Editor initiates a change branch, pushes it to the main branch, validates it automatically, and then compiles the data model to generate new configurations. The final step is deploying everything to the elements.
This is essentially the same DevOps approach that’s now a natural part of software development, gradually extending to network element configuration. The result is greater efficiency, reduced error risk, and the ability to configure the network without detailed knowledge. Full change history and the ability to revert to a previous state are also available.
Marian Rychtecký: How (Not) to Count Statistics
A traditional tool for graphing time series data is MRTG, which has been used for decades. Its drawbacks are limited flexibility, fixed time windows, and dependency on SNMP. However, it is simple and outputs static images. Because data is averaged over fixed intervals, short-term spikes can sometimes disappear from the graph.
NIX.CZ wanted to create a new portal with improved graphing. After testing several solutions, developers decided on their own system. Network data is stored in a database, enriched with additional data from Netbox, and then combined and stored in InfluxDB via an API. Data is gathered every 30 seconds and stored in different forms for different windows. Three values are saved: minimum, maximum, and average.
For more accurate data, sampling was moved to 20 seconds, and then every minute, the three values (maximum, minimum, and average) are averaged. New graphs better reflect reality and are smoother. Data was also enriched with state data and time intervals related to port availability, which can be done directly in InfluxDB. Data is computed in a cascade, where longer intervals are recalculated from higher-level data sets. For yearly graph summarization, it's better to use monthly data rather than raw input.
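The cascade can be sketched as follows (illustrative numbers only, and one common way to combine the triples): per-minute triples preserve a spike in the max, and longer windows are derived from those triples rather than from raw samples.

```python
# Sketch of cascaded aggregation: raw 20-second samples are reduced to
# per-minute (min, max, avg) triples, and longer windows are built from those.
def aggregate(samples):
    return {"min": min(samples), "max": max(samples),
            "avg": sum(samples) / len(samples)}

def cascade(triples):
    """Build a longer-window triple from already aggregated triples."""
    return {"min": min(t["min"] for t in triples),
            "max": max(t["max"] for t in triples),
            "avg": sum(t["avg"] for t in triples) / len(triples)}

raw = [[10.2, 10.4, 98.0], [10.1, 10.3, 10.2], [10.0, 10.2, 10.1]]  # 3 minutes
minutes = [aggregate(m) for m in raw]
print(minutes[0])          # the 98.0 spike survives in the per-minute max
print(cascade(minutes))    # longer window keeps min/max, averages the averages
```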
Zdeněk Brůna: Automatic DNSSEC Management
DNSSEC is a security extension to DNS that uses asymmetric cryptography to sign records. It requires support from the domain registry, technical administrators, and connectivity providers. The .CZ registry has supported DNSSEC since 2008, though resolver support wasn't widespread at first. Therefore, the ODVR open resolver service was launched, and it now runs on Knot Resolver.
DNSSEC adoption in the .CZ zone grew to around 60% but has stagnated for some time. The right registrar usually makes DNSSEC deployment seamless; there are large differences among registrars, but the biggest ones usually have no issues. On the resolver side, validation support is high but fluctuates between 70% and 90%, so it is worth checking whether anything is wrong in your own network.
For a small number of domains, DNSSEC can be implemented manually by generating a key, signing records locally, and then publishing DS records via a registrar. Alternatively, the registrar can sign the zone for you. The third option involves generating a CDNSKEY record, with the registrar taking care of the rest. All this can be automated and is supported by most authoritative servers.
Automatic DNSSEC management was introduced in RFC 7344 and 8078, allowing implementation without coordination with the domain holder, which is often complex. This has been supported in FRED since 2017, and Knot DNS has supported it since version 2.5.0. The registry scans domains for CDNSKEY records and initiates DNSSEC based on them.
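The kind of check the registry scan performs can be sketched with dnspython: ask whether a CDNSKEY record is published for a zone. This sketch goes through the local resolver, whereas the real scan queries the authoritative servers directly and adds many more consistency checks before the DS record is updated; the domain is a placeholder.

```python
# Hedged sketch: detect whether a zone publishes a CDNSKEY record (dnspython).
import dns.resolver

def has_cdnskey(domain: str) -> bool:
    try:
        answer = dns.resolver.resolve(domain, "CDNSKEY")
    except (dns.resolver.NoAnswer, dns.resolver.NXDOMAIN,
            dns.resolver.NoNameservers, dns.resolver.LifetimeTimeout):
        return False
    return len(answer) > 0

print(has_cdnskey("example.cz"))  # placeholder domain
```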
This automatic management is considered the highest level of DNSSEC implementation, supported by only seven ccTLDs. Along with .cz, these include .sk, .ch, .li, .se, .nu, and .cr. Costa Rica uses our FRED system and could easily implement support. Initially, scan results were sent to technical administrators via automated emails, which was cumbersome. Now, scan results appear in WHOIS, where the entire process can be tracked.
In the future, the system could be improved by scanning from multiple locations, continuously evaluating scans throughout the day, and refining WHOIS display. Currently aimed at technicians, the goal is to make it more user-friendly. Consideration is being given to implementing RFC 9615, where the CDNSKEY record can be inserted into another already signed DNS operator zone.