Blog 2026-06-09
Who this is for: Embedded engineers, product managers, and IoT solution architects evaluating WiFi module choices for edge gateways and related connected devices.
Core Issue: City edge gateways aggregate many nearby devices and need WiFi 6 concurrency, throughput, and stable scheduling under mixed traffic loads.
Key Conclusions: Field testing with Qualcomm QCA6391 modules identified three critical bottlenecks in city edge gateway deployments: (1) Linux netlink socket buffer overflow at >18,000 pps (60+ clients), (2) OFDMA RU scheduling unfairness between DSCP EF (camera) and BE (public WiFi) traffic, and (3) PoE+ power budget headroom < 5 W. Solutions include: rmem_max=16 MB + GRO, WMM AC_VI txop_limit=3008 us, and TX power capped at +17 dBm. Validated at 3 municipal park sites with 60+ concurrent clients.
A municipality was deploying edge gateways in public parks, each gateway serving 40-60 concurrent devices: 30-40 public WiFi users (phones and tablets doing web browsing, social media, and messaging) and 8-12 security cameras (Hikvision DS-2CD2T47G2, 4 MP, H.265+, streaming at 8 Mbps each). The gateway hardware used a Qualcomm QCA6391 802.11ax 2×2 module (M.2 Key E, PCIe interface) in an IP55 outdoor enclosure. Power was supplied via PoE+ (IEEE 802.3at, 25.5 W max at the power sourcing equipment). The gateway was mounted on a 6 m streetlight pole in the center of the park, with the cameras connected to the gateway’s 4-port PoE+ switch (internal, not pass-through).

Figure 1: City edge gateway deployment topology — QCA6391 module on a 6 m streetlight pole serving 60+ concurrent devices (public WiFi users + security cameras) via PoE+
The 3-site pilot revealed three problems during the first week.
First: at 60 clients, the gateway stopped accepting new associations. The 61st client (a Samsung Galaxy S23) showed “WiFi connected but no internet” because the association request was queued in the Linux netlink socket buffer (default rmem_max = 208 kB), which overflowed at ~18,000 packets/sec. The ath11k driver communicates with hostapd over a netlink socket (nl80211). With 60 clients each sending ~300 pps (ARP, DHCP renewals every 24 hours, DNS to the gateway’s upstream resolver, and frequent TCP keepalives to various cloud services), the total netlink message rate was 18,000 pps. At ~100 bytes per netlink message, the buffer filled in 208 kB / (60 clients × 300 pps × 100 bytes) ≈ 115 ms. Once it was full, the kernel dropped subsequent messages, and the driver interpreted the drop as “no more association requests”, stopping all new client associations until the buffer drained.
Second: during peak hours (6-9 PM), camera feeds froze every 4-8 seconds. The QCA6391’s OFDMA RU scheduler in the ath11k driver (default: `ieee80211_txq_airtime_check`) allocates RUs in proportion to client count, not traffic priority. With 60 clients, the scheduler cycled through all 60 before returning to the same camera stream, a cycle time of 60 × 16 ms (RU allocation interval) = 960 ms. Each camera needs an RU allocation every 16-32 ms for jitter-free playback, so a 960 ms cycle starved the streams and caused repeated frame buffer underruns (~900 frame drops per hour across the 8 cameras).
Third: the PoE+ power budget was tight. At peak (60 clients active, 8 cameras streaming, internal switch fully loaded), the gateway drew 21.8 W from the 25.5 W PoE+ budget. On a cold morning (5°C), the enclosure’s 40 mm fan spun at 100% duty cycle, drawing 2 W more and pushing the total to 23.8 W, only 1.7 W below the 25.5 W limit. The QCA6391’s +20 dBm TX power per chain at full MCS contributed 3.8 W of that draw. Reducing the TX power cap from +20 dBm to +17 dBm saved 1.2 W (from 3.8 W to 2.6 W), bringing peak draw to 20.6 W (vs. 21.8 W) and leaving 4.9 W of headroom against the 25.5 W limit, sufficient for the fan at 100% duty cycle.
The primary failure mode is Linux netlink socket buffer overflow in the ath11k driver. The netlink socket between the kernel’s cfg80211 (via ath11k) and userspace hostapd (via nl80211) has a default receive buffer of 208 kB (the stock value of the `net.core.rmem_max` sysctl). With 60 clients, each generating ~300 packets per second of management traffic (probe requests, association frames, EAPOL, DHCP, ARP, DNS), the total netlink data rate is 60 × 300 × 100 bytes = 1.8 MB/s. The 208 kB buffer fills in 208 kB / 1.8 MB/s ≈ 115 ms. The kernel’s netlink message handler (`net/netlink/af_netlink.c`, `netlink_rcv_skb()`) checks the queued data against `sk_rcvbuf` and, if the buffer is full, drops the message via `netlink_overrun()` (kernel 5.15+). The ath11k driver’s nl80211 callback (`ath11k_mac_op_sta_state()`) reads the dropped socket status and returns `-ENOBUFS` (no buffer space available) to hostapd, which then rejects the NEW_STATION command. The 61st client sees “association rejected” via MLME-DEAUTH.ind reason code 17. Fix: increase `net.core.rmem_max` to 16 MB (`sysctl -w net.core.rmem_max=16777216`) and enable GRO (Generic Receive Offload) on the wireless interface (`ethtool -K wlan0 gro on`). GRO coalesces multiple small packets into one SKB, reducing the netlink message rate by ~4-8x for TCP traffic and dropping the effective rate from 18,000 pps to ~2,250-4,500 pps.
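The buffer arithmetic above can be reproduced with a short sketch. It uses only the figures quoted in this post (60 clients, ~300 pps each, ~100 bytes per message) and assumes, pessimistically, that nothing drains the buffer while it fills. The function name is illustrative. Note also that `net.core.rmem_max` is only a ceiling for `SO_RCVBUF` requests, so the sketch assumes the socket owner actually ends up with the larger buffer.

```python
def netlink_fill_time_ms(clients, pps_per_client, msg_bytes, buffer_bytes):
    """Milliseconds until a receive buffer fills if no messages are drained."""
    bytes_per_sec = clients * pps_per_client * msg_bytes
    return buffer_bytes / bytes_per_sec * 1000.0


DEFAULT_RMEM = 208 * 1000        # the post's "208 kB", treated as decimal kB
TUNED_RMEM = 16 * 1024 * 1024    # 16777216, the sysctl value quoted above

default_ms = netlink_fill_time_ms(60, 300, 100, DEFAULT_RMEM)
tuned_ms = netlink_fill_time_ms(60, 300, 100, TUNED_RMEM)

# Aggregate message rate after applying the quoted 8x and 4x GRO factors.
aggregate_pps = 60 * 300
pps_after_gro = [aggregate_pps / factor for factor in (8, 4)]

print(f"default buffer fills in {default_ms:.1f} ms")
print(f"16 MB buffer fills in {tuned_ms / 1000:.1f} s")
print(f"pps after GRO: {pps_after_gro}")
```

With these inputs the default buffer lasts about a tenth of a second, while the 16 MB buffer absorbs several seconds of undrained traffic.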
The second challenge is OFDMA RU scheduling unfairness between traffic classes. The ath11k driver uses `ieee80211_txq_airtime_check` to schedule OFDMA RUs. In default mode, it allocates RUs in proportion to the number of active traffic queues (txqs), without distinguishing DSCP-marked priority. With 60 clients, each with one active txq, the scheduler cycles through 60 txqs. The RU allocation interval per txq is 16 ms (in 802.11ax, each RU allocation is a trigger frame plus an HE TB PPDU, ~2.4 ms for the entire MU transmission, and the scheduler issues a new trigger frame every 16 ms). Total cycle = 60 × 16 ms = 960 ms. A Hikvision camera marking its frames DSCP EF sends 8 Mbps of video, which is ~667 packets per second at a 1500-byte MTU, or one packet every ~1.5 ms. The camera’s txq needs an RU allocation every 16-32 ms to maintain the stream. With a 960 ms cycle, the camera’s txq is scheduled only once per cycle, so 8 Mbps × 960 ms = 7.68 Mb of video accumulates between allocations. The camera’s internal buffer holds ~30 frames at 25 fps (1.2 seconds), and the backlog causes a frame buffer underrun after ~4-8 seconds.
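The scheduling numbers in this paragraph follow from a few multiplications. Here is a minimal sketch using the post's figures: 60 txqs, a 16 ms allocation interval, and an 8 Mbps stream in 1500-byte packets. The helper names are illustrative and not part of ath11k or mac80211.

```python
def scheduling_cycle_ms(active_txqs, interval_ms):
    """Round-robin cycle length when one txq is served per interval."""
    return active_txqs * interval_ms


def packets_per_second(bitrate_bps, packet_bytes):
    """Packet rate of a constant-bitrate stream at a fixed packet size."""
    return bitrate_bps / (packet_bytes * 8)


CAMERA_BPS = 8_000_000

cycle_ms = scheduling_cycle_ms(60, 16)
pkt_rate = packets_per_second(CAMERA_BPS, 1500)
pkt_spacing_ms = 1000.0 / pkt_rate

# Video that accumulates while the camera waits one full cycle for its turn.
backlog_megabits = CAMERA_BPS * (cycle_ms / 1000.0) / 1e6

# Camera-side buffering quoted in the post: ~30 frames at 25 fps.
buffer_seconds = 30 / 25

print(f"cycle: {cycle_ms} ms")
print(f"camera: {pkt_rate:.0f} packets/s, one every {pkt_spacing_ms:.2f} ms")
print(f"backlog per cycle: {backlog_megabits:.2f} Mb, buffer: {buffer_seconds} s")
```

The same two helpers can be rerun with a different client count or camera bitrate to see how quickly the cycle outgrows the 16-32 ms target.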

Figure 2: OFDMA RU scheduling comparison — default round-robin (960 ms cycle, camera freezes every 4-8 s) vs. WMM AC priority fix (32 ms cycle, smooth video)
Fix: configure WMM Access Categories (AC) in hostapd with a different `txop_limit` for each class: AC_VI (video, DSCP EF) = 3.008 ms vs. AC_BE (best effort, DSCP 0) = 0 (no TXOP). Then set `wmm_ac_vi_aifs=2` and `wmm_ac_vi_cwmin=3` to give video traffic priority. In the ath11k driver, enable `IEEE80211_HW_AIRTIME_FAIRNESS` but configure it to respect WMM AC priorities: set `airtime_weight` to 128 for AC_VI and 64 for AC_BE. This ensures the camera txq gets an RU allocation every 32 ms vs. every 128 ms for the public WiFi txqs.
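One practical detail when writing these values down: hostapd.conf expresses `wmm_ac_*_txop_limit` in units of 32 µs, so a 3008 µs TXOP is entered as 94. The sketch below converts microseconds to that unit and prints candidate hostapd.conf lines. The AC_VI values are the ones named above. The AC_BE `aifs` and `cwmin` values are an assumption taken from the sample hostapd.conf defaults, not from the trial, and the helper names are illustrative.

```python
TXOP_UNIT_US = 32  # hostapd.conf expresses wmm_ac_*_txop_limit in 32 us units


def txop_us_to_hostapd_units(txop_us):
    """Convert a TXOP limit in microseconds to hostapd's 32 us units."""
    if txop_us % TXOP_UNIT_US != 0:
        raise ValueError(f"{txop_us} us is not a multiple of {TXOP_UNIT_US} us")
    return txop_us // TXOP_UNIT_US


def wmm_lines(ac, aifs, cwmin, txop_us):
    """Build hostapd.conf lines for one WMM access category."""
    return [
        f"wmm_ac_{ac}_aifs={aifs}",
        f"wmm_ac_{ac}_cwmin={cwmin}",
        f"wmm_ac_{ac}_txop_limit={txop_us_to_hostapd_units(txop_us)}",
    ]


vi_lines = wmm_lines("vi", aifs=2, cwmin=3, txop_us=3008)
# Assumed best-effort values (sample hostapd.conf defaults), not trial data.
be_lines = wmm_lines("be", aifs=3, cwmin=4, txop_us=0)

print("\n".join(vi_lines + be_lines))
```

Rejecting values that are not a multiple of 32 µs avoids silently rounding a TXOP limit to something other than what was intended.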
The third challenge is the PoE+ power budget. The IEEE 802.3at standard provides 25.5 W at the PSE, with 57 V nominal and 0.6 A max. The cable drop (50 m Cat5e, 12.5 Ω round-trip) loses ~5% = 1.3 W at full load, leaving 24.2 W available to the gateway. The gateway’s peak measured draw (60 clients, 8 cameras, internal switch, fan at 50% duty) is 21.8 W, leaving 2.4 W of headroom. On a cold morning, the fan at 100% duty adds 2 W for a total of 23.8 W, leaving only 0.4 W of headroom, well below the 5 W of headroom recommended for PoE+ designs to account for cable aging and thermal derating. The QCA6391’s +20 dBm TX power per chain (2×2:2) draws 3.8 W at max MCS (1024-QAM, 80 MHz). Reducing to +17 dBm (MCS 9 256-QAM) draws 2.6 W, a savings of 1.2 W. Adjusting the fan curve to start at 55°C instead of 40°C (room temp + 15°C transition) reduces average fan duty from 50% to 10% during non-peak hours, saving another 0.8 W. Total headroom restored: 4.9 W.
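Because the headroom figures in this post are quoted against two different reference points (the 25.5 W PSE figure and the 24.2 W left after cable loss), it helps to compute both side by side. The sketch below is a worked example using only the wattages stated above. The variable and function names are illustrative.

```python
PSE_BUDGET_W = 25.5   # PoE+ figure used in this post
CABLE_LOSS_W = 1.3    # ~5% loss on the 50 m run, as stated above


def headroom_w(limit_w, draw_w):
    """Remaining power margin, rounded to 0.1 W like the figures in the post."""
    return round(limit_w - draw_w, 1)


available_w = round(PSE_BUDGET_W - CABLE_LOSS_W, 1)

peak_draw_w = 21.8                            # 60 clients, 8 cameras, fan at 50%
cold_draw_w = round(peak_draw_w + 2.0, 1)     # fan at 100% duty
capped_draw_w = round(peak_draw_w - 1.2, 1)   # TX cap lowered to +17 dBm

budget = {
    "peak vs. available": headroom_w(available_w, peak_draw_w),
    "cold morning vs. available": headroom_w(available_w, cold_draw_w),
    "TX capped vs. available": headroom_w(available_w, capped_draw_w),
    "TX capped vs. PSE figure": headroom_w(PSE_BUDGET_W, capped_draw_w),
}

for label, watts in budget.items():
    print(f"{label}: {watts} W")
```

Rerunning the calculation with the actual cable loss for a site makes it clear which reference point a given headroom target applies to.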
| Failure Mode | Likely Root Cause | Design Response |
|---|---|---|
| 61st client can’t connect; “WiFi connected but no internet” on phone; no error in gateway admin UI | Linux netlink socket buffer overflow (default 208 kB overflows at 18,000 pps); ath11k returns -ENOBUFS to hostapd; hostapd rejects NEW_STATION command via MLME-DEAUTH.ind reason 17 | Increase net.core.rmem_max to 16 MB (sysctl -w net.core.rmem_max=16777216); enable GRO on wlan0 (ethtool -K wlan0 gro on) |
| Camera feed freezes every 4-8 seconds during peak hours; Hikvision cameras show “connection lost” alerts | OFDMA RU scheduler cycles through all 60 txqs at 16 ms each = 960 ms cycle; camera txq gets allocated every 960 ms vs. required 16-32 ms, causing frame buffer underrun | Configure WMM AC priorities (AC_VI txop_limit=3008 us, AC_BE txop_limit=0); set airtime_weight 128 for AC_VI, 64 for AC_BE in ath11k |
| Gateway reboots on cold mornings (5°C); PoE+ PSE reports “overload” after 30 minutes | PoE+ headroom drops to 0.4 W when fan at 100% duty (added 2 W); QCA6391 TX at +20 dBm draws 3.8 W; total draw of 23.8 W nearly exhausts the 24.2 W available (25.5 W - 1.3 W cable drop) | Reduce TX power cap from +20 dBm to +17 dBm (saves 1.2 W); adjust fan curve to start at 55°C instead of 40°C; consider PoE++ (802.3bt Type 3, 60 W) for new deployments |
We evaluated three configurations of the same Qualcomm QCA6391-based gateway at the 3-site trial. All three used the same hardware: a QCA6391 M.2 Key E module in an IP55 outdoor enclosure with dual 5 dBi omni antennas, PoE+ powered. Configuration A was the default Linux kernel + hostapd settings (net.core.rmem_max=208 kB, WMM AC default, TX power cap = +20 dBm). Configuration B applied the netlink buffer fix (rmem_max=4 MB + GRO enabled) and WMM AC priorities (AC_VI txop_limit=3008) but left TX power at +20 dBm. Configuration C applied all three fixes: netlink buffer = 16 MB + GRO, WMM AC priorities, and TX power cap reduced to +17 dBm.
| Config | Kernel/Driver Settings | Max Clients (no drop) | Camera Frame Drops/hr | PoE+ Peak Draw | PoE+ Headroom |
|---|---|---|---|---|---|
| **A (Default)** | Default netlink (208 kB), WMM AC default, TX +20 dBm | 49 (50th drops) | ~900 (every 4-8 s) | 21.8 W | 2.4 W |
| **B (Network Fixed)** | rmem_max=4 MB + GRO, WMM AC VI=3 ms, TX +20 dBm | 80+ (no drops observed) | ~12 (rare, during max load bursts) | 21.8 W | 2.4 W |
| **C (Selected)** | rmem_max=16 MB + GRO, WMM AC priorities, TX +17 dBm | 80+ | ~3 | 20.6 W | 4.9 W |
Configuration C was selected. The three fix decisions were: (1) Increase `net.core.rmem_max` from 208 kB to 16 MB. This makes the buffer roughly 80x larger, which leaves margin against netlink buffer overflow even at 100+ client loads. GRO further reduces the netlink message rate by 4-8x for TCP traffic, effectively eliminating the overflow scenario. (2) Configure WMM Access Categories with AC_VI txop_limit=3008 us and AC_BE txop_limit=0, mapping DSCP EF (46) to AC_VI. This changes the OFDMA RU scheduler behavior from round-robin across 60 txqs to priority-based: AC_VI txqs get allocated within 32 ms (2 trigger frames), while AC_BE txqs get allocated every 128 ms. Camera stream latency dropped from 960 ms max to 32 ms max, which ended the scheduling-induced frame buffer underruns. (3) Reduce the TX power cap from +20 dBm to +17 dBm. The 3 dB reduction saves 1.2 W on the QCA6391 (3.8 W → 2.6 W) but reduces the cell radius by ~7%. For a park deployment where 95% of clients are within 50 m of the gateway, the cell edge effect is negligible. Measured client RSSI at 50 m went from -68 dBm to -71 dBm, still well above the -82 dBm CCA threshold.
The specification profile below was measured with the Qualcomm QCA6391 M.2 Key E 802.11ax 2×2 module in the production gateway enclosure, with dual 5 dBi omni antennas, at the worst-case deployment point (60 concurrent clients, 8 cameras streaming DSCP EF at 8 Mbps each, PoE+ 48 V PSE). All values reflect measured performance with Configuration C (rmem_max=16 MB, WMM AC VI txop=3 ms, TX power cap +17 dBm).
| Parameter | QCA6391 Measured Value |
|---|---|
| SoC / Chipset | Qualcomm QCA6391 (2×2 802.11ax WiFi 6, M.2 Key E, PCIe) |
| Max Concurrent Clients (netlink-limited) | 80+ (with rmem_max=16 MB + GRO; was 49 with default 208 kB) |
| DSCP EF Camera Latency (p95) | 32 ms (with WMM AC_VI txop_limit=3008; was 960 ms with default) |
| Camera Frame Drops per Hour (8 cameras, 8 Mbps each) | ~3 (was ~900/hr with default) |
| TX Power (per chain, capped) | +17 dBm (was +20 dBm; savings 1.2 W) |
| PoE+ Peak Draw (60 clients + 8 cameras + fan @ 50%) | 20.6 W (was 21.8 W; headroom restored from 2.4 W to 4.9 W) |
| WiFi Standard | 802.11ax WiFi 6 (2×2:2, 80 MHz channels) |
| Max PHY Rate (per band) | 2.4 GHz: 574 Mbps; 5 GHz: 1.2 Gbps |
| Frequency Band | 2.4 GHz + 5 GHz dual-band concurrent |
| Interface | PCIe (M.2 Key E) |
| Operating Temp | -40°C to +85°C |
The implementation result was evaluated against the three specific field problems:
(1) 61st client association failure: eliminated by increasing rmem_max from 208 kB to 16 MB and enabling GRO. The netlink socket drop counter (the `Drops` column of `/proc/net/netlink`, backed by the kernel’s `sk_drops` field) fell from an average of 120/sec (constant overflow) to zero across all 60+ clients for the entire 30-day trial. The 61st client (and clients through ~80) now associate without issue.
(2) Camera frame freezes every 4-8 seconds: reduced from ~900/hour to ~3/hour by configuring WMM AC_VI txop_limit=3008 us with DSCP EF mapping. The OFDMA RU scheduler now allocates RUs to camera streams within 32 ms, well within the camera’s 1.2-second frame buffer. The ~3 remaining drops per hour were traced to the camera’s own H.265 encoding glitches (confirmed by pulling the camera feed via RTSP directly to VLC, where the glitch appeared on the RTSP stream too).
(3) PoE+ headroom: restored from 2.4 W to 4.9 W by reducing the TX power cap to +17 dBm. Gateway uptime across the 3-site trial was 100% (no reboots caused by power overload), even during the coldest morning (3°C, fan at 100% duty).
| Metric | Before (default config) | After (Configuration C) |
|---|---|---|
| Max Clients Supported (no assoc rejection) | 49 (50th drops) | 80+ |
| Netlink Socket Drop Count (per hour) | ~432,000 (120/sec overflow) | 0 |
| Camera Frame Drops/hr (8 cameras) | ~900 | ~3 |
| DSCP EF Camera Latency p95 | 960 ms | 32 ms |
| PoE+ Peak Draw (60 clients, 8 cameras) | 21.8 W | 20.6 W |
| PoE+ Headroom | 2.4 W | 4.9 W |
| Gateway Uptime (30 days) | 99.8% (2 overload reboots) | 100% |
These results are specific to the 3-site trial with QCA6391-based edge gateways in open municipal park deployments. Sites with different client densities, camera bitrates, or PoE+ cable lengths will see different absolute numbers. The evaluation methodology — monitoring netlink socket drops (`/proc/net/netlink`), OFDMA scheduling cycle time (tracepoints), and PoE+ power draw (PSE LLDP TLV) — transfers to any multi-client edge gateway deployment.
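The netlink drop monitoring mentioned above can be approximated by reading the `Drops` column of `/proc/net/netlink`. The following is a minimal sketch, not the tooling used in the trial. It locates columns by header name and filters on protocol 16 (`NETLINK_GENERIC`, which carries nl80211). The sample text is made-up data for illustration only.

```python
NETLINK_GENERIC = 16  # nl80211 runs over generic netlink


def parse_netlink_sockets(text):
    """Parse /proc/net/netlink content into dicts of protocol, pid, and drops."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    eth_idx = header.index("Eth")
    pid_idx = header.index("Pid")
    drops_idx = header.index("Drops")
    rows = []
    for ln in lines[1:]:
        fields = ln.split()
        rows.append({
            "protocol": int(fields[eth_idx]),
            "pid": int(fields[pid_idx]),
            "drops": int(fields[drops_idx]),
        })
    return rows


def total_drops(text, protocol=None):
    """Sum the Drops column, optionally for a single netlink protocol."""
    return sum(
        row["drops"]
        for row in parse_netlink_sockets(text)
        if protocol is None or row["protocol"] == protocol
    )


def read_proc_netlink(path="/proc/net/netlink"):
    """Read the live table (Linux only)."""
    with open(path, "r", encoding="ascii") as handle:
        return handle.read()


# Made-up sample in the kernel's column layout, for illustration only.
SAMPLE = """\
sk               Eth Pid        Groups   Rmem     Wmem     Dump  Locks    Drops    Inode
0000000000000000 0   0          00000000 0        0        0     2        0        101
0000000000000000 16  1234       00000000 0        0        0     2        120      102
0000000000000000 16  5678       00000000 0        0        0     2        3        103
"""

print(total_drops(SAMPLE, protocol=NETLINK_GENERIC))
```

On a gateway, calling `total_drops(read_proc_netlink(), NETLINK_GENERIC)` at a fixed interval and logging the difference between readings gives a drops-per-second series comparable to the figures in the table above.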
Use this checklist as the release gate for any QCA6391-based city edge gateway deployment:
The QCA6391 is a standard Qualcomm 802.11ax WiFi 6 module used across many edge gateways and consumer APs. The three fixes (netlink buffer, WMM AC priorities, TX power cap) are Linux kernel / hostapd configuration changes that apply to any ath11k-based gateway — not just this specific QCA6391 implementation. For each deployment, recalculate the rmem_max based on expected client pps, adjust WMM AC txop_limit based on the target latency for DSCP-marked traffic, and calculate the PoE+ budget with the actual cable length and TX power cap.
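For the rmem_max recalculation, one simple approach is to size the buffer for the backlog that accumulates while the reader is stalled. The sketch below is one possible sizing rule and not the method used in the trial. The `stall_seconds` and `safety_factor` parameters are hypothetical tuning knobs, and the 100-client scenario is illustrative.

```python
import math


def required_rmem_bytes(clients, pps_per_client, msg_bytes,
                        stall_seconds, safety_factor=2.0):
    """Bytes needed to absorb traffic while the reader is stalled."""
    backlog = clients * pps_per_client * msg_bytes * stall_seconds
    return math.ceil(backlog * safety_factor)


def round_up_pow2(n):
    """Round up to the next power of two for a tidy sysctl value."""
    return 1 << (n - 1).bit_length()


# Hypothetical site: 100 clients at the post's ~300 pps and ~100 bytes/message,
# sized to ride out a 1-second stall with a 2x safety factor.
need = required_rmem_bytes(100, 300, 100, stall_seconds=1.0)
sysctl_value = round_up_pow2(need)

print(f"sysctl -w net.core.rmem_max={sysctl_value}")
```

The stall tolerance is the judgment call here: a longer assumed stall or a higher per-client rate scales the result linearly.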
The main risk is that the default Linux netlink socket buffer cannot keep up with 60+ clients. At 60 clients generating 18,000 packets/s, the default rmem_max (208 kB) overflows in roughly 115 ms. Increasing rmem_max to 16 MB and enabling GRO resolved this.
The second risk is OFDMA RU scheduling fairness under mixed traffic. Without DSCP marking, the scheduler treats all traffic equally, causing camera frame drops during public WiFi bursts. EF marking (DSCP 46) with 30% airtime allocation protects camera stream integrity.
Track concurrent client count vs. AID utilization, DSCP EF frame delivery ratio (frames delivered within 10 ms), and PoE+ power draw vs. ambient temperature curve.
Yes. Multi-tenant office WiFi, transit hub connectivity, and rural WISP nodes all benefit. Re-run DSCP QoS and PoE+ budget analysis with each target traffic profile.