RTL8211F: a 4-wire cable to a gigabit device still negotiates 1000 Mb/s
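Background:
A customer reported that on the OK3588 platform, connecting to a gigabit device over a 100 Mb/s (4-wire) cable does not fall back to 100 Mb/s. The link still negotiates 1000 Mb/s and no traffic gets through. Forcing the port to 100 Mb/s with ethtool restores connectivity.
Connecting to a 100 Mb/s device over the same kind of cable negotiates 100 Mb/s correctly.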
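Approach:
The first step was to search online for similar reports, which turned up this article:
Reference: https://blog.csdn.net/Emo_snaf/article/details/120762203
The fix suggested there is to modify genphy_read_lpa(), the function that reads the link partner's abilities. It adds a check on bit 9 of GBCR (1000Base-T Control Register, address 0x09), i.e. the ADVERTISE_1000FULL bit of the MII_CTRL1000 register, to see whether the local PHY advertises 1000Base-T full duplex. If it does not, the link partner's 1000Base-T full-duplex ability that was just read is cleared, so the code that runs afterwards resolves the speed to 100 Mb/s.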
Source location:
./include/uapi/linux/mii.h
Local advertisement register: MII_CTRL1000 0x09
Link partner ability register: MII_STAT1000 0x0a
1000Base-T full-duplex advertisement bit: ADVERTISE_1000FULL 0x0200 (bit 9)
Link partner 1000Base-T full-duplex ability bit: LPA_1000FULL 0x0800 (bit 11)
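The masking idea can be tried outside the kernel on plain register values. The sketch below is a user-space illustration only, not the kernel patch itself: the two bit values are the ones listed above from mii.h, while the helper name filter_lpagb() is made up for this example.

```c
#include <assert.h>

/* Bit values as listed above (include/uapi/linux/mii.h). */
#define ADVERTISE_1000FULL 0x0200 /* MII_CTRL1000, bit 9  */
#define LPA_1000FULL       0x0800 /* MII_STAT1000, bit 11 */

/*
 * Hypothetical helper: takes raw MII_CTRL1000 and MII_STAT1000 values and
 * returns the link partner value with 1000Base-T full duplex cleared when
 * the local side does not advertise that mode.
 */
int filter_lpagb(int ctrl1000, int stat1000)
{
	/* Successful MDIO reads are non-negative; errors are handled elsewhere. */
	assert(ctrl1000 >= 0 && stat1000 >= 0);

	if (!(ctrl1000 & ADVERTISE_1000FULL))
		stat1000 &= ~LPA_1000FULL;

	return stat1000;
}
```

For example, filter_lpagb(0x0000, 0x0800) yields 0x0000, while filter_lpagb(0x0200, 0x0800) leaves 0x0800 untouched.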
Initial fix:
diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index e83428c92b33..d4eb8f2253d3 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -2307,6 +2307,9 @@ int genphy_read_lpa(struct phy_device *phydev)
 			if (lpagb < 0)
 				return lpagb;

+			if (!(phy_read(phydev, MII_CTRL1000) & ADVERTISE_1000FULL))
+				lpagb = lpagb & ~LPA_1000FULL;
+
 			if (lpagb & LPA_1000MSFAIL) {
 				int adv = phy_read(phydev, MII_CTRL1000);
With this change, ethtool reports the following for the port:
Settings for eth0:
Supported ports: [ TP MII ]
Supported link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Supported pause frame use: Symmetric Receive-only
Supports auto-negotiation: Yes
Supported FEC modes: Not reported
Advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Advertised pause frame use: Symmetric Receive-only
Advertised auto-negotiation: Yes
Advertised FEC modes: Not reported
Link partner advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
Link partner advertised pause frame use: Symmetric Receive-only
Link partner advertised auto-negotiation: Yes
Link partner advertised FEC modes: Not reported
Speed: 100Mb/s
Duplex: Full
Auto-negotiation: on
master-slave cfg: preferred slave
master-slave status: slave
Port: Twisted Pair
PHYAD: 1
Transceiver: external
MDI-X: Unknown
Supports Wake-on: ug
Wake-on: d
Current message level: 0x0000003f (63)
drv probe link timer ifdown ifup
Link detected: yes
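Link partner advertised link modes no longer lists 1000baseT/Full, and Speed: 100Mb/s is now correct.
As far as the reported symptom goes, this solves it, and it could be written off as a bug in the kernel PHY driver. Still, it seemed worth digging a little deeper.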
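Further investigation:
Later testing showed that the OK3568 running Linux 5.10 has the same problem when connected to a gigabit device over a 100 Mb/s cable: the link is reported as 1000 Mb/s and no traffic gets through. Linux 4.19 does not have the problem.
On Linux 5.10, ethtool reports: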
Settings for eth0:
Supported ports: [ TP MII ]
Supported link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Supported pause frame use: Symmetric Receive-only
Supports auto-negotiation: Yes
Supported FEC modes: Not reported
Advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Advertised pause frame use: Symmetric Receive-only
Advertised auto-negotiation: Yes
Advertised FEC modes: Not reported
Link partner advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Link partner advertised pause frame use: Symmetric Receive-only
Link partner advertised auto-negotiation: Yes
Link partner advertised FEC modes: Not reported
Speed: 1000Mb/s
Duplex: Full
Auto-negotiation: on
master-slave cfg: preferred slave
master-slave status: slave
Port: Twisted Pair
PHYAD: 1
Transceiver: external
MDI-X: Unknown
Supports Wake-on: ug
Wake-on: d
Current message level: 0x0000003f (63)
drv probe link timer ifdown ifup
Link detected: yes
On Linux 4.19, it reports:
Settings for eth0:
Supported ports: [ TP MII ]
Supported link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Supported pause frame use: Symmetric Receive-only
Supports auto-negotiation: Yes
Supported FEC modes: Not reported
Advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Advertised pause frame use: No
Advertised auto-negotiation: Yes
Advertised FEC modes: Not reported
Link partner advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Link partner advertised pause frame use: Symmetric Receive-only
Link partner advertised auto-negotiation: Yes
Link partner advertised FEC modes: Not reported
Speed: 100Mb/s
Duplex: Full
Port: MII
PHYAD: 0
Transceiver: external
Auto-negotiation: on
Supports Wake-on: ug
Wake-on: d
Current message level: 0x0000003f (63)
drv probe link timer ifdown ifup
Link detected: yes
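One thing stands out: on Linux 4.19, Link partner advertised link modes: also contains 1000baseT/Full, so the kernel believes the peer is gigabit-capable. Advertised link modes: contains 1000baseT/Full as well, so it believes the local PHY is gigabit-capable too. Yet the link does not run at 1000 Mb/s.
So something on the Linux 4.19 side must be adjusting the resolved speed.
Based on the fix above, the guess was that Linux 4.19 also takes REG09.BIT9 into account, so the main lead was to find where the PHY code uses the MII_CTRL1000 register and the ADVERTISE_1000FULL bit.
That search led to genphy_read_status() in ./drivers/net/phy/phy_device.c, which contains the following: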
/**
 * genphy_read_status - check the link status and update current link state
 * @phydev: target phy_device struct
 *
 * Description: Check the link, then figure out the current state
 * by comparing what we advertise with what the link partner
 * advertises. Start by checking the gigabit possibilities,
 * then move on to 10/100.
 */
int genphy_read_status(struct phy_device *phydev)
{
	int adv;
	int err;
	int lpa;
	int lpagb = 0;
	int common_adv;
	int common_adv_gb = 0;

	/* Update the link, but return if there was an error */
	err = genphy_update_link(phydev);
	if (err)
		return err;

	phydev->lp_advertising = 0;

	if (AUTONEG_ENABLE == phydev->autoneg) {
		if (phydev->supported & (SUPPORTED_1000baseT_Half
					| SUPPORTED_1000baseT_Full)) {
			lpagb = phy_read(phydev, MII_STAT1000);
			if (lpagb < 0)
				return lpagb;
			adv = phy_read(phydev, MII_CTRL1000);
			if (adv < 0)
				return adv;
			if (lpagb & LPA_1000MSFAIL) {
				if (adv & CTL1000_ENABLE_MASTER)
					phydev_err(phydev, "Master/Slave resolution failed, maybe conflicting manual settings?\n");
				else
					phydev_err(phydev, "Master/Slave resolution failed\n");
				return -ENOLINK;
			}
			phydev->lp_advertising =
				mii_stat1000_to_ethtool_lpa_t(lpagb);
			common_adv_gb = lpagb & adv << 2;
		}

		lpa = phy_read(phydev, MII_LPA);
		if (lpa < 0)
			return lpa;
		phydev->lp_advertising |= mii_lpa_to_ethtool_lpa_t(lpa);

		adv = phy_read(phydev, MII_ADVERTISE);
		if (adv < 0)
			return adv;
		common_adv = lpa & adv;

		phydev->speed = SPEED_10;
		phydev->duplex = DUPLEX_HALF;
		phydev->pause = 0;
		phydev->asym_pause = 0;

		if (common_adv_gb & (LPA_1000FULL | LPA_1000HALF)) {
			phydev->speed = SPEED_1000;
			if (common_adv_gb & LPA_1000FULL)
				phydev->duplex = DUPLEX_FULL;
		} else if (common_adv & (LPA_100FULL | LPA_100HALF)) {
			phydev->speed = SPEED_100;
			if (common_adv & LPA_100FULL)
				phydev->duplex = DUPLEX_FULL;
		} else
			if (common_adv & LPA_10FULL)
				phydev->duplex = DUPLEX_FULL;

		if (phydev->duplex == DUPLEX_FULL) {
			phydev->pause = lpa & LPA_PAUSE_CAP ? 1 : 0;
			phydev->asym_pause = lpa & LPA_PAUSE_ASYM ? 1 : 0;
		}
	} else {
		int bmcr = phy_read(phydev, MII_BMCR);

		if (bmcr < 0)
			return bmcr;

		if (bmcr & BMCR_FULLDPLX)
			phydev->duplex = DUPLEX_FULL;
		else
			phydev->duplex = DUPLEX_HALF;

		if (bmcr & BMCR_SPEED1000)
			phydev->speed = SPEED_1000;
		else if (bmcr & BMCR_SPEED100)
			phydev->speed = SPEED_100;
		else
			phydev->speed = SPEED_10;

		phydev->pause = 0;
		phydev->asym_pause = 0;
	}

	return 0;
}
EXPORT_SYMBOL(genphy_read_status);
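In the code above:
The read of MII_STAT1000 fetches the link partner's gigabit abilities into lpagb.
The read of MII_CTRL1000 fetches the local gigabit advertisement into adv.
common_adv_gb = lpagb & adv << 2 ANDs the two to get the gigabit modes both sides support. The shift by 2 lines up the local advertisement bits (8 and 9) with the link partner bits (10 and 11).
The read of MII_LPA fetches the link partner's 10/100 abilities into lpa.
The read of MII_ADVERTISE fetches the local 10/100 advertisement into adv.
common_adv = lpa & adv ANDs the two to get the 10/100 modes both sides support.
The final if/else chain picks phydev->speed from common_adv_gb and common_adv.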
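To make the register arithmetic concrete, here is a small user-space sketch of the same decision on raw register values. It only mirrors the speed part of the Linux 4.19 logic (duplex and pause are left out), the bit values are taken from mii.h, and the function name resolve_speed_4_19() is made up for this example.

```c
#include <assert.h>

/* Bit values from include/uapi/linux/mii.h. */
#define ADVERTISE_1000HALF 0x0100 /* MII_CTRL1000 */
#define ADVERTISE_1000FULL 0x0200
#define LPA_1000HALF       0x0400 /* MII_STAT1000 */
#define LPA_1000FULL       0x0800
#define LPA_100HALF        0x0080 /* MII_LPA; MII_ADVERTISE uses the same bits */
#define LPA_100FULL        0x0100

/*
 * Hypothetical helper mirroring the Linux 4.19 speed decision.
 * Arguments are raw register values; the result is the speed in Mb/s.
 */
int resolve_speed_4_19(int ctrl1000, int stat1000, int advertise, int lpa)
{
	int common_adv_gb, common_adv;

	/* Shifting the local bits left by 2 aligns them with the partner bits. */
	assert((ADVERTISE_1000FULL << 2) == LPA_1000FULL);
	assert((ADVERTISE_1000HALF << 2) == LPA_1000HALF);

	/* Same value as the kernel's "lpagb & adv << 2": << binds tighter than &. */
	common_adv_gb = stat1000 & (ctrl1000 << 2);
	common_adv = lpa & advertise;

	if (common_adv_gb & (LPA_1000FULL | LPA_1000HALF))
		return 1000;
	if (common_adv & (LPA_100FULL | LPA_100HALF))
		return 100;
	return 10;
}
```

With this logic a gigabit result requires the 1000Base-T bit to be set in both MII_CTRL1000 and MII_STAT1000. If the local bit is clear, the result drops to 100 even though the partner still advertises gigabit.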
This walkthrough shows how Linux 4.19 resolves the speed. The Linux 5.10 version of the function no longer contains this logic.
genphy_read_status() in Linux 5.10 looks like this:
/**
 * genphy_read_status - check the link status and update current link state
 * @phydev: target phy_device struct
 *
 * Description: Check the link, then figure out the current state
 * by comparing what we advertise with what the link partner
 * advertises. Start by checking the gigabit possibilities,
 * then move on to 10/100.
 */
int genphy_read_status(struct phy_device *phydev)
{
	int err, old_link = phydev->link;

	/* Update the link, but return if there was an error */
	err = genphy_update_link(phydev);
	if (err)
		return err;

	/* why bother the PHY if nothing can have changed */
	if (phydev->autoneg == AUTONEG_ENABLE && old_link && phydev->link)
		return 0;

	phydev->speed = SPEED_UNKNOWN;
	phydev->duplex = DUPLEX_UNKNOWN;
	phydev->pause = 0;
	phydev->asym_pause = 0;

	err = genphy_read_master_slave(phydev);
	if (err < 0)
		return err;

	err = genphy_read_lpa(phydev);
	if (err < 0)
		return err;

	if (phydev->autoneg == AUTONEG_ENABLE && phydev->autoneg_complete) {
		phy_resolve_aneg_linkmode(phydev);
	} else if (phydev->autoneg == AUTONEG_DISABLE) {
		err = genphy_read_status_fixed(phydev);
		if (err < 0)
			return err;
	}

	return 0;
}
EXPORT_SYMBOL(genphy_read_status);
The function is much leaner: the work Linux 4.19 did inline has been moved into helper functions. A genphy_read_master_slave() function was added, and most of the former logic now lives in genphy_read_lpa():
int genphy_read_lpa(struct phy_device *phydev)
{
	int lpa, lpagb;

	if (phydev->autoneg == AUTONEG_ENABLE) {
		if (!phydev->autoneg_complete) {
			mii_stat1000_mod_linkmode_lpa_t(phydev->lp_advertising,
							0);
			mii_lpa_mod_linkmode_lpa_t(phydev->lp_advertising, 0);
			return 0;
		}

		if (phydev->is_gigabit_capable) {
			lpagb = phy_read(phydev, MII_STAT1000);
			if (lpagb < 0)
				return lpagb;

			if (lpagb & LPA_1000MSFAIL) {
				int adv = phy_read(phydev, MII_CTRL1000);

				if (adv < 0)
					return adv;

				if (adv & CTL1000_ENABLE_MASTER)
					phydev_err(phydev, "Master/Slave resolution failed, maybe conflicting manual settings?\n");
				else
					phydev_err(phydev, "Master/Slave resolution failed\n");
				return -ENOLINK;
			}

			mii_stat1000_mod_linkmode_lpa_t(phydev->lp_advertising,
							lpagb);
		}

		lpa = phy_read(phydev, MII_LPA);
		if (lpa < 0)
			return lpa;

		mii_lpa_mod_linkmode_lpa_t(phydev->lp_advertising, lpa);
	} else {
		linkmode_zero(phydev->lp_advertising);
	}

	return 0;
}
EXPORT_SYMBOL(genphy_read_lpa);
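In this function:
The read of MII_STAT1000 fetches the link partner's gigabit abilities into lpagb.
The read of MII_LPA fetches the link partner's 10/100 abilities into lpa.
Both lpagb and lpa are then converted and stored in phydev->lp_advertising.
The local gigabit advertisement (MII_CTRL1000) plays no part in the result. It is only read to choose the error message when master/slave resolution fails.
Next comes phy_resolve_aneg_linkmode(), located in drivers/net/phy/phy-core.c: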
/**
 * phy_resolve_aneg_linkmode - resolve the advertisements into PHY settings
 * @phydev: The phy_device struct
 *
 * Resolve our and the link partner advertisements into their corresponding
 * speed and duplex. If full duplex was negotiated, extract the pause mode
 * from the link partner mask.
 */
void phy_resolve_aneg_linkmode(struct phy_device *phydev)
{
	__ETHTOOL_DECLARE_LINK_MODE_MASK(common);
	int i;

	linkmode_and(common, phydev->lp_advertising, phydev->advertising);

	for (i = 0; i < ARRAY_SIZE(settings); i++)
		if (test_bit(settings[i].bit, common)) {
			phydev->speed = settings[i].speed;
			phydev->duplex = settings[i].duplex;
			break;
		}

	phy_resolve_aneg_pause(phydev);
}
EXPORT_SYMBOL_GPL(phy_resolve_aneg_linkmode);
The function ANDs phydev->lp_advertising with phydev->advertising and picks the highest speed and duplex mode that both PHYs support.
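The "first hit in a table" resolution can be sketched in user space as follows. This is an illustration under simplifying assumptions: the mode bits, the struct, and resolve_common() are invented for the example (the kernel works on ETHTOOL_LINK_MODE_* bitmaps), and the table is assumed to be sorted from fastest to slowest, which the first-hit rule depends on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mode bits for this sketch only. */
enum {
	MODE_10_HALF   = 1 << 0,
	MODE_10_FULL   = 1 << 1,
	MODE_100_HALF  = 1 << 2,
	MODE_100_FULL  = 1 << 3,
	MODE_1000_FULL = 1 << 4,
};

struct setting {
	int speed;        /* Mb/s */
	int full_duplex;  /* 1 = full, 0 = half */
	unsigned int bit; /* mode bit this entry stands for */
};

/* Sorted fastest first, so the first match is the best common mode. */
static const struct setting settings[] = {
	{ 1000, 1, MODE_1000_FULL },
	{  100, 1, MODE_100_FULL  },
	{  100, 0, MODE_100_HALF  },
	{   10, 1, MODE_10_FULL   },
	{   10, 0, MODE_10_HALF   },
};

/* Returns the best common setting, or NULL when the masks share no mode. */
const struct setting *resolve_common(unsigned int advertising,
				     unsigned int lp_advertising)
{
	unsigned int common = advertising & lp_advertising;
	size_t i;

	for (i = 0; i < sizeof(settings) / sizeof(settings[0]); i++) {
		/* The first-hit rule only works on a table sorted by speed. */
		assert(i == 0 || settings[i - 1].speed >= settings[i].speed);
		if (common & settings[i].bit)
			return &settings[i];
	}

	return NULL;
}
```

Note that the result depends only on the two advertisement masks. Nothing here looks at what speed the cable can actually carry, which is the crux of the problem described in this article.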
genphy_read_lpa() above shows where phydev->lp_advertising comes from. So where does phydev->advertising come from?
Tracing the code further leads to phy_probe():
/**
 * phy_probe - probe and init a PHY device
 * @dev: device to probe and init
 *
 * Description: Take care of setting up the phy_device structure,
 * set the state to READY (the driver's init function should
 * set it to STARTING if needed).
 */
static int phy_probe(struct device *dev)
{
	struct phy_device *phydev = to_phy_device(dev);
	struct device_driver *drv = phydev->mdio.dev.driver;
	struct phy_driver *phydrv = to_phy_driver(drv);
	int err = 0;

	phydev->drv = phydrv;

	/* Disable the interrupt if the PHY doesn't support it
	 * but the interrupt is still a valid one
	 */
	if (!phy_drv_supports_irq(phydrv) && phy_interrupt_is_valid(phydev))
		phydev->irq = PHY_POLL;

	if (phydrv->flags & PHY_IS_INTERNAL)
		phydev->is_internal = true;

	mutex_lock(&phydev->lock);

	/* Deassert the reset signal */
	phy_device_reset(phydev, 0);

	if (phydev->drv->probe) {
		err = phydev->drv->probe(phydev);
		if (err)
			goto out;
	}

	/* Start out supporting everything. Eventually,
	 * a controller will attach, and may modify one
	 * or both of these values
	 */
	if (phydrv->features) {
		linkmode_copy(phydev->supported, phydrv->features);
	} else if (phydrv->get_features) {
		err = phydrv->get_features(phydev);
	} else if (phydev->is_c45) {
		err = genphy_c45_pma_read_abilities(phydev);
	} else {
		err = genphy_read_abilities(phydev);
	}

	if (err)
		goto out;

	if (!linkmode_test_bit(ETHTOOL_LINK_MODE_Autoneg_BIT,
			       phydev->supported))
		phydev->autoneg = 0;

	if (linkmode_test_bit(ETHTOOL_LINK_MODE_1000baseT_Half_BIT,
			      phydev->supported))
		phydev->is_gigabit_capable = 1;
	if (linkmode_test_bit(ETHTOOL_LINK_MODE_1000baseT_Full_BIT,
			      phydev->supported))
		phydev->is_gigabit_capable = 1;

	of_set_phy_supported(phydev);
	phy_advertise_supported(phydev);

	/* Get the EEE modes we want to prohibit. We will ask
	 * the PHY stop advertising these mode later on
	 */
	of_set_phy_eee_broken(phydev);

	/* The Pause Frame bits indicate that the PHY can support passing
	 * pause frames. During autonegotiation, the PHYs will determine if
	 * they should allow pause frames to pass. The MAC driver should then
	 * use that result to determine whether to enable flow control via
	 * pause frames.
	 *
	 * Normally, PHY drivers should not set the Pause bits, and instead
	 * allow phylib to do that. However, there may be some situations
	 * (e.g. hardware erratum) where the driver wants to set only one
	 * of these bits.
	 */
	if (!test_bit(ETHTOOL_LINK_MODE_Pause_BIT, phydev->supported) &&
	    !test_bit(ETHTOOL_LINK_MODE_Asym_Pause_BIT, phydev->supported)) {
		linkmode_set_bit(ETHTOOL_LINK_MODE_Pause_BIT,
				 phydev->supported);
		linkmode_set_bit(ETHTOOL_LINK_MODE_Asym_Pause_BIT,
				 phydev->supported);
	}

	/* Set the state to READY by default */
	phydev->state = PHY_READY;

out:
	/* Assert the reset signal */
	if (err)
		phy_device_reset(phydev, 1);

	mutex_unlock(&phydev->lock);

	return err;
}
The call to genphy_read_abilities() in the final else branch reads the abilities the PHY supports. It is implemented as follows:
/**
 * genphy_read_abilities - read PHY abilities from Clause 22 registers
 * @phydev: target phy_device struct
 *
 * Description: Reads the PHY's abilities and populates
 * phydev->supported accordingly.
 *
 * Returns: 0 on success, < 0 on failure
 */
int genphy_read_abilities(struct phy_device *phydev)
{
	int val;

	linkmode_set_bit_array(phy_basic_ports_array,
			       ARRAY_SIZE(phy_basic_ports_array),
			       phydev->supported);

	val = phy_read(phydev, MII_BMSR);
	if (val < 0)
		return val;

	linkmode_mod_bit(ETHTOOL_LINK_MODE_Autoneg_BIT, phydev->supported,
			 val & BMSR_ANEGCAPABLE);

	linkmode_mod_bit(ETHTOOL_LINK_MODE_100baseT_Full_BIT, phydev->supported,
			 val & BMSR_100FULL);
	linkmode_mod_bit(ETHTOOL_LINK_MODE_100baseT_Half_BIT, phydev->supported,
			 val & BMSR_100HALF);
	linkmode_mod_bit(ETHTOOL_LINK_MODE_10baseT_Full_BIT, phydev->supported,
			 val & BMSR_10FULL);
	linkmode_mod_bit(ETHTOOL_LINK_MODE_10baseT_Half_BIT, phydev->supported,
			 val & BMSR_10HALF);

	if (val & BMSR_ESTATEN) {
		val = phy_read(phydev, MII_ESTATUS);
		if (val < 0)
			return val;

		linkmode_mod_bit(ETHTOOL_LINK_MODE_1000baseT_Full_BIT,
				 phydev->supported, val & ESTATUS_1000_TFULL);
		linkmode_mod_bit(ETHTOOL_LINK_MODE_1000baseT_Half_BIT,
				 phydev->supported, val & ESTATUS_1000_THALF);
		linkmode_mod_bit(ETHTOOL_LINK_MODE_1000baseX_Full_BIT,
				 phydev->supported, val & ESTATUS_1000_XFULL);
	}

	return 0;
}
EXPORT_SYMBOL(genphy_read_abilities);
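It first reads the PHY's MII_BMSR register to get the 10/100 abilities and sets the matching bits in phydev->supported.
Then if (val & BMSR_ESTATEN) tests the BMSR_ESTATEN bit of MII_BMSR, i.e. bit 8 of BMSR (Basic Mode Status Register, address 0x01), to see whether the PHY has extended status and therefore gigabit abilities.
If so, it reads the MII_ESTATUS register to get the specific gigabit abilities and sets the matching bits in phydev->supported.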
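The decoding step can be reproduced on raw register values. The sketch below assumes the bit values from mii.h; the SUP_* result bits and the function name decode_abilities() are invented for the illustration, and the register values in the usage note are example inputs, not values read from a real RTL8211F.

```c
#include <assert.h>

/* Bit values from include/uapi/linux/mii.h. */
#define BMSR_ESTATEN       0x0100 /* extended status register is present */
#define BMSR_10HALF        0x0800
#define BMSR_10FULL        0x1000
#define BMSR_100HALF       0x2000
#define BMSR_100FULL       0x4000
#define ESTATUS_1000_THALF 0x1000
#define ESTATUS_1000_TFULL 0x2000

/* Hypothetical result bits for this sketch only. */
enum {
	SUP_10_HALF   = 1 << 0,
	SUP_10_FULL   = 1 << 1,
	SUP_100_HALF  = 1 << 2,
	SUP_100_FULL  = 1 << 3,
	SUP_1000_HALF = 1 << 4,
	SUP_1000_FULL = 1 << 5,
};

/* Builds a "supported" mask from raw MII_BMSR and MII_ESTATUS values. */
unsigned int decode_abilities(int bmsr, int estatus)
{
	unsigned int supported = 0;

	/* Successful MDIO reads are non-negative. */
	assert(bmsr >= 0 && estatus >= 0);

	if (bmsr & BMSR_10HALF)
		supported |= SUP_10_HALF;
	if (bmsr & BMSR_10FULL)
		supported |= SUP_10_FULL;
	if (bmsr & BMSR_100HALF)
		supported |= SUP_100_HALF;
	if (bmsr & BMSR_100FULL)
		supported |= SUP_100_FULL;

	/* Gigabit abilities are only looked at when BMSR says ESTATUS exists. */
	if (bmsr & BMSR_ESTATEN) {
		if (estatus & ESTATUS_1000_THALF)
			supported |= SUP_1000_HALF;
		if (estatus & ESTATUS_1000_TFULL)
			supported |= SUP_1000_FULL;
	}

	return supported;
}
```

With the example inputs bmsr = 0x7949 and estatus = 0x2000 the result contains all four 10/100 modes plus SUP_1000_FULL, the same set of modes that ethtool lists under Supported link modes above.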
After genphy_read_abilities() returns, execution continues in phy_probe(), which a little further down calls phy_advertise_supported():
/**
 * phy_advertise_supported - Advertise all supported modes
 * @phydev: target phy_device struct
 *
 * Description: Called to advertise all supported modes, doesn't touch
 * pause mode advertising.
 */
void phy_advertise_supported(struct phy_device *phydev)
{
	__ETHTOOL_DECLARE_LINK_MODE_MASK(new);

	linkmode_copy(new, phydev->supported);
	phy_copy_pause_bits(new, phydev->advertising);
	linkmode_copy(phydev->advertising, new);
}
EXPORT_SYMBOL(phy_advertise_supported);
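Here the modes in phydev->supported are copied straight into phydev->advertising with linkmode_copy; only the pause bits already present in phydev->advertising are kept. In other words, phydev->advertising declares every ability the PHY supports.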
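A compact way to picture this copy is with plain bitmasks. The sketch below is a user-space approximation: struct phy, the MODE_* bits, and both function names are invented for the example, and it assumes the pause handling works as described above (pause bits come from the existing advertisement, everything else from supported).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mode bits for this sketch only. */
enum {
	MODE_100_FULL   = 1 << 0,
	MODE_1000_FULL  = 1 << 1,
	MODE_PAUSE      = 1 << 2,
	MODE_ASYM_PAUSE = 1 << 3,
};

#define PAUSE_MASK (MODE_PAUSE | MODE_ASYM_PAUSE)

struct phy {
	unsigned int supported;
	unsigned int advertising;
};

/* Advertise every supported mode, keeping the current pause advertisement. */
void advertise_supported(struct phy *phy)
{
	unsigned int new_adv;

	assert(phy != NULL);

	/* Start from everything the PHY supports ... */
	new_adv = phy->supported;
	/* ... then overwrite the pause bits with the ones already advertised. */
	new_adv = (new_adv & ~(unsigned int)PAUSE_MASK) |
		  (phy->advertising & PAUSE_MASK);

	phy->advertising = new_adv;
}

/* Convenience wrapper for experiments: returns the resulting advertisement. */
unsigned int advertising_after(unsigned int supported, unsigned int advertising)
{
	struct phy phy = { supported, advertising };

	advertise_supported(&phy);
	return phy.advertising;
}
```

Since 1000baseT/Full is in supported, it always ends up in advertising, regardless of the cable that happens to be plugged in.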
Coming back to phy_resolve_aneg_linkmode(), it is now clear why the Linux 5.10 kernel ends up with 1000 Mb/s full duplex.
Later testing on the OK3576 showed that the problem is already fixed on Linux 6.1. There, ethtool reports:
Settings for eth0:
Supported ports: [ TP MII ]
Supported link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Supported pause frame use: Symmetric Receive-only
Supports auto-negotiation: Yes
Supported FEC modes: Not reported
Advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Advertised pause frame use: Symmetric Receive-only
Advertised auto-negotiation: Yes
Advertised FEC modes: Not reported
Link partner advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Link partner advertised pause frame use: Symmetric Receive-only
Link partner advertised auto-negotiation: Yes
Link partner advertised FEC modes: Not reported
Speed: 100Mb/s
Duplex: Full
Auto-negotiation: on
master-slave cfg: preferred slave
master-slave status: slave
Port: Twisted Pair
PHYAD: 1
Transceiver: external
MDI-X: Unknown
Supports Wake-on: ug
Wake-on: d
Current message level: 0x0000003f (63)
drv probe link timer ifdown ifup
Link detected: yes
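The output is essentially the same as on Linux 4.19: the kernel believes both sides are gigabit-capable, yet the link does not run at 1000 Mb/s.
At first I assumed Linux 6.1 had noticed the Linux 5.10 problem and fixed it, but searching Linux 6.1 the same way turned up nothing resembling the Linux 4.19 logic.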
I also compared the generic PHY driver files of Linux 6.1 and Linux 5.10 and found them almost identical, with no major changes.
However, on Linux 6.1 the kernel prints the following when the port links up:
[ 14.611725] RTL8211F Gigabit Ethernet stmmac-0:01: Downshift occurred from negotiated speed 1Gbps to actual speed 100Mbps, check cabling!
[ 14.613638] rk_gmac-dwmac 2a220000.ethernet eth0: Link is Up - 100Mbps/Full - flow control rx/tx
[ 14.613669] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Following that message leads to phy_check_downshift() in drivers/net/phy/phy-core.c:
/**
 * phy_check_downshift - check whether downshift occurred
 * @phydev: The phy_device struct
 *
 * Check whether a downshift to a lower speed occurred. If this should be the
 * case warn the user.
 * Prerequisite for detecting downshift is that PHY driver implements the
 * read_status callback and sets phydev->speed to the actual link speed.
 */
void phy_check_downshift(struct phy_device *phydev)
{
	__ETHTOOL_DECLARE_LINK_MODE_MASK(common);
	int i, speed = SPEED_UNKNOWN;

	phydev->downshifted_rate = 0;

	if (phydev->autoneg == AUTONEG_DISABLE ||
	    phydev->speed == SPEED_UNKNOWN)
		return;

	linkmode_and(common, phydev->lp_advertising, phydev->advertising);

	for (i = 0; i < ARRAY_SIZE(settings); i++)
		if (test_bit(settings[i].bit, common)) {
			speed = settings[i].speed;
			break;
		}

	if (speed == SPEED_UNKNOWN || phydev->speed >= speed)
		return;

	phydev_warn(phydev, "Downshift occurred from negotiated speed %s to actual speed %s, check cabling!\n",
		    phy_speed_to_str(speed), phy_speed_to_str(phydev->speed));

	phydev->downshifted_rate = 1;
}
EXPORT_SYMBOL_GPL(phy_check_downshift);
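The function ANDs the link partner abilities recorded in phydev->lp_advertising with the local abilities recorded in phydev->advertising, walks the settings table, and stores the highest speed both sides support in speed. That is the speed autonegotiation should have produced.
It then compares the actual phydev->speed with speed. If phydev->speed is lower, the function leaves the lower phydev->speed in place, sets downshifted_rate, and prints the warning: Downshift occurred from negotiated speed 1Gbps to actual speed 100Mbps, check cabling!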
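The comparison itself is tiny and can be sketched in isolation. In the user-space sketch below, downshift_occurred() is a made-up name, speeds are plain Mb/s integers, and SPEED_UNKNOWN is assumed to be -1 as in the kernel's ethtool header.

```c
#include <assert.h>

#define SPEED_UNKNOWN (-1) /* assumed to match the kernel's definition */

/*
 * Hypothetical helper: expected_speed is the best speed both advertisement
 * masks allow, actual_speed is what the PHY driver reported for the link.
 * Returns 1 when the link runs slower than negotiated, 0 otherwise.
 */
int downshift_occurred(int expected_speed, int actual_speed)
{
	/* Input sanity: a speed is either unknown or a positive Mb/s value. */
	assert(expected_speed == SPEED_UNKNOWN || expected_speed > 0);
	assert(actual_speed == SPEED_UNKNOWN || actual_speed > 0);

	/* Nothing to compare when either side is unknown. */
	if (expected_speed == SPEED_UNKNOWN || actual_speed == SPEED_UNKNOWN)
		return 0;

	return actual_speed < expected_speed;
}
```

The check can only fire when the PHY driver has already put the real link speed into phydev->speed. If phydev->speed was itself derived from the advertisement masks, both values are equal and no downshift is ever reported.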
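The next question: the Linux 5.10 kernel has this function too, so why does it not trigger there? Put differently, the earlier analysis showed that genphy_read_status() resolves phydev->speed to 1000 Mb/s from phydev->lp_advertising and phydev->advertising, so how can phydev->speed already be 100 Mb/s on Linux 6.1 by the time phy_check_downshift() runs?
The next step is to look at where phy_check_downshift() is called: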
/**
 * phy_check_link_status - check link status and set state accordingly
 * @phydev: the phy_device struct
 *
 * Description: Check for link and whether autoneg was triggered / is running
 * and set state accordingly
 */
static int phy_check_link_status(struct phy_device *phydev)
{
	int err;

	WARN_ON(!mutex_is_locked(&phydev->lock));

	/* Keep previous state if loopback is enabled because some PHYs
	 * report that Link is Down when loopback is enabled.
	 */
	if (phydev->loopback_enabled)
		return 0;

	err = phy_read_status(phydev);
	if (err)
		return err;

	if (phydev->link && phydev->state != PHY_RUNNING) {
		phy_check_downshift(phydev);
		phydev->state = PHY_RUNNING;
		phy_link_up(phydev);
	} else if (!phydev->link && phydev->state != PHY_NOLINK) {
		phydev->state = PHY_NOLINK;
		phy_link_down(phydev);
	}

	return 0;
}
Before it gets to phy_check_downshift(), this function calls phy_read_status():
static inline int phy_read_status(struct phy_device *phydev)
{
	if (!phydev->drv)
		return -EIO;

	if (phydev->drv->read_status)
		return phydev->drv->read_status(phydev);
	else
		return genphy_read_status(phydev);
}
phy_read_status() ends up running genphy_read_status(). A print added at the end of genphy_read_status() shows that phydev->speed is still 1000 there, so why is it 100 by the time phy_check_downshift() is called a few lines later?
Prints added to phy_read_status() show that it actually takes the first branch and calls phydev->drv->read_status(). Then how does execution still reach genphy_read_status()?
Adding dump_stack(); to genphy_read_status() prints the call chain directly in the kernel log:
[ 3675.821941] Call trace:
[ 3675.821952] dump_backtrace+0xdc/0x130
[ 3675.821971] show_stack+0x1c/0x30
[ 3675.821992] dump_stack_lvl+0x64/0x7c
[ 3675.822009] dump_stack+0x14/0x2c
[ 3675.822028] genphy_read_status+0x24/0x174
[ 3675.822047] rtlgen_read_status+0x1c/0x4c
[ 3675.822069] phy_read_status+0x60/0x8c
[ 3675.822086] phy_check_link_status+0x84/0x150
[ 3675.822104] phy_state_machine+0x26c/0x280
[ 3675.822125] process_one_work+0x1e8/0x454
[ 3675.822145] worker_thread+0x174/0x52c
[ 3675.822166] kthread+0xdc/0xe0
[ 3675.822185] ret_from_fork+0x10/0x20
That explains it: phy_read_status() calls rtlgen_read_status(), which in turn calls genphy_read_status(). The next step is to find rtlgen_read_status(), which lives in drivers/net/phy/realtek.c:
static int rtlgen_read_status(struct phy_device *phydev)
{
	int ret;

	ret = genphy_read_status(phydev);
	if (ret < 0)
		return ret;

	return rtlgen_get_speed(phydev);
}
After calling genphy_read_status(), rtlgen_read_status() returns the result of rtlgen_get_speed(), so that function is next:
/* get actual speed to cover the downshift case */
static int rtlgen_get_speed(struct phy_device *phydev)
{
	int val;

	if (!phydev->link)
		return 0;

	val = phy_read_paged(phydev, 0xa43, 0x12);
	if (val < 0)
		return val;

	switch (val & RTLGEN_SPEED_MASK) {
	case 0x0000:
		phydev->speed = SPEED_10;
		break;
	case 0x0010:
		phydev->speed = SPEED_100;
		break;
	case 0x0020:
		phydev->speed = SPEED_1000;
		break;
	case 0x0200:
		phydev->speed = SPEED_10000;
		break;
	case 0x0210:
		phydev->speed = SPEED_2500;
		break;
	case 0x0220:
		phydev->speed = SPEED_5000;
		break;
	default:
		break;
	}

	return 0;
}
That settles it: phydev->speed is overwritten here with the speed the PHY actually reports, and everything above falls into place.
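The decoding in rtlgen_get_speed() can be exercised on its own. In the sketch below the case values are the ones from the function above, but the mask value 0x0630 is an assumption on my part (it is the smallest mask that keeps all six case values distinct, so check RTLGEN_SPEED_MASK in your own tree), and rtlgen_decode_speed() is a made-up name.

```c
#include <assert.h>

/* Assumed value; verify against RTLGEN_SPEED_MASK in your realtek.c. */
#define RTLGEN_SPEED_MASK 0x0630

/*
 * Hypothetical helper: maps the raw value of the paged status register
 * (page 0xa43, register 0x12) to a speed in Mb/s. Returns -1 for codes the
 * switch does not know; the kernel function leaves phydev->speed unchanged
 * in that case.
 */
int rtlgen_decode_speed(int val)
{
	/* Negative values are MDIO errors and are handled by the caller. */
	assert(val >= 0);

	switch (val & RTLGEN_SPEED_MASK) {
	case 0x0000:
		return 10;
	case 0x0010:
		return 100;
	case 0x0020:
		return 1000;
	case 0x0200:
		return 10000;
	case 0x0210:
		return 2500;
	case 0x0220:
		return 5000;
	default:
		return -1;
	}
}
```

The important property is that the input comes from the PHY's own status register, so it reflects the speed the link really came up at, including after a downshift.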
One question remains: why does Linux 6.1 work while Linux 5.10 does not?
Prints added to phy_read_status() in the Linux 5.10 kernel show that it does not call phydev->drv->read_status() there and goes straight to genphy_read_status().
Adding dump_stack(); to genphy_read_status() confirms this in the call chain:
[ 9437.356211] Call trace:
[ 9437.356219] dump_backtrace+0x0/0x1b0
[ 9437.356226] show_stack+0x20/0x2c
[ 9437.356233] dump_stack_lvl+0xc8/0xf8
[ 9437.356240] dump_stack+0x18/0x34
[ 9437.356247] genphy_read_status+0x28/0x1d4
[ 9437.356261] phy_read_status+0x64/0x94
[ 9437.356267] phy_check_link_status+0xb4/0x15c
[ 9437.356274] phy_state_machine+0x190/0x264
[ 9437.356281] process_one_work+0x1e0/0x298
[ 9437.356288] worker_thread+0x1e0/0x278
[ 9437.356295] kthread+0xf4/0x104
[ 9437.356302] ret_from_fork+0x10/0x30
So why does phy_read_status() not reach rtlgen_read_status() in drivers/net/phy/realtek.c? Reading the code shows that the RTL8211F entry in Linux 5.10 is missing one line:
diff --git a/drivers/net/phy/realtek.c b/drivers/net/phy/realtek.c
index e3c77485f9ae..cb64474ce8b2 100644
--- a/drivers/net/phy/realtek.c
+++ b/drivers/net/phy/realtek.c
@@ -693,6 +693,7 @@ static struct phy_driver realtek_drvs[] = {
 		.config_init	= &rtl8211f_config_init,
 		.ack_interrupt	= &rtl8211f_ack_interrupt,
 		.config_intr	= &rtl8211f_config_intr,
+		.read_status	= rtlgen_read_status,
 		.suspend	= genphy_suspend,
 		.resume		= rtl821x_resume,
 		.read_page	= rtl821x_read_page,
With this line added, everything works.
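Why a single missing struct member changes the result can be shown with a stripped-down model of the dispatch. Everything in this sketch is invented for illustration (the struct layouts, the resolved_speed and actual_speed fields, and all function names); it only models the "driver hook if present, generic fallback otherwise" pattern seen in phy_read_status().

```c
#include <assert.h>
#include <stddef.h>

struct phy_device;

/* Minimal stand-in for a PHY driver with one optional hook. */
struct phy_driver {
	const char *name;
	int (*read_status)(struct phy_device *phydev);
};

struct phy_device {
	const struct phy_driver *drv;
	int resolved_speed; /* what the advertisement masks resolve to */
	int actual_speed;   /* what the PHY hardware reports */
	int speed;          /* what the rest of the stack sees */
};

/* Generic path: trusts the resolved advertisement. */
int generic_read_status(struct phy_device *phydev)
{
	phydev->speed = phydev->resolved_speed;
	return 0;
}

/* Vendor path: runs the generic code, then overrides with the real speed. */
int vendor_read_status(struct phy_device *phydev)
{
	int ret = generic_read_status(phydev);

	if (ret < 0)
		return ret;

	phydev->speed = phydev->actual_speed;
	return 0;
}

/* Dispatch: use the driver hook when it is set, else the generic code. */
int read_status(struct phy_device *phydev)
{
	assert(phydev != NULL);

	if (!phydev->drv)
		return -1;
	if (phydev->drv->read_status)
		return phydev->drv->read_status(phydev);
	return generic_read_status(phydev);
}

/* Speed seen on a 4-wire cable, with or without the hook registered. */
int speed_with_hook(int has_hook)
{
	struct phy_driver drv = { "demo", has_hook ? vendor_read_status : NULL };
	struct phy_device dev = { &drv, 1000, 100, 0 };

	read_status(&dev);
	return dev.speed;
}
```

In this model speed_with_hook(0) gives 1000 and speed_with_hook(1) gives 100, which mirrors the difference between the Linux 5.10 and Linux 6.1 behaviour described above.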
That concludes this analysis of the PHY driver.
To wrap up, here is the call chain:
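phy_state_machine()
  phy_check_link_status()
    phy_read_status()
      rtlgen_read_status()              // Realtek PHY driver: read the PHY status
        genphy_read_status()            // generic driver: read the PHY status
          genphy_update_link()          // update the link state
          genphy_read_master_slave()    // on gigabit ports, update local and peer master/slave state
          genphy_read_lpa()             // update the abilities advertised by the link partner
          phy_resolve_aneg_linkmode()   // with autonegotiation, resolve the link result
        rtlgen_get_speed()              // Realtek PHY driver: set the actual link speed
    phy_check_downshift()               // check whether a downshift occurred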
For more detail on PHY drivers, see the following articles: