Discussion:
LAN slow or dead, intermittently
Janos Dohanics
2016-06-24 15:26:59 UTC
Hello List,

Please help me figure out what makes my LAN intermittently slow or just
about dead.

The LAN consists of a pfSense router (m1n1wall), a Netgear GS724T
switch, a recently installed FreeBSD 10.3 machine, several Windows 7 Pro
machines, Android phones and iPhones, and a Brother printer: altogether
between one and two dozen networked devices.

There are no local servers on the network, so as far as I can tell,
most traffic to and from the local nodes is with the Internet.

Desktops have wired connections (100 Mbit/s or 1 Gbit/s NICs), but the
phones and most laptops connect over WiFi.

WiFi is provided by a Linksys E1500 configured to work only as a WiFi
AP.

There is also a Linksys RE4000W WiFi extender on the network.

The FreeBSD machine, the printer, the switch, the E1500 AP and the
RE4000W extender have static IP addresses. Most of the Windows machines
have reserved DHCP addresses; the rest get unreserved DHCP leases.
pfSense provides the DHCP server.

I started investigating the problem with mtr(8), which runs every 10
minutes. Several times during my testing, the average RTT between the
FreeBSD machine (10.10.11.252) and the router's LAN interface
(10.10.11.1) was hundreds of milliseconds. Several other times, 1 out
of the 10 packets was lost, but whenever this packet loss occurred, RTTs
were mostly 0.1 or 0.2 ms and always under 1 ms.
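The two symptoms above (a high average RTT in some runs, a single lost packet with otherwise sub-millisecond RTTs in others) are easier to tell apart if each probe run is logged as a small summary. A minimal sketch, assuming the per-packet RTTs have already been extracted from ping or mtr output (that parsing is not shown); `summarize_probe`, `is_degraded` and the 5 ms threshold are illustrative choices, not anything mtr provides.

```python
from statistics import mean
from typing import Optional, Sequence


def summarize_probe(rtts_ms: Sequence[Optional[float]]) -> dict:
    """Summarize one probe run; a None entry marks a lost packet."""
    replies = [r for r in rtts_ms if r is not None]
    lost = len(rtts_ms) - len(replies)
    return {
        "sent": len(rtts_ms),
        "loss_pct": 100.0 * lost / len(rtts_ms) if rtts_ms else 0.0,
        "avg_ms": mean(replies) if replies else None,
        "max_ms": max(replies) if replies else None,
    }


def is_degraded(summary: dict, rtt_limit_ms: float = 5.0) -> bool:
    """Flag a run with any loss, no replies, or an average RTT far above
    what a single switched LAN hop normally shows (threshold is assumed)."""
    avg = summary["avg_ms"]
    return summary["loss_pct"] > 0 or avg is None or avg > rtt_limit_ms
```

Logging these summaries with a timestamp for every 10-minute run gives a time series that can later be lined up against traffic data.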

Ping times to various hosts on the LAN are at times in the tens of
milliseconds or higher.

Using my FreeBSD laptop and the FreeBSD machine, I tested the LAN with
netperf(1), which showed over 80 Mbit/s at good times but less than
1 Mbit/s at others.

During off-hours, I disconnected and then reconnected computers one by
one, but could not identify any of them as the culprit. I also replaced
the switch and the patch cables; the problem is still there,
intermittently.

None of the Windows computers seems to have malware that might flood
the network. I looked at pftop, and the traffic seems legitimate, but
how could I see all LAN traffic and correlate it with the slowdowns?
Could this be caused by faulty networking hardware? How would I
identify that?
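For the correlation question, one approach is to line up the timestamps of degraded probe runs with per-host traffic records and compare which hosts dominate around the slow periods versus the whole capture. This sketch assumes the records, as (timestamp, host, bytes) tuples, are collected separately, for example from a packet capture on a mirrored switch port or a flow export from the router; the collection step, the function names and the 60-second window are all assumptions.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# One traffic record: (unix timestamp, host address, byte count).
Flow = Tuple[float, str, int]


def top_talkers_during(slow_times: Iterable[float],
                       flows: Iterable[Flow],
                       window_s: float = 60.0,
                       n: int = 3) -> List[Tuple[str, int]]:
    """Rank hosts by bytes seen within window_s seconds of any slow sample."""
    slow = list(slow_times)
    totals: Counter = Counter()
    for ts, host, nbytes in flows:
        if any(abs(ts - t) <= window_s for t in slow):
            totals[host] += nbytes
    return totals.most_common(n)


def top_talkers_overall(flows: Iterable[Flow], n: int = 3) -> List[Tuple[str, int]]:
    """Baseline ranking over the whole capture, for comparison."""
    totals: Counter = Counter()
    for _, host, nbytes in flows:
        totals[host] += nbytes
    return totals.most_common(n)
```

A host that ranks near the top only during slow periods is a better suspect than one that is busy all the time.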

What is the intelligent way to track down this problem? Please advise.
--
Janos Dohanics
Ernie Luzar
2016-06-27 12:50:56 UTC
Post by Janos Dohanics
Hello List,
Please help me figure out what makes my LAN intermittently slow or just
about dead.
[...]
I also had performance problems with 10.3 that did not happen with 10.2
and older releases. When the LAN went dead, I had to reboot the host
system to get things working again because users were on my back, so I
never let the condition persist long enough to see if it would resolve
itself.

My first solution was to go back to 10.2, and everything was fine.
One evening I swapped the host's 10.2 hard drive for the 10.3 hard
drive so I could test some more. Just by luck, I checked the date and
time with the "date" command: the date was correct, but the clock was
2 hours behind. I set the correct time manually with "date" and let
10.3 run in production. Within 5 days the LAN was having performance
problems again, and the host clock was 30 minutes behind. I replaced
the host's motherboard battery and set the correct time manually again.
Things ran fine for about 2 weeks, then it happened again; this time the
clock was 2 minutes behind.
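The offsets reported above can be turned into rough drift rates. This treats "within 5 days" and "about 2 weeks" as exactly 5 and 14 days, which is an assumption, so the numbers are estimates only. For comparison, ntpd(8) documents that it can slew the clock by at most 500 parts per million.

```python
SECONDS_PER_DAY = 86_400
NTPD_MAX_SLEW_PPM = 500  # ntpd's documented maximum slew rate


def drift_ppm(offset_s: float, elapsed_s: float) -> float:
    """Average clock frequency error, in parts per million."""
    return offset_s / elapsed_s * 1_000_000


# Figures from the post, treated as approximations:
# about -30 minutes after 5 days, then about -2 minutes after 2 weeks.
first = drift_ppm(-30 * 60, 5 * SECONDS_PER_DAY)
second = drift_ppm(-2 * 60, 14 * SECONDS_PER_DAY)
print(f"{first:.0f} ppm, {second:.0f} ppm")  # prints: -4167 ppm, -99 ppm
```

Only the first rate is beyond what slewing alone can absorb; the second is well inside it.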

This time I enabled the base system's ntpd time daemon by adding these
lines to rc.conf:
ntpd_enable="YES"
ntpd_sync_on_start="YES"
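To measure how far a host's clock is from a reference server, rather than eyeballing date(1), a minimal SNTP client (RFC 4330) is enough. This sketch sends one client-mode packet and applies the standard four-timestamp offset formula; it has no retries, ignores the server's leap and stratum fields, and the function names are illustrative.

```python
import socket
import struct
import time

NTP_UNIX_EPOCH_DELTA = 2_208_988_800  # seconds from 1900-01-01 to 1970-01-01


def ntp_to_unix(seconds: int, fraction: int) -> float:
    """Convert a 64-bit NTP timestamp (seconds, 2**-32 fraction) to Unix time."""
    return seconds - NTP_UNIX_EPOCH_DELTA + fraction / 2**32


def clock_offset(t0: float, t1: float, t2: float, t3: float) -> float:
    """Server clock minus local clock. t0 = client send, t1 = server receive,
    t2 = server transmit, t3 = client receive."""
    return ((t1 - t0) + (t2 - t3)) / 2


def query_offset(server: str, timeout: float = 2.0) -> float:
    packet = b"\x23" + 47 * b"\0"  # LI=0, VN=4, Mode=3 (client)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        t0 = time.time()
        sock.sendto(packet, (server, 123))
        data, _ = sock.recvfrom(1024)
        t3 = time.time()
    if len(data) < 48:
        raise ValueError("short NTP reply")
    # Receive timestamp is at bytes 32-39, transmit timestamp at bytes 40-47.
    rx_s, rx_f, tx_s, tx_f = struct.unpack("!4I", data[32:48])
    return clock_offset(t0, ntp_to_unix(rx_s, rx_f), ntp_to_unix(tx_s, tx_f), t3)
```

For example, `query_offset("pool.ntp.org")` returns the offset in seconds; a positive value means the local clock is behind the server, as in the case described above.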

Since then 10.3 has been running fine (2 months now). I think something
in the network stack changed between 10.2 and 10.3 that made
synchronization between LAN nodes and the host depend on their clocks
staying within some range of each other.

I would say that checking the time on your host and on all the machines
on the LAN is a good place to start looking for your problem.

Good luck
Janos Dohanics
2016-06-28 13:31:56 UTC
On Mon, 27 Jun 2016 08:50:56 -0400
Post by Ernie Luzar
Post by Janos Dohanics
Hello List,
Please help me figure out what makes my LAN intermittently slow or
just about dead.
[...]
[...]
I would say that checking the time on your host and all the machines
on the lan would be a good place to start looking for your problem.
Good luck
Well, date(1) shows a time that seems reasonably correct... It didn't
occur to me that an inaccurate clock could cause the kind of problem I
described. Thanks anyway...
--
Janos Dohanics
Ernie Luzar
2016-06-28 16:24:54 UTC
Post by Janos Dohanics
On Mon, 27 Jun 2016 08:50:56 -0400
Post by Ernie Luzar
[...]
Well, date(1) shows a time which seems reasonably correct... it didn't
occur to me that an inaccurate clock could also be the cause of the
kind of problem I described. Thanks anyway...
When I posted the above reply, I also sent it directly to your email
address, Janos Dohanics <***@3dresearch.com>.

It bounced, and a spam-harvesting DNS message came back. The reply you
just posted shows me you're reading the questions list and posting BS
just to drive traffic to your email-address-harvesting website. To all
who read this thread: beware, and ignore all posts from the domain
3dresearch.com.
