[nsd-users] understanding memory issues
Klaus Darilion
2018-07-02 13:01:45 UTC
Hi!

We use NSD 4.1.6 as a slave for a large zone (2G zone file). It seems
that memory sometimes runs short:

03:44:19 nsd[16836]: xfrd: zone xx committed "received update to serial
2018063010 at 2018-06-30T03:44:19 from 2a02:850:9::5 TSIG verified with
key rcode0-distribution"
03:45:30 nsd[16837]: rehash of zone xx. with parameters 1 0 5
6a0bf229ad0c7a2a
03:45:39 nsd[16837]: nsec3 xx 1 %
03:46:29 nsd[16837]: zone xx. received update to serial 2018063010 at
2018-06-30T03:44:19 from 2a02:850:9::5 TSIG verified with key rcode0-xxx
of 1717205965 bytes in 324.657 seconds
03:46:29 nsd[16837]: fork failed: Cannot allocate memory
03:46:32 nsd[16836]: process 16837 exited with status 256
03:46:32 nsd[7188]: handle_reload_cmd: reload closed cmd channel
03:46:32 nsd[7188]: Reload process 16837 failed, continuing with old
database
03:46:32 nsd[16836]: zone xx serial 2018063009 is updated to 2018063010.


What confuses me is that since the above error, all following attempts to
transfer and activate the new zone fail, but without any logged reason.


03:51:50 nsd[16836]: xfrd: zone xx committed "received update to serial
2018063011 at 2018-06-30T03:51:50 from 2a02:850:9::5 TSIG verified with
key rcode0-xxx"
03:51:58 nsd[16836]: xfrd: zone xx: soa serial 2018063011 update failed,
restarting transfer (notified zone)
03:56:05 nsd[16836]: xfrd: zone xx committed "received update to serial
2018063011 at 2018-06-30T03:56:05 from 2a02:850:9::5 TSIG verified with
key rcode0-xxx"
03:56:18 nsd[16836]: xfrd: zone xx: soa serial 2018063011 update failed,
restarting transfer (notified zone)


Any idea why NSD is not logging the cause of the "update failed"? I
guess it is also memory related, or does NSD just not recover from the
initial "Cannot allocate memory"?

Thanks
Klaus
W.C.A. Wijngaards
2018-07-02 13:24:02 UTC
Hi Klaus,
NSD tries to recover from the cannot allocate memory failure by
performing the update again. But I guess this also fails (for the same
reason?). Linux has kernel settings for memory overcommit that allow you
to bypass these limits. NSD shares most of its memory between processes,
also after fork, and that is not what the default virtual memory
overcommit heuristic assumes. But you don't really need to change that
setting, I think, because you can save memory with database: "" and by
upgrading.
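
For reference, the overcommit policy mentioned above is exposed on Linux as vm.overcommit_memory. The following is only a minimal sketch for checking which mode a host is in; the function names are made up for illustration, and it does not change any setting:

```python
from pathlib import Path
from typing import Optional

# Meanings of the vm.overcommit_memory values on Linux.
OVERCOMMIT_MODES = {
    0: "heuristic: obviously excessive allocations are refused",
    1: "always: allocations are never refused on overcommit grounds",
    2: "strict: commit limit is swap plus a configurable share of RAM",
}


def describe_overcommit(raw: str) -> str:
    """Translate the raw file content (e.g. "0\n") into a description."""
    mode = int(raw.strip())
    return OVERCOMMIT_MODES.get(mode, f"unknown mode {mode}")


def current_overcommit(
    path: str = "/proc/sys/vm/overcommit_memory",
) -> Optional[str]:
    """Return the description for this host, or None if the file is absent."""
    p = Path(path)
    if not p.is_file():
        return None
    return describe_overcommit(p.read_text())


if __name__ == "__main__":
    print(current_overcommit())
```

A large process that forks can be refused in mode 0 or 2 even though most pages would stay shared after the fork, which would match the "fork failed: Cannot allocate memory" line in the log.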

Since you are on NSD 4.1.6, one solution is to update to the latest
version, 4.1.22. Setting database: "" in nsd.conf saves about half the
memory. With the version upgrade you can then save half again on that
result, with --enable-packed at compile time and the selective nsec3
allocations.

Best regards, Wouter
_______________________________________________
nsd-users mailing list
https://open.nlnetlabs.nl/mailman/listinfo/nsd-users
W.C.A. Wijngaards
2018-07-02 13:26:27 UTC
Hi Klaus,
Oh, and I missed that 4.1.13 introduced another 15% memory savings
with the --disable-radix-tree configure option. You can use that option
on top of the previous options. I guess that is likely to solve the
fork failed: cannot allocate memory error.
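
As a rough worked example of how the savings quoted in this thread would stack up: the sketch below assumes each saving applies multiplicatively to whatever remains after the previous step. That is an assumption, the percentages are the approximate figures from the messages above, and real numbers will depend on the zone.

```python
def remaining_fraction(savings):
    """Fraction of the original memory left after applying each saving
    (given as a fraction between 0 and 1) on top of the previous result."""
    remaining = 1.0
    for saving in savings:
        remaining *= 1.0 - saving
    return remaining


# Approximate figures quoted in the thread; not measurements.
steps = [
    ('database: ""', 0.50),
    ("--enable-packed and selective nsec3 allocations", 0.50),
    ("--disable-radix-tree", 0.15),
]

if __name__ == "__main__":
    applied = []
    for name, saving in steps:
        applied.append(saving)
        print(f"after {name}: {remaining_fraction(applied):.4f} of original")
```

If one of the steps is already in place on a given server, drop it from the list to estimate what the remaining steps would add.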

Best regards, Wouter
Klaus Darilion
2018-07-02 16:11:55 UTC
Hi Wouter!
We already use database: ""
Couldn't you make these features available through a config option?
Rebuilding is complicated, and I usually want to stick with the versions
that come with the Linux distribution.

regards
Klaus
Stuart Henderson
2018-07-03 08:26:16 UTC
These build options change data structures used by the program - making
that sort of change is really not feasible in runtime config.
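
To illustrate the point about data structures, here is a small sketch using Python's ctypes. The two record types are hypothetical and are not NSD's actual structs; they only show that packing changes the in-memory layout of a type, which is fixed when the program is compiled.

```python
import ctypes


class DefaultRecord(ctypes.Structure):
    # Natural alignment: padding is inserted after the 1-byte field
    # so that the 8-byte field is aligned.
    _fields_ = [("flag", ctypes.c_uint8), ("value", ctypes.c_uint64)]


class PackedRecord(ctypes.Structure):
    # Packed layout: fields are placed back to back, without padding.
    _pack_ = 1
    _fields_ = [("flag", ctypes.c_uint8), ("value", ctypes.c_uint64)]


default_size = ctypes.sizeof(DefaultRecord)
packed_size = ctypes.sizeof(PackedRecord)

if __name__ == "__main__":
    print("default layout:", default_size, "bytes")
    print("packed layout: ", packed_size, "bytes")
```

Code compiled against one layout cannot read records stored in the other, which is why a choice like this is made at build time rather than in a config file.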
