System Freezes Caused by the Proprietary Nvidia Driver
Today my system locked up yet again. As far as I can tell, the cause is the proprietary Nvidia driver I use, and the problem has persisted across many driver versions. I keep meaning to look into it further, but then it improves for a while and I forget again. I can avoid the problem by switching to the nv driver, but then I lose all 3D acceleration. How many others have this problem?
After some googling, it appears the problem could lie in the OpenGL part of the driver, so as a test I have turned off the OpenGL screensaver I was using. The crashes also seem more common when the system is under heavy load. I have wondered whether TwinView, which I use with my two 17" TFT screens, might be involved. It is certainly an irritating bug: the usual fix is to ssh into the system and kill -9 the X process.
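The ssh-and-kill workaround can be scripted. The sketch below is a hypothetical helper, not part of any Nvidia or X.org tool: it assumes a Linux /proc filesystem, that the X server's process name is `X` or `Xorg`, and that it is run as root from the ssh session. It reads the command name from /proc/<pid>/stat and sends SIGKILL, which is what `kill -9` sends.

```python
import os
import signal


def process_name(stat_text):
    """Extract the command name from the contents of /proc/<pid>/stat.

    The name sits between the first '(' and the last ')', so names that
    contain spaces or parentheses are still handled.
    """
    return stat_text[stat_text.index("(") + 1 : stat_text.rindex(")")]


def find_x_pids(names=("X", "Xorg"), proc_root="/proc"):
    """Return the sorted PIDs whose command name is one of `names`."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip /proc/self, /proc/meminfo, etc.
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                name = process_name(f.read())
        except (OSError, ValueError):
            continue  # process exited or the file was unreadable
        if name in names:
            pids.append(int(entry))
    return sorted(pids)


def kill_x(names=("X", "Xorg"), proc_root="/proc"):
    """Send SIGKILL (what `kill -9` sends) to every matching X server."""
    pids = find_x_pids(names, proc_root)
    for pid in pids:
        os.kill(pid, signal.SIGKILL)
    return pids
```

Calling `find_x_pids()` first is a cheap way to check which PIDs would be hit before running `kill_x()`.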
The error messages that normally accompany these crashes in /var/log/messages are:
Aug 27 04:48:02 cryos NVRM: Xid: 25, L1 -> L0
Aug 27 04:48:02 cryos NVRM: Xid: 13, 0000 02003900 00000039 00000328 00000000 00000800
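To see how often these messages turn up (and to line them up against load or TwinView changes), the NVRM lines can be pulled out of the log. This is a minimal sketch assuming the syslog layout of the two lines above; the optional `kernel: ` prefix is an extra assumption for syslog daemons that add it, and `XidEvent` and `parse_xid_events` are hypothetical names.

```python
import re
from dataclasses import dataclass

# Assumed syslog layout, based on the lines quoted above:
#   "Aug 27 04:48:02 cryos NVRM: Xid: 13, 0000 02003900 ..."
XID_LINE = re.compile(
    r"^(?P<stamp>[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?:kernel: )?NVRM: Xid: (?P<code>\d+), (?P<detail>.*)$"
)


@dataclass
class XidEvent:
    stamp: str   # syslog timestamp, e.g. "Aug 27 04:48:02"
    host: str    # hostname, e.g. "cryos"
    code: int    # Xid number reported by the NVRM kernel module
    detail: str  # rest of the line, kept verbatim


def parse_xid_events(lines):
    """Return an XidEvent for each NVRM Xid line; all other lines are ignored."""
    events = []
    for line in lines:
        match = XID_LINE.match(line.rstrip("\n"))
        if match:
            events.append(XidEvent(
                stamp=match["stamp"],
                host=match["host"],
                code=int(match["code"]),
                detail=match["detail"],
            ))
    return events
```

For example, `with open("/var/log/messages", errors="replace") as log: events = parse_xid_events(log)` collects every Xid entry (reading that file usually needs root); counting the entries per day gives a rough picture of how often the lockups happen.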
Searching for these messages turned up a couple of posts here and here, and another here suggesting it could be an x86_64 issue (see also here). A post here describes very similar problems on Debian, and this bug looks like the same issue.
Beyond the log messages above, the only information I have is that X gets stuck in a loop consuming 100% of the CPU until it is killed. Not even the keyboard or mouse responds, so you need a second system to log in from. I guess this is the problem with a black-box binary linked into my kernel: there is no way to debug it. I submitted a bug report to nVidia about a year ago and never received a reply.
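Since the lockup shows up as X spinning at 100% CPU, one way to confirm it from the second machine's ssh session, before reaching for kill -9, is to sample the X process's CPU time twice. A minimal sketch assuming Linux /proc, where fields 14 and 15 of /proc/<pid>/stat are user and system time in clock ticks; the function names and the 5-second default interval are illustrative choices, not from any existing tool.

```python
import os
import time


def cpu_ticks(stat_text):
    """Return utime + stime (clock ticks) from the text of /proc/<pid>/stat."""
    # Split after the ')' that closes the command name; rest[0] is field 3
    # (state), so field n of the stat line is rest[n - 3].
    rest = stat_text[stat_text.rindex(")") + 1 :].split()
    utime, stime = int(rest[11]), int(rest[12])  # fields 14 and 15
    return utime + stime


def busy_percent(ticks_before, ticks_after, hz, interval):
    """CPU use over `interval` seconds, as a percentage of one core."""
    return 100.0 * (ticks_after - ticks_before) / (hz * interval)


def sample_cpu_percent(pid, interval=5.0, proc_root="/proc"):
    """Measure how busy `pid` is by reading its stat file twice."""
    path = os.path.join(proc_root, str(pid), "stat")
    hz = os.sysconf("SC_CLK_TCK")  # clock ticks per second, usually 100
    with open(path) as f:
        before = cpu_ticks(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = cpu_ticks(f.read())
    return busy_percent(before, after, hz, interval)
```

Given the X server's PID, a result close to 100 for several samples in a row would match the spinning loop described above; a healthy but idle X server should sit near 0.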
I have tried enabling and disabling all sorts of options, but nothing seems to change it; the crashes happen anywhere from once a day to once every few weeks, across different xorg-x11 and nvidia-kernel versions. It also doesn't affect my work system, which is very similar but has only one TFT screen and a lower-end nVidia graphics card...
Comments
David T. on :
Fwiw, I get these hangs even when no 3D is being displayed.
Gary Roberts on :
Enrique on :
izaac on :
Now it works great; I hope this helps you.
Marcus D. Hanwell on :
It only hits me anywhere from once every couple of days to once every couple of weeks. A quick SSH login and a kill -9 of X cures it, but it sure is irritating. It has never happened once on my work system, though, which has a very similar setup and also runs Gentoo...
opello on :
netegis.com on :
Marcus D. Hanwell on :
Gonzalo Aguilar on :
This seems to be an X.org vs. Nvidia issue; I never had this fail with XFree86.
I hope someone finds a solution.
Josito on :
It would be great to have another way to kill X, such as pressing the power button, instead of having to reboot the machine or log in from another host.
(Off-topic) I'm a developer, and I have seen something like this problem at work on an HP-UX workstation. I think it has to do with X callbacks at the driver level taking a lock in a thread, so CPU usage rises to 100% and X stops responding. X could do better at avoiding the locks, and the driver could also work around them.
Marcus D. Hanwell on :
I for one will be going with Intel and/or ATI if their open source drivers improve to a useful level. For now I am stuck with the nVidia binary blob, though... Maybe nouveau will be good enough one day too?