First, go through the basic questions on the TroubleShooting page.
If you are using packages from your distribution and are unable or unwilling to test the latest versions of all the pieces of nouveau, send your bug reports to your distribution and not directly to us. If you're using an out-of-date software version, our first question will probably be "Does it still happen with the latest version?"
We use GitLab issues. Bugs in the 2D driver and the Nouveau DRM (kernel) part are filed under drm/nouveau. Feel free to submit bugs about the 2D implementation, but please search the issue list before submitting new bugs. If you are not sure whether your bug is a manifestation of an existing bug report, do open a new bug.
Bugs in the 3D driver are filed under mesa/mesa (see all nouveau issue reports). Please check the MesaDrivers page before submitting any reports.
- Attach a complete, unfiltered, untrimmed kernel log covering everything from boot up to and including the problem, and a complete X log if the problem manifests with X. Note that /var/log/messages is not a kernel log. Running dmesg is the best way to get a kernel log, provided that the log buffer has not wrapped around.
- Please include version numbers or checkout dates for all relevant components. These could be the kernel and DRM, the X server, xf86-video-nouveau, libdrm and possibly Mesa.
- Do not use links that go dead over time (e.g., pastebins, image bins, your web server at home); attach your files to the bug instead. Bug reports may be useful even years later.
- If the bug is related to modesetting, output configuration, etc., please attach the VBIOS from your card.
We also have a mailing list where you can ask questions, discuss patches, or talk about anything else related to nouveau and its tools.
Getting a kernel log
Quick guide
- add the following to the kernel command line:
log_buf_len=1M
- exercise your problem
- run the command:
dmesg > kernel_log.txt
- use the file kernel_log.txt created above in your bug report
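The quick-guide steps can be wrapped in a small shell function. This is a sketch, not an official nouveau tool: the names has_log_buf_len and collect_kernel_log are hypothetical, and the function only checks /proc/cmdline for a log_buf_len= option before saving the dmesg output. On systems where the kernel.dmesg_restrict sysctl is set to 1, dmesg needs root.

```sh
#!/bin/sh
# Hypothetical helpers (not part of nouveau) for the quick guide above.

# Succeeds if the given kernel command-line string contains log_buf_len=.
has_log_buf_len() {
    case " $1 " in
        *" log_buf_len="*) return 0 ;;
        *) return 1 ;;
    esac
}

# Saves the kernel log to $1 (default: kernel_log.txt), warning first if
# the running kernel was booted without a log_buf_len= option.
collect_kernel_log() {
    out=${1:-kernel_log.txt}
    if ! has_log_buf_len "$(cat /proc/cmdline)"; then
        echo "warning: log_buf_len= not set; the log buffer may have wrapped" >&2
    fi
    dmesg > "$out"
}
```

After reproducing the problem, run collect_kernel_log and attach the resulting kernel_log.txt to the bug.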
Explanation
The best way is to use the dmesg command and redirect its output to a file. One problem with dmesg is that it reads the kernel log buffer, which may wrap around, in which case the oldest messages are lost. Therefore, add log_buf_len=1M to the kernel command line to increase the log buffer size to 1 MB. You can tell whether the buffer has wrapped around by looking at the first lines of the dmesg output; they should look something like this (details vary with the kernel version):
[ 0.000000] Linux version 2.6.34-gentoo-r1 (root@localhost) (gcc version 4.3.4 (Gentoo 4.3.4 p1.1, pie-10.1.5) ) #1 PREEMPT Mon Aug 2 16:04:12 EEST 2010
[ 0.000000] Command line: root=/dev/sda5
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] BIOS-e820: 0000000000000000 - 000000000009f800 (usable)
[ 0.000000] BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved)
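The wraparound check can also be scripted. This is a minimal sketch, assuming printk timestamps are enabled (as in the sample above), so that a log which still starts at boot typically begins with the 0.000000 timestamp; log_starts_at_boot is a hypothetical helper name.

```sh
#!/bin/sh
# Hypothetical helper: succeeds if the first line of the saved log file $1
# carries the 0.000000 timestamp, i.e. the log buffer has not wrapped.
# Assumes printk timestamps are enabled; the regex tolerates the padding
# spaces dmesg puts inside the brackets.
log_starts_at_boot() {
    head -n 1 "$1" | grep -q '^\[ *0\.000000]'
}
```

For example, log_starts_at_boot kernel_log.txt || echo "log buffer wrapped" warns when the beginning of the boot log is missing.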
The benefits of using the dmesg command are:
- you get all kernel messages, including debug-level ones
- you get only kernel messages, without noise from user space like in system logger files
- you get messages from this boot only, no need to cut other boots out of the file
- it is what the developers expect to see
It really is the dmesg command that should be used. A file like /var/log/dmesg.log or similar is not what we need.
Getting GSP-RM logs
When to follow these steps
These instructions should be followed only at the request of an Nvidia employee who is debugging an issue for you. Extracting meaningful data from the log buffers requires tools and files that are available only to Nvidia employees. Unlike problems with the Nvidia proprietary driver, there is no formal process for Nvidia to debug Nouveau issues.
Supported kernels
This feature is supported on v6.14 (6.14-rc1 specifically) and later kernel versions. To support older kernels, you will need to back-port these three commits:
97118a1816d2 ("drm/nouveau: create module debugfs root")
7c995e2fd966 ("drm/nouveau: retain device pointer in nvkm_gsp_mem object")
214c9539cf2f ("drm/nouveau: expose GSP-RM logging buffers via debugfs")
Explanation
GSP-RM is the firmware that runs on the GPU System Processor on Turing and later GPUs. On these GPUs, Nouveau acts similarly to Nvidia's proprietary driver: it boots GSP-RM and uses it to control and manage the GPU.
GSP-RM also produces printf-like logs, but extracting those logs is complicated. The logs can be interpreted only by an Nvidia employee, using tools that are not available to the public.
This section describes the process of extracting those logs so that Nvidia can use them for debugging Nouveau.
Quick guide
- Use an appropriate kernel with debugfs enabled. Root access is not required.
- Optionally, set the keep-gsp-logging=1 parameter if debugging a GSP-RM boot failure.
- When needed, copy the debugfs entries to normal files.
- Give the files to your friendly neighborhood Nvidia employee.
Configure the kernel
Assuming you're using an appropriate kernel version, the logs are exposed via debugfs, in the /sys/kernel/debug/nouveau/ directory.
keep-gsp-logging
Before Nouveau can use GSP-RM to control the GPU, the driver must first boot it. This is a complex process that can fail in many ways. By definition, GSP-RM is considered to be booted when the driver receives the INIT_DONE message from GSP-RM.
Any failure that happens before then is considered a GSP-RM boot failure, which results in Nouveau shutting down the GSP completely and releasing all resources. This includes any logs that were generated by GSP-RM.
To preserve these logs, pass the keep-gsp-logging=1 parameter to the Nouveau module (e.g., by adding options nouveau keep-gsp-logging=1 to /etc/modprobe.d/nouveau.conf).
When set, the debugfs entries will be preserved on GSP-RM failure until the Nouveau module is unloaded, but only for log buffers that actually have logs in them.
Unlike the kernel's printk, the GSP-RM log buffers are only active late in the GSP-RM boot process. If a failure occurs early (e.g. because of a bad firmware signature), then the log buffers will be empty. In this case, other methods must be used to debug the failure.
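Adding the option to /etc/modprobe.d/nouveau.conf can be done with a small idempotent helper. This is a sketch: enable_keep_gsp_logging is a hypothetical name, the config path can be overridden through the first argument, and writing under /etc requires root.

```sh
#!/bin/sh
# Hypothetical helper: appends the Nouveau option to a modprobe.d file
# ($1, default /etc/modprobe.d/nouveau.conf) unless the exact line is
# already present, so running it twice does not duplicate the option.
enable_keep_gsp_logging() {
    conf=${1:-/etc/modprobe.d/nouveau.conf}
    line='options nouveau keep-gsp-logging=1'
    if [ -f "$conf" ] && grep -qxF "$line" "$conf"; then
        return 0
    fi
    printf '%s\n' "$line" >> "$conf"
}
```

The option only takes effect the next time the nouveau module is loaded. If your distribution loads nouveau from an initramfs, you may also need to regenerate the initramfs.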
Copying the debugfs entries
Each GPU in the system has its own debugfs directory for the logging buffers. This directory has the form /sys/kernel/debug/nouveau/<device-id>, where <device-id> is typically the PCI address of the GPU, e.g.:
# ls -l /sys/kernel/debug/nouveau/0000:65:00.0/
total 0
-r--r--r-- 1 root root 65536 Apr 14 12:51 loginit
-r--r--r-- 1 root root 65536 Apr 14 12:51 logintr
-r--r--r-- 1 root root 4096 Apr 14 12:51 logpmu
-r--r--r-- 1 root root 65536 Apr 14 12:51 logrm
To make these logs available for processing, they must be copied to regular files. For example:
cp /sys/kernel/debug/nouveau/0000:65:00.0/loginit loginit
cp /sys/kernel/debug/nouveau/0000:65:00.0/logintr logintr
cp /sys/kernel/debug/nouveau/0000:65:00.0/logrm logrm
cp /sys/kernel/debug/nouveau/0000:65:00.0/logpmu logpmu
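When several GPUs are present, the copying can be done for every device directory at once. This is a sketch, assuming the four buffer names shown in the listing above; copy_gsp_logs is a hypothetical helper that mirrors each <device-id> directory into a destination directory and skips buffers that do not exist.

```sh
#!/bin/sh
# Hypothetical helper: copies the GSP-RM log buffers of every GPU found
# under $1 (default /sys/kernel/debug/nouveau) into $2/<device-id>/
# (default: current directory). Reading debugfs typically needs root.
copy_gsp_logs() {
    root=${1:-/sys/kernel/debug/nouveau}
    dest=${2:-.}
    for dev in "$root"/*/; do
        [ -d "$dev" ] || continue          # glob matched nothing
        name=$(basename "$dev")            # e.g. 0000:65:00.0
        mkdir -p "$dest/$name"
        for f in loginit logintr logrm logpmu; do
            if [ -e "$dev$f" ]; then
                cp "$dev$f" "$dest/$name/$f"
            fi
        done
    done
}
```

For example, running copy_gsp_logs /sys/kernel/debug/nouveau ./gsp-logs as root produces files such as ./gsp-logs/0000:65:00.0/logrm.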
Note: you can quickly determine whether a log file is empty by examining its first 8 bytes. If they are all zero, then there are no logs.
# hexdump -n 8 -e '1/8 "%016x\n"' logrm
000000000000015b
# hexdump -n 8 -e '1/8 "%016x\n"' logpmu
0000000000000000
Here, you can see that logrm has logs, but logpmu does not. This is normal.
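The 8-byte check can be applied to every copied file. This is a minimal sketch that uses head and tr instead of hexdump: tr deletes NUL bytes, so if nothing is left of the first 8 bytes, the buffer is treated as empty. gsp_log_is_empty and report_gsp_logs are hypothetical helper names.

```sh
#!/bin/sh
# Hypothetical helper: succeeds if the first 8 bytes of file $1 are all
# zero, i.e. the GSP-RM log buffer contains no logs.
gsp_log_is_empty() {
    [ "$(head -c 8 "$1" | tr -d '\000' | wc -c)" -eq 0 ]
}

# Prints one line per file saying whether it contains logs.
report_gsp_logs() {
    for f in "$@"; do
        if gsp_log_is_empty "$f"; then
            echo "$f: empty"
        else
            echo "$f: has logs"
        fi
    done
}
```

For example, report_gsp_logs loginit logintr logrm logpmu prints one status line per copied buffer.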