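Exit

The goroutine exit process #

A goroutine exits when its callee code finishes and returns to the caller. As the previous section showed, when the goroutine is created the runtime stores the address of goexit() (more precisely goexit+PCQuantum) on the stack as the caller's return pc, so once the goroutine's function returns, execution continues in goexit(). The main goroutine is a special case: the code it runs, runtime.main, calls the operating system's exit() API directly and never gets a chance to return to the caller.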
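
The planted return address can be observed from ordinary Go code. The following is a minimal sketch, assuming the standard gc toolchain; bottomFrame is a hypothetical helper name, not part of the runtime. It captures the call stack inside a fresh goroutine and reports the outermost frame, which is expected to be runtime.goexit.

```go
package main

import (
	"fmt"
	"runtime"
)

// bottomFrame (hypothetical helper) starts a goroutine, captures its call
// stack, and returns the name of the outermost frame on that stack.
func bottomFrame() string {
	result := make(chan string)
	go func() {
		pcs := make([]uintptr, 32)
		n := runtime.Callers(0, pcs) // 0 = start at runtime.Callers itself
		frames := runtime.CallersFrames(pcs[:n])
		last := ""
		for {
			frame, more := frames.Next()
			if frame.Function != "" {
				last = frame.Function // remember the deepest named frame seen so far
			}
			if !more {
				break
			}
		}
		result <- last
	}()
	return <-result
}

func main() {
	// The outermost frame of a goroutine is the return address the runtime planted.
	fmt.Println(bottomFrame())
}
```

The goroutine never calls goexit itself; the frame appears only because its address was placed on the stack as the return pc.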

Exit of the main goroutine #

In the previous section we saw execution reach the runtime's main function:

// The main goroutine.
func main() {
	g := getg()
	//...

	// init function of package main; it recursively calls the init functions defined in the imported packages
	fn := main_init // make an indirect call, as the linker doesn't know the address of the main package when laying down the runtime
	fn()

	//...

	// call main.main (the user-defined main function)
	fn = main_main // make an indirect call, as the linker doesn't know the address of the main package when laying down the runtime
	fn()

	//...

	// system API: exit terminates the process
	exit(0)

	for {
		var x *int32
		*x = 0 // write through a nil pointer: crashes the program if exit ever returns
	}
}
  • exit(0) and the for loop at the very bottom make it impossible for the program to return to the caller.

Exit of non-main goroutines #

First, let's debug the following program with gdb:

// main.go
package main

import "time"

// the function's body is empty
func add(x, y int64) int64

func main() {
    go add(2, 3)

    time.Sleep(time.Minute)
}

//  add_amd.s
TEXT ·add(SB),$0-24
	MOVQ x+0(FP), BX
	MOVQ y+8(FP), BP
	ADDQ BP, BX
	MOVQ BX, ret+16(FP)
	RET

Compile the source: go build -gcflags "-N -l" -o test .

gdb breakpoints #

// //list /tmp/kubernets/main.go:3 
// list /tmp/kubernets/add_amd.s:3
// list /usr/lib/golang/src/runtime/asm_amd64.s:1356

(gdb) disas
Dump of assembler code for function main.add:
   0x0000000000456b40 <+0>:     mov    0x8(%rsp),%rbx
   0x0000000000456b45 <+5>:     mov    0x10(%rsp),%rbp
   0x0000000000456b4a <+10>:    add    %rbp,%rbx
   0x0000000000456b4d <+13>:    mov    %rbx,0x18(%rsp)
=> 0x0000000000456b52 <+18>:    retq
End of assembler dump.
(gdb) step
Single stepping until exit from function main.add,
which has no line number information.

Breakpoint 1, runtime.goexit () at /usr/lib/golang/src/runtime/asm_amd64.s:1358
    		CALL	runtime·goexit1(SB)	// does not return
(gdb) list

	// The top-most function running on a goroutine
	// returns to goexit+PCQuantum.
	TEXT runtime·goexit(SB),NOSPLIT,$0-0
		BYTE	$0x90	// NOP
		CALL	runtime·goexit1(SB)	// does not return
		// traceback from goexit1 must hit code range of goexit
		BYTE	$0x90	// NOP

	// This is called from .init_array and follows the platform, not Go, ABI.
(gdb)
(gdb) step
runtime.goexit1 () at /usr/lib/golang/src/runtime/proc.go:2663
    	func goexit1() {
        //...
		mcall(goexit0)     // goexit1() --> mcall(goexit0)
(gdb) step
(gdb) list
  	// func mcall(fn func(*g))
  	// Switch to m->g0's stack, call fn(g).
  	// Fn must never return. It should gogo(&g->sched)
  	// to keep running g.
  	TEXT runtime·mcall(SB), NOSPLIT, $0-8
  		MOVQ	fn+0(FP), DI
  
  		get_tls(CX)
  		MOVQ	g(CX), AX	// save state in g->sched
  		MOVQ	0(SP), BX	// caller's PC
(gdb)

The three steps of exiting #

The debugging session shows the jumps taken after add returns:

Step 1                    Step 2                Step 3
src/runtime/asm_amd64.s   src/runtime/proc.go   src/runtime/asm_amd64.s
runtime.goexit()          goexit1()             runtime.mcall()

Now look at the source code:

// The top-most function running on a goroutine
// returns to goexit+PCQuantum.
TEXT runtime·goexit(SB),NOSPLIT,$0-0
	BYTE	$0x90	// NOP
	CALL	runtime·goexit1(SB)	// does not return
	// traceback from goexit1 must hit code range of goexit
	BYTE	$0x90	// NOP


// Finishes execution of the current goroutine.
func goexit1() {
	if raceenabled {   // can be ignored here
		racegoend()
	}
	if trace.enabled {   // can be ignored here
		traceGoEnd()
	}
	mcall(goexit0)
}

// func mcall(fn func(*g))
// Switch to m->g0's stack, call fn(g).
// Fn must never return. It should gogo(&g->sched)
// to keep running g.
TEXT runtime·mcall(SB), NOSPLIT, $0-8
	MOVQ	fn+0(FP), DI // the argument: fn

	get_tls(CX)
	MOVQ	g(CX), AX // save state in gN->sched
	MOVQ	0(SP), BX // caller's PC (see the figure below)
	MOVQ	BX, (g_sched+gobuf_pc)(AX) // save the caller's pc into the running gN's sched.pc
	LEAQ	fn+0(FP), BX // caller's SP
	MOVQ	BX, (g_sched+gobuf_sp)(AX) // save the caller's sp into gN.sched.sp
	MOVQ	AX, (g_sched+gobuf_g)(AX) // save gN into gN.sched.g
	MOVQ	BP, (g_sched+gobuf_bp)(AX) // save bp into gN.sched.bp

	// switch to m->g0 & its stack, call fn
	MOVQ	g(CX), BX // bx = gN
	MOVQ	g_m(BX), BX // bx = gN.m
	MOVQ	m_g0(BX), SI // si = gN.m.g0
	CMPQ	SI, AX // if g == m->g0 call badmcall; gN must not be g0, because g0 is the goroutine reserved for scheduling
	JNE	3(PC)
	MOVQ	$runtime·badmcall(SB), AX
	JMP	AX
	MOVQ	SI, g(CX)	// g = m->g0 (key step): replace gN's address in m.tls[0] (TLS) with g0's address, so the thread can find g0, and through it m, via the fs register
	MOVQ	(g_sched+gobuf_sp)(SI), SP // sp = m->g0->sched.sp (key step): load g0's saved SP into the real SP register, switching to g0's stack
	PUSHQ	AX // push gN as the argument for the call below
	MOVQ	DI, DX // dx = di (pointer to fn's funcval struct)
	MOVQ	0(DI), DI // load the real fn code pointer from the funcval
	CALL	DI // call fn
	POPQ	AX
	MOVQ	$runtime·badmcall2(SB), AX
	JMP	AX
	RET

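The mcall function #

To summarize mcall:

  • It saves the current g's scheduling context: the registers are stored in g.sched.
  • It installs g0 in TLS and points the CPU's rsp register at g0's stack.
  • It calls fn (here goexit0) with the g that was running (gN in our scenario) as the argument.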
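
The three steps can be mimicked in plain Go. This is only a toy model under stated assumptions: the gobuf, g, and m types below are hypothetical, heavily simplified stand-ins for the runtime structures, the tlsG field models the "current g" slot in thread-local storage, and no real stack switch takes place.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the runtime types of the same names.
type gobuf struct {
	sp, pc uintptr
}

type g struct {
	name  string
	sched gobuf
}

type m struct {
	g0   *g
	tlsG *g // models the "current g" slot kept in thread-local storage
}

// mcall models the three steps: save the caller's context into g.sched,
// make g0 the current g, then call fn with the previously running g.
func mcall(mp *m, callerSP, callerPC uintptr, fn func(*g)) {
	gp := mp.tlsG
	if gp == mp.g0 {
		panic("mcall called on g0") // mirrors the badmcall check
	}
	gp.sched.sp = callerSP // step 1: save scheduling context
	gp.sched.pc = callerPC
	mp.tlsG = mp.g0 // step 2: g0 becomes the current g
	fn(gp)          // step 3: run fn with the old g as argument
}

// demo wires up one m with g0 and gN and reports what fn observed.
func demo() (current, arg string, saved gobuf) {
	mp := &m{}
	g0 := &g{name: "g0"}
	gN := &g{name: "gN"}
	mp.g0, mp.tlsG = g0, gN
	mcall(mp, 0x1000, 0x2000, func(gp *g) {
		arg = gp.name
		current = mp.tlsG.name
	})
	return current, arg, gN.sched
}

func main() {
	current, arg, saved := demo()
	fmt.Println("current g inside fn:", current)
	fmt.Println("argument passed to fn:", arg)
	fmt.Printf("saved sp=%#x pc=%#x\n", saved.sp, saved.pc)
}
```

Unlike this model, the real mcall never returns to its caller: fn is expected to end by scheduling some goroutine.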

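Look at mcall's parameter: it is not a pointer that points directly at function code, but a pointer to a funcval struct. The first field of that struct, fn, is the real pointer to the function's code.

	MOVQ DI, DX        # dx = di (pointer to fn's funcval)
	MOVQ 0(DI), DI     # load the real fn code pointer from the funcval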
type funcval struct {
    fn uintptr
    // variable-size, fn-specific data here
}
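
The same indirection can be seen from user code. The sketch below assumes the gc toolchain's internal representation of function values (a pointer to a funcval-like struct), which the language specification does not guarantee. codePointer and matchesReflect are hypothetical helper names. The code reads the first word behind a function value and compares it with the code pointer that reflect.Value.Pointer reports.

```go
package main

import (
	"fmt"
	"reflect"
	"unsafe"
)

func add(x, y int64) int64 { return x + y }

// codePointer reads the first word of the struct that the function value f
// points to. Assumption: gc toolchain layout, where a func value is a
// pointer to a funcval whose first field is the code pointer.
func codePointer(f func(int64, int64) int64) uintptr {
	fv := *(*unsafe.Pointer)(unsafe.Pointer(&f)) // f itself holds a pointer to the funcval
	return *(*uintptr)(fv)                       // funcval.fn is the first field
}

// matchesReflect reports whether the manually loaded code pointer equals
// the one returned by the reflect package for the same function value.
func matchesReflect(f func(int64, int64) int64) bool {
	return codePointer(f) == reflect.ValueOf(f).Pointer()
}

func main() {
	fmt.Printf("funcval.fn      = %#x\n", codePointer(add))
	fmt.Printf("reflect pointer = %#x\n", reflect.ValueOf(add).Pointer())
	fmt.Println("equal:", matchesReflect(add))
}
```

The double dereference in codePointer corresponds to the `MOVQ 0(DI), DI` instruction in mcall.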

(figure: 20220624171607)

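  • g0 -> gN: gogo switches stacks, and the jmp instruction inside it jumps to the address saved in sched.pc of gN's g struct.
  • gN -> g0: mcall also switches stacks. Once the upstream user code has returned, the goexit function it lands in is already Go runtime code, so execution simply continues from there and no jmp is needed.
    • Note that switching to g0's stack restores only sp; pc is not reset.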

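The goexit0 function #

  • Changes the g's status: _Grunning -> _Gdead
  • Calls dropg to detach g from m, which in effect sets g->m = nil and m->curg = nil
  • Calls gfput
    • It puts the g on p's free-g list, where it is cached for reuse the next time a g is created, so a new g does not have to be allocated
    • If the local gfree list is too long, a batch is moved to the global list
  • Calls schedule to run the scheduler again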
// goexit continuation on g0.
func goexit0(gp *g) {
	_g_ := getg() // g0

	casgstatus(gp, _Grunning, _Gdead) // change gN's status
	if isSystemGoroutine(gp, false) {
		atomic.Xadd(&sched.ngsys, -1)
	}
	gp.m = nil
	locked := gp.lockedm != 0
	gp.lockedm = 0
	_g_.m.lockedg = 0
	gp.paniconfault = false
	gp._defer = nil // should be true already but just in case.
	gp._panic = nil // non-nil for Goexit during panic. points at stack-allocated data.
	gp.writebuf = nil
	gp.waitreason = 0
	gp.param = nil
	gp.labels = nil
	gp.timer = nil

    //...

	// Note that gp's stack scan is now "valid" because it has no
	// stack.
	gp.gcscanvalid = true
	dropg() // dropg detaches g from m; in effect it sets g->m = nil and m->curg = nil.

    //...

    gfput(_g_.m.p.ptr(), gp) // put gp on the gfree list; if the local list is too long, move a batch to the global list.
    
    //...

	schedule()
}
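
The caching done by gfput can be illustrated with a toy free list. This is a sketch under assumptions, not the runtime's implementation: the freeLists type, its put and get methods, and the localMax and localKeep thresholds are hypothetical names and values chosen only for illustration.

```go
package main

import "fmt"

// Hypothetical thresholds for illustration; the runtime uses its own values.
const (
	localMax  = 4 // when the local list reaches this length, spill a batch
	localKeep = 2 // spill until the local list is shorter than this
)

type g struct{ id int }

// freeLists models a per-P local cache of dead g's plus a shared global list.
type freeLists struct {
	local  []*g
	global []*g
}

// put caches a dead g locally and moves a batch to the global list when the
// local list has grown too long.
func (f *freeLists) put(gp *g) {
	f.local = append(f.local, gp)
	if len(f.local) >= localMax {
		for len(f.local) >= localKeep {
			last := len(f.local) - 1
			f.global = append(f.global, f.local[last])
			f.local = f.local[:last]
		}
	}
}

// get returns a cached g for reuse, refilling from the global list when the
// local list is empty. It returns nil when nothing is cached.
func (f *freeLists) get() *g {
	if len(f.local) == 0 && len(f.global) > 0 {
		last := len(f.global) - 1
		f.local = append(f.local, f.global[last])
		f.global = f.global[:last]
	}
	if len(f.local) == 0 {
		return nil
	}
	last := len(f.local) - 1
	gp := f.local[last]
	f.local = f.local[:last]
	return gp
}

func main() {
	var f freeLists
	for i := 1; i <= 4; i++ {
		f.put(&g{id: i})
	}
	fmt.Println("local:", len(f.local), "global:", len(f.global))
	fmt.Println("reused g id:", f.get().id)
}
```

Keeping a short local list avoids taking a shared lock on every goroutine exit, while the global list lets other Ps reuse the surplus.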

Exit flow #

(figure: 20220624171805)

  • The blue box in the figure above marks the chain of functions called on exit:
goexit()->goexit1()->mcall()->goexit0()->schedule()
_________gN stack__________|_______g0 stack________|
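
The same chain can be triggered explicitly from user code: the public runtime.Goexit runs the goroutine's deferred calls and then enters goexit1. The sketch below shows only the observable behavior; runAndExit is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"runtime"
)

// runAndExit starts a goroutine that terminates itself with runtime.Goexit
// and reports whether its deferred function ran and whether the statement
// after Goexit was reached.
func runAndExit() (deferredRan, reachedEnd bool) {
	done := make(chan struct{})
	go func() {
		defer close(done) // runs last; signals that the goroutine is finishing
		defer func() { deferredRan = true }()
		runtime.Goexit()  // runs the deferred calls, then the goroutine exits
		reachedEnd = true // never executed
	}()
	<-done
	return deferredRan, reachedEnd
}

func main() {
	deferredRan, reachedEnd := runAndExit()
	fmt.Println("deferred ran:", deferredRan)
	fmt.Println("reached end:", reachedEnd)
}
```

Only the calling goroutine ends; the rest of the program keeps running, because goexit0 finishes by calling schedule() to pick another goroutine.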