Unix / Linux – Page 6 – 一朝入尘世，恍惚逝百年

How to generate a patch for the Linux source code

Use the diff command to generate a patch for the Linux source code. For example:

$ diff -Nur /path/to/original/kernel /path/to/your/kernel > change.patch

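Note that the original source tree comes first and the modified tree comes second. The patch submission convention for 2.6.x kernels says you need to add a line like this at the end of each patch description:

Signed-off-by: name <email>

This states that the patch was made by you (or that you otherwise have the right to submit it) and that you are contributing it to the community.

See Documentation/SubmittingPatches for more on submitting patches for the Linux source code.

See Documentation/applying-patches.txt for more on how to apply patches.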

Unix Network Programming, Chapter 13: Daemon Processes and the inetd Superserver (notes)

# The syslogd daemon runs in an infinite loop that calls select, waiting for any one of its three descriptors to become readable. It reads the log message and does what the configuration file says to do with that message. If the daemon receives the SIGHUP signal, it rereads its configuration file. So what are the three descriptors that the select call is waiting on? (1) A Unix domain socket is created and bound to the pathname /var/run/log (/dev/log on some systems). (2) A UDP socket is created and bound to port 514 (the syslog service). (3) The pathname /dev/klog is opened. Any error messages from within the kernel appear as input on this device. Newer implementations disable the creation of the UDP socket unless specified by the administrator, because allowing anyone to send UDP datagrams to this port opens the system up to denial-of-service attacks in which someone could fill up the filesystem.

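# The %m specification in the syslog format string stands for the error message that corresponds to the current value of errno.

# The level and facility arguments of syslog exist so that the handling of different kinds of log messages can be configured. The configuration file is /etc/syslog.conf.

# The logger command generates log messages, so logger can be used in shell scripts.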

# The purpose of the second fork is to guarantee that the daemon cannot automatically acquire a controlling terminal should it open a terminal device in the future. When a session leader without a controlling terminal opens a terminal device (that is not currently some other session's controlling terminal), the terminal becomes the controlling terminal of the session leader. But by calling fork a second time, we guarantee that the second child is no longer a session leader, so it cannot acquire a controlling terminal. We must ignore SIGHUP because when the session leader (the first child) terminates, all processes in the session (our second child) receive the SIGHUP signal.

# A daemon usually changes its current working directory to / . Otherwise the filesystem that contains its working directory might not be unmountable.

# On Linux, /var/log/messages is where the system sends all LOG_USER messages (checked after connecting from the same machine, e.g. localhost). Page 370.

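# Early Unix systems, before 4.3BSD, ran many services such as ftp, telnet, rlogin, and tftp as daemons. Each one took a slot in the process table, yet each daemon was asleep most of the time. inetd was introduced starting with 4.3BSD.

# In the inetd configuration, the wait-flag of a UDP service should be wait, because there is only one UDP socket. Without wait, the parent might get the CPU before the child does, while the datagram in the UDP socket's receive buffer has not been read yet. inetd's select would then report the same socket as readable again. wait means that inetd waits until the forked child terminates before selecting on that socket again. With TCP, by contrast, accept returns a connected socket that is handed to the child, so the parent can immediately go back to select and check whether the listening socket is readable.

# xinetd uses one configuration file per service, whereas inetd uses a single monolithic configuration file.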

# On a Berkeley-derived kernel the timeout for a TCP connect is normally 75 seconds.

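Unix Network Programming, Chapter 12: IPv4 and IPv6 Interoperability

# Ethernet header contains a type field of 0x0800, which identifies the frame as an IPv4 frame.

# If a server on a dual-stack host has both IPv4 and IPv6 available, the IP layer lets the server handle both IPv4 and IPv6 clients transparently. The server has to bind an IPv6 socket to the wildcard address and must not set the IPV6_V6ONLY socket option.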

# UNP page 359, Figure 12.5: Summary of interoperability between IPv4 and IPv6 clients and servers.

# Use IPv6 whenever possible, since (on dual-stack hosts) an IPv6 client can still communicate with IPv4 servers, and vice versa.
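The dual-stack server note above can be sketched in C as follows. This is a minimal sketch that assumes the kernel supports IPv6; the helper name dual_stack_listener is hypothetical. IPV6_V6ONLY is cleared explicitly because its default value is system-dependent.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: return a listening TCP socket bound to the IPv6
 * wildcard address that also accepts IPv4 clients, or -1 with errno set. */
int dual_stack_listener(unsigned short port)
{
    struct sockaddr_in6 addr;
    int off = 0, on = 1;
    int fd = socket(AF_INET6, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin6_family = AF_INET6;
    addr.sin6_addr = in6addr_any;        /* wildcard address */
    addr.sin6_port = htons(port);

    /* IPV6_V6ONLY off: the socket serves IPv4 and IPv6 clients alike. */
    if (setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &off, sizeof(off)) < 0 ||
        setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0 ||
        bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, SOMAXCONN) < 0) {
        int saved = errno;               /* close() may overwrite errno */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

Passing port 0 lets the kernel pick a free port. An IPv4 client that connects to such a socket shows up on the server side with an IPv4-mapped IPv6 address.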

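Unix Network Programming, Chapter 11: Name and Address Conversions (notes)

# gethostbyname and gethostbyaddr convert between hostnames and IPv4 addresses; getservbyname and getservbyport do the same job for services. When gethostbyname fails it sets h_errno rather than errno, and the hstrerror() function turns that value into a message.

# FQDN stands for fully qualified domain name. Technically it must end with a period.

# An AAAA record, called a "quad A" record, maps a hostname to an IPv6 address. A PTR record maps an IP address to a hostname.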

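# Entries in the DNS are known as resource records (RRs).

# Prefixing a dotted-decimal IPv4 address with 0::ffff: (usually written ::ffff:) gives the string form of the corresponding IPv4-mapped IPv6 address.

# The counterpart of getpeername is getsockname, not gethostname.

# If the host argument of getaddrinfo is a dotted-decimal IPv4 string or an IPv6 hex string, only addrinfo structures of the matching family (IPv4 or IPv6, respectively) are returned.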

# We do not set the SO_REUSEADDR socket option for the UDP socket because this socket option can allow multiple sockets to bind the same UDP port on hosts that support multicasting. Since there is nothing like TCP's TIME_WAIT state for a UDP socket, there is no need to set this socket option when the server is started.

# Normally the same port number refers to the same service under different protocols, but there are exceptions: port 514 is the rsh service with TCP, but the syslog service with UDP.

# The first argument of gethostbyaddr is declared as char *addr, but it does not really point to a character string; it points to an in_addr structure.

# getaddrinfo is complicated! Setting AI_CANONNAME in the ai_flags member of the hints structure makes it return the canonical name of the host.

# Port 53 is the port number of the domain service.

# If IPV6_V6ONLY is set on an IPv6 listening socket, a connection from an IPv4 client is refused.

# POSIX says that specifying AF_UNSPEC will return addresses that can be used with any protocol family that can be used with the hostname and service name.

# The POSIX specification also implies that if the AI_PASSIVE flag is specified without a hostname, then the IPv6 wildcard address (IN6ADDR_ANY_INIT or 0::0) should be returned as a sockaddr_in6 structure, along with the IPv4 wildcard address (INADDR_ANY or 0.0.0.0), which is returned as a sockaddr_in structure.

# An IPv6 server socket can handle both IPv4 and IPv6 on a dual-stack host. Refer to page 319 in UNP for details.
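The getaddrinfo notes above can be tried out with numeric address strings, which need no DNS lookup. This is a minimal sketch: family_of_literal is a hypothetical helper, and the port string "53" is just the domain service mentioned above.

```c
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: resolve a numeric address string with
 * getaddrinfo and return the address family shared by all results
 * (AF_INET or AF_INET6), or -1 on failure or mixed families. */
int family_of_literal(const char *host)
{
    struct addrinfo hints, *res, *p;
    int family;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;      /* accept any protocol family */
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_NUMERICHOST | AI_NUMERICSERV;  /* no lookups */

    if (getaddrinfo(host, "53", &hints, &res) != 0)
        return -1;

    family = res->ai_family;
    for (p = res; p != NULL; p = p->ai_next)
        if (p->ai_family != family)
            family = -1;              /* would contradict the note above */

    freeaddrinfo(res);
    return family;
}
```

Even with AF_UNSPEC in the hints, a dotted-decimal string yields only AF_INET results and an IPv6 hex string yields only AF_INET6 results.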

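Good books I have read from the Unix/Linux world

"Advanced Programming in the Unix Environment", Volume I, 2nd Edition: the famous APUE, written by Richard Stevens, a legend of the Unix world. The work comes in two volumes. I read the first volume twice; for the second I only flipped through the table of contents and did not feel like reading it.

"Linux Device Drivers", 3rd Edition, LDD for short. The Chinese translation of this book is dreadful. Read the English reprint, or do not bother at all.

"Managing Projects with GNU make": a book about GNU make, that ancient build tool. How should I put it? At the very least, I find its syntax very unfriendly. A legacy problem nobody can do much about.

"Version Control with Git": a book about git. I only got a passing familiarity with it. Working alone on small projects, you do not need that many features.

<Linux内核设计与实现> (Linux Kernel Development): a very thin book about the Linux kernel. I read the Chinese edition. The translation by 陈莉君 is quite good, with only a few mistakes.

<Unix编程艺术> (The Art of Unix Programming): I own both the Chinese and the English editions. I bought the English one first and could not follow it, so I bought the Chinese one. Unless your English is close to a native speaker's and your vocabulary is very large, at least ten thousand words or so, read the Chinese edition, which is translated very well. I found the English edition very hard going. My vocabulary is close to ten thousand words and I am quite confident reading technical books in English, but the author is a highly individual and highly controversial Unix hacker, and his writing shows that unrestrained personality to the full. The book itself is just as controversial: supporters treat it as a bible, while opponents mock the author as narrow in his views.

"Advanced Bash-Scripting Guide": about writing bash scripts. I read the electronic version online. There is a huge amount of material, covering practically every bash feature. I forgot much of it soon after reading.

<鸟哥的私房菜基础篇> is said to be a good introductory book among Chinese-language titles, and it has a good reputation online. I personally found the content very shallow, although it really is a good book for beginners. I do not think it is worth keeping on the shelf. The series also has a server volume, which I have not read; I am not interested and do not need it.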

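I am currently reading Richard Stevens's other bible, "Unix Network Programming", Volume 1, 3rd Edition. I am halfway through and must finish it before the winter break ends. As with APUE, the latest edition was revised by new authors on top of the original, but it keeps Richard Stevens's style: concise, unadorned wording and detailed, careful, easy-to-follow explanations. Next year I hope to get through "Understanding the Linux Kernel", and I plan to read selected parts of "Essential Linux Device Drivers". By the way, Richard Stevens wrote another long-renowned set of books, TCP/IP Illustrated, in three volumes. I skimmed the first volume without reading it carefully.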

Unix Network Programming, Chapter 10: SCTP Client/Server Example

This chapter is fairly short. It only presents a simple SCTP example, so there is not much to take notes on.

1. What is head-of-line blocking?

Head-of-line blocking occurs when a TCP segment is lost and a subsequent TCP segment arrives out of order. That subsequent segment is held until the first TCP segment is retransmitted and arrives at the receiver.

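2. How to change the number of streams of an SCTP association.

The number of streams of an SCTP association is negotiated while the association is being set up, so it has to be chosen before the handshake. In the KAME implementation on FreeBSD, the number of outbound streams defaults to 10. This value can be changed with setsockopt, using the SCTP_INITMSG socket option and a struct sctp_initmsg.

The same result can be achieved by sending ancillary data with sendmsg, but ancillary data only works for one-to-many-style SCTP sockets.

3. How to terminate an SCTP association.

An SCTP association can be shut down gracefully by setting the MSG_EOF flag in the sinfo_flags field of the sctp_sndrcvinfo structure. This flag forces an association to shut down after the message being sent is acknowledged.

You can also set MSG_ABORT in sinfo_flags. This immediately sends an ABORT to the peer, and any data that has not been sent yet is discarded.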

Unix Network Programming: Chapter 9 notes

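# TCP was standardized in 1981, whereas SCTP was standardized by the IETF in 2000. The IETF (Internet Engineering Task Force) was founded in the mid-1980s; its first meeting was held in January 1986.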

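# SCTP supports two types of socket: one-to-one (to make it easy to port existing TCP programs to SCTP) and one-to-many.

# sctp_bindx can bind multiple addresses and can add or remove them dynamically. Dynamic changes do not affect associations that already exist. Not every Unix variant supports this feature, however.

# sctp_getpaddrs and sctp_getladdrs are the SCTP versions of getpeername and getsockname, respectively. They are designed with the concept of a multihoming aware transport protocol.

# SCTP preserves message boundaries, and it supports very large messages at the transport layer.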

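Unix Network Programming: Chapter 7 notes

1. SO_BROADCAST is a generic socket option that enables or disables the ability of the process to send broadcast messages.

2. Only TCP supports SO_DEBUG. The kernel keeps detailed information about the packets sent and received in a circular buffer, which can be examined with trpt.

3. The SO_ERROR socket option can be fetched but not set. When an error occurs on a socket, the kernel records it in so_error. If the process is blocked in select, waiting for either readability or writability, select returns with that condition set. If the process uses signal-driven I/O, SIGIO is sent to the process or to its process group. Once the process fetches so_error with getsockopt, so_error is reset to 0. If so_error is nonzero when read or write is called, the call returns -1 immediately with errno set to the value of so_error.

4. With SO_KEEPALIVE set, TCP sends a probe to the peer when no data has been exchanged on the connection for two hours. Two hours, which is a very long time! Richard Stevens thinks SO_KEEPALIVE ought to be called "make-dead". Most Unix kernels keep the two-hour period as a system-wide variable, which means that changing it, by whatever means, affects every socket on the system. Richard Stevens considers the wish to shorten this period a misunderstanding of what the option is for.

5. SO_LINGER applies only to connection-oriented sockets such as TCP and SCTP, so it does not apply to UDP.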

struct linger {
    int l_onoff;    /* 0 = off, nonzero = on */
    int l_linger;   /* linger time */
};

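If l_onoff is zero, close behaves in the default way: it returns immediately and the kernel keeps sending whatever data remains in the send buffer. If l_onoff is nonzero and l_linger is 0, TCP aborts the connection immediately and sends an RST to the peer. This avoids TCP's TIME_WAIT state, but the drawback is that it leaves open the possibility of another incarnation of this connection being created within 2MSL seconds and having old duplicate segments from the just-terminated connection being incorrectly delivered to the new incarnation. With SCTP, an ABORT chunk is likewise sent to the peer. The third case: if l_onoff is nonzero and l_linger is nonzero, close does not return immediately. It waits up to the specified time while the kernel sends the data in the send buffer to the peer. If everything is sent within that time, close returns 0. If not, close fails with EWOULDBLOCK and the data left in the send buffer is discarded.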

6. MSL: maximum segment lifetime.

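7. So by default close returns immediately, and even with a linger time set through SO_LINGER, close can in theory return before the data in the send buffer has been acknowledged by the peer. (An acknowledgment from the peer TCP also does not mean that the peer application has read the data.) A better solution is to call shutdown with a second argument of SHUT_WR and then call read until it returns 0, that is, until the peer's FIN has been received.

8. UDP has no congestion control. If you send too fast, the problem is not only that the peer may fail to keep up. A fast sender can overwhelm its own network interface, causing datagrams to be discarded by the sender itself.

9. TCP exchanges the window scale option in the SYN segments of the three-way handshake, so the TCP receive buffer size must be set before the handshake: before connect for a client, and on the listening socket before listen for a server. A connected socket inherits this option from the listening socket.

10. TCP socket buffer sizes should be at least four times the MSS. For a unidirectional transfer, such as sending a file in one direction, "socket buffer" here means the send buffer on the sending side and the receive buffer on the receiving side. For a bidirectional transfer it means both the send and receive buffers on both sides. Why at least four times the MSS? Because of TCP's fast recovery algorithm. Under fast recovery, the sender concludes that a packet was lost when it receives three duplicate ACKs in a row. After a segment is lost, the receiver sends a duplicate ACK to the sender for every new segment it receives. If the window is smaller than four segments, there cannot be three duplicate ACKs, so fast recovery cannot kick in.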

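11. What is the bandwidth-delay product?

A: The capacity of the pipe is called the bandwidth-delay product and we calculate this by multiplying the bandwidth (in bits/sec) times the RTT (in seconds), converting the result from bits to bytes. In other words, bandwidth times RTT. For example, a 1,536,000 bits/sec line with an RTT of 60 ms gives 1,536,000 x 0.06 / 8 = 11,520 bytes.

12. UDP has no send buffer, but it does have a send buffer size. As long as the socket's send buffer size is greater than the low-water mark, a UDP socket is always writable.

13. SO_REUSEADDR serves four different purposes. They are hard to summarize here; see Unix Network Programming, Section 7.5, page 211 of the reprint edition.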

14. It's OK for the MSS to be different in each direction.

15. The Nagle algorithm exists to reduce the number of small packets. The algorithm states that if a given connection has outstanding data, then no small packets will be sent on the connection in response to a user write operation until the existing data is acknowledged. A small packet is any packet smaller than the MSS.

16. Delayed ACK algorithm: This algorithm causes TCP to not send an ACK immediately when it receives data; instead, TCP will wait some small amount of time (typically 50-200 ms) and only then send the ACK.

17. SCTP has an SCTP_AUTOCLOSE socket option. This option allows us to fetch or set the autoclose time for an SCTP endpoint. The autoclose time is the number of seconds an SCTP association will remain open when idle. In other words, it is the longest time an SCTP association may stay idle; after that the association is closed.
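Item 7 above (shutdown with SHUT_WR, then read until it returns 0) might look like this in C. This is a minimal sketch that assumes a blocking stream socket; graceful_close is a hypothetical helper name, and any data the peer sends before its FIN is simply discarded here.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send our FIN, then wait for the peer's FIN.
 * Returns 0 once read() reports end-of-file, -1 on error.
 * The descriptor is closed in both cases. */
int graceful_close(int fd)
{
    char buf[4096];
    ssize_t n;
    int rc = 0;

    if (shutdown(fd, SHUT_WR) < 0) {      /* no more writes: FIN is sent */
        rc = -1;
    } else {
        for (;;) {
            n = read(fd, buf, sizeof(buf));
            if (n == 0)                   /* peer's FIN received */
                break;
            if (n < 0) {
                if (errno == EINTR)       /* interrupted: try again */
                    continue;
                rc = -1;
                break;
            }
            /* n > 0: late data from the peer, ignored in this sketch */
        }
    }
    close(fd);
    return rc;
}
```

Unlike a lingering close, a return value of 0 here means the peer application itself has closed or shut down its side, not merely that the peer TCP acknowledged the data.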

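Unix Network Programming: Chapters 5 and 6 notes

# The pts that shows up so often in the Unix world refers to a pseudo terminal.

# ps -t pts/6 -o pid,ppid,tty,stat,args,wchan

Process states in the ps output: S means the process is sleeping, I means it is waiting for input, and O means it is waiting for output. When a process is in state S, the WCHAN column gives more detail about what it is waiting for.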

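# The two POSIX signals SIGSTOP and SIGKILL cannot be caught.

# By default, Unix signals are not queued. If the same signal arrives several times before it can be handled, only one instance is recorded. If you need reliable, queued delivery, use the POSIX real-time signal interface.

# The PID of the init process is 1.

# sigprocmask can block and unblock signals. This lets us protect a critical region of code by preventing certain signals from being caught while that region of code is executing.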

# Under Unix System V and Unix 98, the child of a process does not become a zombie if the process sets the disposition of SIGCHLD to SIG_IGN. Unfortunately, this works only under System V and Unix 98. POSIX explicitly states that this behavior is unspecified. The portable way to handle zombies is to catch SIGCHLD and call wait or waitpid.

# waitpid has a WNOHANG option that makes it return immediately instead of blocking when no child has terminated yet.

# POSIX defines an asynchronous I/O model, but few systems support POSIX asynchronous I/O. The main difference between the asynchronous I/O model and the signal-driven I/O model is that with signal-driven I/O, the kernel tells us when an I/O operation can be initiated, but with asynchronous I/O, the kernel tells us when an I/O operation is completed.

# Want to know the RTT from this machine to some other host? Nothing could be easier: use ping!

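Linux soname and shared library issues.

I first came across soname while reading <程序员的自我修养.第一册:链接库>. That is, by the way, a really good book, one of the rare good technical books published in China. Later, when porting a project at work from Windows to Linux, I used soname while writing the makefile.

Then I forgot about it again. When I saw soname again today I could not quite remember it, so I had to look things up once more. I am writing down what I learned while it is still fresh.

To try to solve version conflicts between shared libraries (known as DLL hell on Windows), Linux introduced the soname mechanism. By convention, shared libraries on Linux use the extension .so, short for shared object. The following makefile lines come from a makefile I wrote in the first half of this year: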

VERSION_NUM := 1.0.0
soname = lib$(dirname).so.$(firstword $(subst ., ,$(VERSION_NUM)))
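LINKFLAGS := -shared -Wl,-soname,$(soname) -L. -lpthread -lrt -lpentagon -lcross_platform

Here the variable $(soname) expands to libeibcomm.so.1 (given that $(dirname), which is set elsewhere in the makefile, is eibcomm). Now look at the options in LINKFLAGS. -shared says that a shared object, that is, a shared library, is to be produced. -L. adds the current directory to the library search path. -lpthread -lrt -lpentagon -lcross_platform link against several other libraries: pthread is the POSIX threads library on Linux, and rt is a library related to real-time support that provides high-resolution time functions.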

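The odd-looking part is -Wl,-soname,$(soname), whose pieces are separated by commas rather than spaces. gcc passes these pieces to ld (the linker on Linux) as linker arguments; this is simply the syntax gcc prescribes. The linker writes the given soname into the resulting ELF file (ELF is the executable file format on Linux). To inspect an ELF file's section layout and the like, use the readelf tool; readelf -d, for example, prints the dynamic section, which includes the SONAME entry.

The resulting file is named in the form libname.so.x.y.z, and the soname has the form libname.so.x (not necessarily, though: this is only a convention, not something enforced by syntax or mechanism). Next, copy the generated .so file to /usr/lib or /lib, the two directories where the system keeps shared libraries. Debian does not treat /usr/local/lib as a default shared library directory. More paths can be added by editing /etc/ld.so.conf. Then run ldconfig to update the library information in the cache; ldconfig also creates the symbolic link libname.so.x that points to libname.so.x.y.z. Why a cache, you may ask? Because there are simply too many libraries, so a cache was introduced. Caches are a good thing!

At this point, files that depend on libname.so.x can find the library and load it. Compiling a program still fails to find it, though. When compiling you usually write -lname, and the linker looks for libname.so first and then for libname.a. It finds neither, so the linker complains that the library cannot be found. To get the build through, create a symbolic link with ln -sf libname.so.x.y.z libname.so (libname.so.x.y.z is the real name; its version part may have only the two components x.y). Done!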