A clone of btpd with my configuration changes.

/*
 * Copyright (c) 2006 Mathew Mills <mathewmills@mac.com>
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 3. The name of the author may not be used to endorse or promote products
 *    derived from this software without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR
 * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES
 * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.
 * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT,
 * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
 * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
 * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
 * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
 * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
 * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 */
/*
 * Meta-level comments: You know that a kernel interface is wrong if
 * supporting it requires three times more code than any of the other
 * kernel interfaces supported in libevent.  Niels - 2006-02-22
 */
/**
  "RTSIG" is shorthand for using O_ASYNC to make descriptors send
  signals when readable/writable and for using POSIX real-time signals,
  which are queued, unlike normal signals.  At first blush this may
  seem like an alternative to epoll, but a number of problems arise
  when attempting to build an eventloop entirely out of rtsig.

  Still, we can use rtsig in combination with poll() to provide an
  eventloop that allows for many thousands of sockets without the huge
  overhead implicit in using select() or poll() alone.  epoll and
  kqueue are far superior to rtsig and should be used where available,
  but rtsig has been in standard Linux kernels for a long time and has
  a huge installed base.  epoll requires special patches for 2.4
  kernels, and 2.6 kernels are not yet nearly as ubiquitous.

  rtsig problems:
  - O_ASYNC mechanisms work only on sockets - not pipes or ttys.
  - O_ASYNC signals are edge-triggered: POLLIN on a packet arriving
    or a socket close; POLLOUT when a socket transitions from
    non-writable to writable.  Being edge-triggered means the
    event-handler callbacks must transition the level (by reading
    the socket buffer contents completely) or they will be unable to
    reliably receive notification again.
  - rtsig implementations must be intimately involved in how a
    process dispatches signals.
  - Delivering signals per-event can be expensive, CPU-wise, but
    sigtimedwait() blocks on signals only, which means non-sockets
    cannot be serviced.

  Theory of operation:
  This libevent module uses rtsig to allow us to manage a set of
  poll-event descriptors.  We can drop uninteresting fd's from the
  pollset if the fd will send a signal when it becomes interesting
  again.

  poll() offers us level-triggering and, when we have verified the
  level of a socket, we can trust the edge-trigger nature of the
  ASYNC signal.

  As an eventloop we must poll for external events but leverage
  kernel functionality to sleep between events (until the loop's
  next scheduled timed event).

  If we are polling on any non-sockets then we simply have no choice
  about blocking on the poll() call.  If we blocked on the
  sigtimedwait() call, as the rtsig papers recommend, we would not
  wake on non-socket state transitions.  As part of libevent, this
  module must support non-socket polling.

  Many applications, however, do not need to poll on non-sockets, and
  so this module should be able to optimize that case by using
  sigtimedwait().  For this reason this module can actually trigger
  events in each of three different ways:
  - poll() returning ready events from descriptors in the pollset
  - real-time signals dequeued via sigtimedwait()
  - real-time signals that call an installed signal handler, which in
    turn writes the contents of siginfo to one end of a DGRAM
    socketpair.  The other end of the socketpair is always in the
    pollset, so poll() is guaranteed to return even if the signal is
    received before entering poll().

  Non-socket descriptors force us to block on the poll() for the
  duration of a dispatch.  In this case we unblock (w/ sigprocmask)
  the managed signals just before polling.  Each managed signal is
  handled by signal_handler(), which send()'s the contents of siginfo
  over the socketpair.  Otherwise, we call poll() with a timeout of
  0 ms so it checks the levels of the fd's in the pollset and returns
  immediately.  Any fd that is a socket and has no active state is
  removed from the pollset for the next pass -- we will rely on
  getting a signal for events on these fd's.

  The receiving end of the siginfo socketpair is in the pollset
  (permanently), so if we are polling on non-sockets, the delivery of
  signals immediately following sigprocmask(SIG_UNBLOCK, ...) will
  result in a readable op->signal_recv_fd, which ensures the poll()
  will return immediately.  If the poll() call is blocking and a
  signal arrives (possibly a real-time signal from a socket not in
  the pollset), its handler will write the data to the socketpair
  and interrupt the poll().

  After every poll call we attempt a non-blocking recv from the
  signal_recv_fd and continue to recv and dispatch the events until
  recv indicates the socket buffer is empty.

  One might raise concerns about receiving event activations from
  both poll() and from the rtsig data in the signal_recv_fd.
  Fortunately, libevent is already structured for event coalescing,
  so this issue is mitigated (though we do some work twice for the
  same event, making us less efficient).  I suspect that the cost of
  turning off the O_ASYNC flag on fd's in the pollset is more
  expensive than handling some events twice.  Looking at the
  kernel's source code for setting O_ASYNC, it looks like it takes a
  global kernel lock...

  After a poll and a recv-loop for the signal_recv_fd, we finally do
  a sigtimedwait().  sigtimedwait will only block if we haven't
  blocked in poll() and we have not enqueued events from either the
  poll or the recv-loop.  Because sigtimedwait blocks all signals
  that are not in the set of signals to be dequeued, we need to
  dequeue almost all signals and make sure we dispatch them
  correctly.  We dequeue any signal that is not blocked, as well as
  all libevent-managed signals.  If we get a signal that is not
  managed by libevent, we look up the sigaction for the specific
  signal and call that function ourselves.

  Finally, I should mention that getting a SIGIO signal indicates
  that the rtsig buffer has overflowed and we have lost events.
  This forces us to add _every_ descriptor to the pollset to recover.
 */
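/*
 * A minimal sketch of the arming sequence described above, assuming a
 * Linux kernel with F_SETSIG/F_SETOWN support.  The helper name
 * arm_rtsig() is hypothetical; rtsig_add() below performs the same
 * calls inline (using gettid() rather than getpid() where available):
 *
 *	static int
 *	arm_rtsig(int fd, int signo)
 *	{
 *		int flags = fcntl(fd, F_GETFL);
 *
 *		if (flags == -1)
 *			return (-1);
 *		// queue `signo' (with siginfo) instead of plain SIGIO
 *		if (fcntl(fd, F_SETSIG, signo) == -1)
 *			return (-1);
 *		// route the notifications to this process
 *		if (fcntl(fd, F_SETOWN, getpid()) == -1)
 *			return (-1);
 *		// O_ASYNC enables the edge-triggered signal delivery
 *		return (fcntl(fd, F_SETFL, flags | O_ASYNC));
 *	}
 */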
#ifdef HAVE_CONFIG_H
#include "config.h"
#endif

/* Enable F_SETSIG and F_SETOWN */
#define _GNU_SOURCE

#include <sys/types.h>
#ifdef HAVE_SYS_TIME_H
#include <sys/time.h>
#else
#include <sys/_time.h>
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/poll.h>
#include <sys/queue.h>
#include <sys/stat.h>	/* for fstat()/S_ISSOCK() in rtsig_add() */
#include <sys/tree.h>
#include <unistd.h>
#include <sys/socket.h>

#include "event.h"
#include "event-internal.h"
#include "log.h"

extern struct event_list signalqueue;

#include <linux/unistd.h>
#ifndef __NR_gettid
#define gettid() getpid()
#else
#if ((__GLIBC__ > 2) || ((__GLIBC__ == 2) && (__GLIBC_MINOR__ >= 3)))
_syscall0(pid_t,gettid)
#endif
#endif
#define EVLIST_NONSOCK	0x1000	/* event is for a non-socket file-descriptor */
#define EVLIST_DONTDEL	0x2000	/* event should always be in the pollset */

#define MAXBUFFERSIZE	(1024 * 1024 * 2)	/* max socket buffer for the signal socketpair */
#define INIT_MAX	16	/* init/min # of fd positions in our pollset */

static int signal_send_fd[_NSIG];	/* the global send end of the signal socketpair */
static int trouble[_NSIG];	/* 1 when the signal handler can't send to signal_send_fd */

struct rtdata;
TAILQ_HEAD(rtdata_list, rtdata);

struct rtsigop {
	sigset_t sigs;			/* signal mask for all _managed_ signals */
	struct pollfd *poll;		/* poll structures */
	struct rtdata **ptodat;		/* map poll_position to rtdata */
	int cur;			/* cur # fd's in the poll set */
	int max;			/* max # fd's in the poll set; starts at 16 and grows as needed */
	int total;			/* count of fd's we are watching now */
	int signo;			/* the signo we use for ASYNC fd notifications */
	int nonsock;			/* number of non-socket fd's we are watching */
	int highestfd;			/* highest fd accommodated by fdtodat */
	struct rtdata_list **fdtodat;	/* map fd to rtdata (and thus to event) */
	int signal_recv_fd;		/* recv side of the signal socketpair */
	int signal_send_fd;		/* send side of the signal socketpair */
	struct event sigfdev;		/* our own event structure for the signal fd */
};

struct rtdata {
	/* rtdata holds rtsig-private state on each event */
	TAILQ_ENTRY(rtdata) next;
	struct event *ev;
	int poll_position;
};
void *rtsig_init(void);
int rtsig_add(void *, struct event *);
int rtsig_del(void *, struct event *);
int rtsig_recalc(struct event_base *, void *, int);
int rtsig_dispatch(struct event_base *, void *, struct timeval *);

struct eventop rtsigops = {
	"rtsig",
	rtsig_init,
	rtsig_add,
	rtsig_del,
	rtsig_recalc,
	rtsig_dispatch
};
static void
signal_handler(int sig, siginfo_t *info, void *ctx)
{
	/*
	 * The signal handler for all libevent-managed signals; only
	 * used if we need to do a blocking poll() call due to
	 * non-socket fd's in the pollset.
	 */
	siginfo_t *i = info;
	siginfo_t i_local;

	if (trouble[sig - 1]) {
		i_local.si_signo = SIGIO;
		i_local.si_errno = 0;
		i_local.si_code = 0;
		i = &i_local;
		trouble[sig - 1] = 0;
	}

	if (send(signal_send_fd[sig - 1], i, sizeof(*i),
		MSG_DONTWAIT|MSG_NOSIGNAL) == -1)
		trouble[sig - 1] = 1;
}
static void
donothing(int fd, short event, void *arg)
{
	/*
	 * Callback for our signal_recv_fd event structure.  We don't
	 * want to act on these events; we just want to wake the poll().
	 */
}

static void
signotset(sigset_t *set)
{
	/* complement the sigset, one word at a time (assumes 32-bit words) */
	int i, l;

	l = sizeof(*set) / 4;
	for (i = 0; i < l; i++) {
		((unsigned *)set)[i] = ~((unsigned *)set)[i];
	}
}
/* The next three functions manage our private data about each event struct */

static int
grow_fdset(struct rtsigop *op, int newhigh)
{
	/*
	 * Grow the fd -> rtdata array because we have encountered a
	 * new fd too high to fit in the existing array.
	 */
	struct rtdata_list **p;
	struct rtdata_list *datset;
	int i, x;
	int newcnt = (newhigh + 1) << 1;

	if (newhigh <= op->highestfd)
		return (0);

	p = realloc(op->fdtodat, sizeof(struct rtdata_list *) * newcnt);
	if (p == NULL)
		return (-1);
	op->fdtodat = p;

	datset = calloc(newcnt - (op->highestfd + 1),
	    sizeof(struct rtdata_list));
	if (datset == NULL)
		return (-1);

	for (i = op->highestfd + 1, x = 0; i < newcnt; i++, x++) {
		op->fdtodat[i] = &(datset[x]);
		TAILQ_INIT(op->fdtodat[i]);
	}

	op->highestfd = newcnt - 1;
	return (0);
}
static struct rtdata *
ev2dat(struct rtsigop *op, struct event *ev, int create)
{
	/*
	 * Given an event struct, find the rtdata structure that
	 * corresponds to it.  If create is non-zero and the rtdata
	 * structure does not exist, create it; otherwise return NULL
	 * when it is not found.
	 */
	int found = 0;
	int fd = ev->ev_fd;
	struct rtdata *ret = NULL;

	if (op->highestfd < fd) {
		if (!create)
			return (NULL);
		if (grow_fdset(op, fd) == -1)
			return (NULL);
	}

	TAILQ_FOREACH(ret, op->fdtodat[fd], next) {
		if (ret->ev == ev) {
			found = 1;
			break;
		}
	}

	if (!found) {
		if (!create)
			return (NULL);

		ret = calloc(1, sizeof(struct rtdata));
		if (ret == NULL)
			return (NULL);
		ret->ev = ev;
		ret->poll_position = -1;
		TAILQ_INSERT_TAIL(op->fdtodat[fd], ret, next);
	}

	return (ret);
}
static void
dat_del(struct rtsigop *op, struct rtdata *dat)
{
	/*
	 * Delete our private notes about a given event struct;
	 * called from rtsig_del() only.
	 */
	int fd;

	if (dat == NULL)
		return;
	fd = dat->ev->ev_fd;

	TAILQ_REMOVE(op->fdtodat[fd], dat, next);
	memset(dat, 0, sizeof(*dat));
	free(dat);
}
static void
set_sigaction(int sig)
{
	/*
	 * Set the standard handler for any libevent-managed signal,
	 * including the rtsig used for O_ASYNC notifications.
	 */
	struct sigaction act;

	act.sa_flags = SA_RESTART | SA_SIGINFO;
	sigfillset(&(act.sa_mask));
	act.sa_sigaction = &signal_handler;
	sigaction(sig, &act, NULL);
}
static int
find_rt_signal(void)
{
	/* find an unused rtsignal */
	struct sigaction act;
	int sig = SIGRTMIN;

	while (sig <= SIGRTMAX) {
		if (sigaction(sig, NULL, &act) != 0) {
			if (errno == EINTR)
				continue;
		} else {
			if (act.sa_flags & SA_SIGINFO) {
				if (act.sa_sigaction == NULL)
					return (sig);
			} else {
				if (act.sa_handler == SIG_DFL)
					return (sig);
			}
		}
		sig++;
	}
	return (0);
}
/*
 * The next three functions manage our pollset and the memory management
 * for the fd -> rtdata -> event -> poll_position maps.
 */

static int
poll_add(struct rtsigop *op, struct event *ev, struct rtdata *dat)
{
	struct pollfd *pfd;
	int newmax = op->max << 1;
	int pp;

	if (op->poll == NULL)
		return (0);

	if (dat == NULL)
		dat = ev2dat(op, ev, 0);
	if (dat == NULL)
		return (0);

	pp = dat->poll_position;

	if (pp != -1) {
		pfd = &op->poll[pp];
		if (ev->ev_events & EV_READ)
			pfd->events |= POLLIN;
		if (ev->ev_events & EV_WRITE)
			pfd->events |= POLLOUT;
		return (0);
	}

	if (op->cur == op->max) {
		void *p = realloc(op->poll, sizeof(*op->poll) * newmax);
		if (p == NULL) {
			errno = ENOMEM;
			return (-1);
		}
		op->poll = p;

		p = realloc(op->ptodat, sizeof(*op->ptodat) * newmax);
		if (p == NULL) {
			/* shrink the pollset back down */
			op->poll = realloc(op->poll,
			    sizeof(*op->poll) * op->max);
			errno = ENOMEM;
			return (-1);
		}
		op->ptodat = p;
		op->max = newmax;
	}

	pfd = &op->poll[op->cur];
	pfd->fd = ev->ev_fd;
	pfd->revents = 0;
	pfd->events = 0;
	if (ev->ev_events & EV_READ)
		pfd->events |= POLLIN;
	if (ev->ev_events & EV_WRITE)
		pfd->events |= POLLOUT;

	op->ptodat[op->cur] = dat;
	dat->poll_position = op->cur;
	op->cur++;

	return (0);
}
static void
poll_free(struct rtsigop *op, int n)
{
	if (op->poll == NULL)
		return;

	op->cur--;
	if (n < op->cur) {
		memcpy(&op->poll[n], &op->poll[op->cur], sizeof(*op->poll));
		op->ptodat[n] = op->ptodat[op->cur];
		op->ptodat[n]->poll_position = n;
	}

	/* less than half the max in use causes us to shrink */
	if (op->max > INIT_MAX && op->cur < op->max >> 1) {
		op->max >>= 1;
		op->poll = realloc(op->poll, sizeof(*op->poll) * op->max);
		op->ptodat = realloc(op->ptodat, sizeof(*op->ptodat) * op->max);
	}
}
static void
poll_remove(struct rtsigop *op, struct event *ev, struct rtdata *dat)
{
	int pp;

	if (dat == NULL)
		dat = ev2dat(op, ev, 0);
	if (dat == NULL)
		return;

	pp = dat->poll_position;
	if (pp != -1) {
		poll_free(op, pp);
		dat->poll_position = -1;
	}
}
static void
activate(struct event *ev, int flags)
{
	/* activate an event, possibly removing one-shot events */
	if (!(ev->ev_events & EV_PERSIST))
		event_del(ev);
	event_active(ev, flags, 1);
}

#define FD_CLOSEONEXEC(x) do { \
	if (fcntl(x, F_SETFD, 1) == -1) \
		event_warn("fcntl(%d, F_SETFD)", x); \
} while (0)
void *
rtsig_init(void)
{
	struct rtsigop *op;
	int sockets[2];
	int optarg;
	struct rtdata *dat;
	int flags;

	if (getenv("EVENT_NORTSIG"))
		goto err;

	op = calloc(1, sizeof(*op));
	if (op == NULL)
		goto err;

	op->max = INIT_MAX;
	op->poll = malloc(sizeof(*op->poll) * op->max);
	if (op->poll == NULL)
		goto err_free_op;

	op->signo = find_rt_signal();
	if (op->signo == 0)
		goto err_free_poll;

	op->nonsock = 0;

	if (socketpair(PF_UNIX, SOCK_DGRAM, 0, sockets) != 0)
		goto err_free_poll;

	FD_CLOSEONEXEC(sockets[0]);
	FD_CLOSEONEXEC(sockets[1]);

	signal_send_fd[op->signo - 1] = sockets[0];
	trouble[op->signo - 1] = 0;
	op->signal_send_fd = sockets[0];
	op->signal_recv_fd = sockets[1];
	flags = fcntl(op->signal_recv_fd, F_GETFL);
	fcntl(op->signal_recv_fd, F_SETFL, flags | O_NONBLOCK);

	optarg = MAXBUFFERSIZE;
	setsockopt(signal_send_fd[op->signo - 1],
	    SOL_SOCKET, SO_SNDBUF,
	    &optarg, sizeof(optarg));

	optarg = MAXBUFFERSIZE;
	setsockopt(op->signal_recv_fd,
	    SOL_SOCKET, SO_RCVBUF,
	    &optarg, sizeof(optarg));

	op->highestfd = -1;
	op->fdtodat = NULL;
	if (grow_fdset(op, 1) == -1)
		goto err_close_pair;

	op->ptodat = malloc(sizeof(*op->ptodat) * op->max);
	if (op->ptodat == NULL)
		goto err_close_pair;

	sigemptyset(&op->sigs);
	sigaddset(&op->sigs, SIGIO);
	sigaddset(&op->sigs, op->signo);
	sigprocmask(SIG_BLOCK, &op->sigs, NULL);
	set_sigaction(SIGIO);
	set_sigaction(op->signo);

	event_set(&(op->sigfdev), op->signal_recv_fd, EV_READ|EV_PERSIST,
	    donothing, NULL);
	op->sigfdev.ev_flags |= EVLIST_DONTDEL;
	dat = ev2dat(op, &(op->sigfdev), 1);
	poll_add(op, &(op->sigfdev), dat);

	return (op);

err_close_pair:
	close(op->signal_recv_fd);
	close(signal_send_fd[op->signo - 1]);

err_free_poll:
	free(op->poll);

err_free_op:
	free(op);
err:
	return (NULL);
}
int
rtsig_add(void *arg, struct event *ev)
{
	struct rtsigop *op = (struct rtsigop *) arg;
	int flags, i;
	struct stat statbuf;
	struct rtdata *dat;

	if (ev->ev_events & EV_SIGNAL) {
		int signo = EVENT_SIGNAL(ev);

		sigaddset(&op->sigs, EVENT_SIGNAL(ev));
		if (sigprocmask(SIG_BLOCK, &op->sigs, NULL) == -1)
			return (-1);

		set_sigaction(signo);

		signal_send_fd[signo - 1] = op->signal_send_fd;
		trouble[signo - 1] = 0;
		return (0);
	}

	if (!(ev->ev_events & (EV_READ|EV_WRITE)))
		return (0);

	if (-1 == fstat(ev->ev_fd, &statbuf))
		return (-1);
	if (!S_ISSOCK(statbuf.st_mode))
		ev->ev_flags |= EVLIST_NONSOCK;

	flags = fcntl(ev->ev_fd, F_GETFL);
	if (flags == -1)
		return (-1);

	if (!(flags & O_ASYNC)) {
		if (fcntl(ev->ev_fd, F_SETSIG, op->signo) == -1 ||
		    fcntl(ev->ev_fd, F_SETOWN, (int) gettid()) == -1)
			return (-1);

		/*
		 * the overhead of always handling writable edges
		 * isn't going to be that bad...
		 */
		if (fcntl(ev->ev_fd, F_SETFL, flags | O_ASYNC|O_RDWR))
			return (-1);
	}

#ifdef O_ONESIGFD
	/*
	 * F_SETAUXFL and O_ONESIGFD are defined in a non-standard
	 * linux kernel patch to coalesce events for fds
	 */
	fcntl(ev->ev_fd, F_SETAUXFL, O_ONESIGFD);
#endif

	dat = ev2dat(op, ev, 1);
	if (dat == NULL)
		return (-1);

	op->total++;
	if (ev->ev_flags & EVLIST_NONSOCK)
		op->nonsock++;

	/* we must check the level of new fd's, so add them to the pollset */
	if (poll_add(op, ev, dat) == -1) {
		i = errno;
		fcntl(ev->ev_fd, F_SETFL, flags);
		errno = i;
		return (-1);
	}

	return (0);
}
int
rtsig_del(void *arg, struct event *ev)
{
	struct rtdata *dat;
	struct rtsigop *op = (struct rtsigop *) arg;

	if (ev->ev_events & EV_SIGNAL) {
		sigset_t sigs;

		sigdelset(&op->sigs, EVENT_SIGNAL(ev));

		sigemptyset(&sigs);
		sigaddset(&sigs, EVENT_SIGNAL(ev));
		return (sigprocmask(SIG_UNBLOCK, &sigs, NULL));
	}

	if (!(ev->ev_events & (EV_READ|EV_WRITE)))
		return (0);

	dat = ev2dat(op, ev, 0);
	poll_remove(op, ev, dat);
	dat_del(op, dat);
	op->total--;
	if (ev->ev_flags & EVLIST_NONSOCK)
		op->nonsock--;

	return (0);
}
int
rtsig_recalc(struct event_base *base, void *arg, int max)
{
	return (0);
}
/*
 * The following do_X functions implement the different stages of a single
 * eventloop pass: poll(), recv(sigsock), sigtimedwait().
 *
 * do_siginfo_dispatch() is a common factor of both do_sigwait() and
 * do_signals_from_socket().
 */
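/*
 * In outline, one pass of rtsig_dispatch() below chains the stages in
 * this order (a sketch of the control flow, not additional code):
 *
 *	poll_for_level:
 *		do_poll(op, &ts);			// check fd levels
 *		if (do_signals_from_socket(...) == 1)	// drain the socketpair
 *			goto poll_for_level;		// 1 => re-poll (SIGIO)
 *		if (do_sigwait(...) == 1)		// dequeue rt signals
 *			goto poll_for_level;
 */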
static inline int
do_poll(struct rtsigop *op, struct timespec *ts)
{
	int res = 0;
	int i = 0;

	if (op->cur > 1) {
		/* non-empty poll set (modulo the signalfd) */

		if (op->nonsock) {
			int timeout = ts->tv_nsec / 1000000 + ts->tv_sec * 1000;

			sigprocmask(SIG_UNBLOCK, &(op->sigs), NULL);

			res = poll(op->poll, op->cur, timeout);

			sigprocmask(SIG_BLOCK, &(op->sigs), NULL);

			ts->tv_sec = 0;
			ts->tv_nsec = 0;
		} else {
			res = poll(op->poll, op->cur, 0);
		}

		if (res < 0) {
			return (errno == EINTR ? 0 : -1);
		} else if (res) {
			ts->tv_sec = 0;
			ts->tv_nsec = 0;
		}

		i = 0;
		while (i < op->cur) {
			struct rtdata *dat = op->ptodat[i];
			struct event *ev = dat->ev;

			if (op->poll[i].revents) {
				int flags = 0;

				if (op->poll[i].revents & (POLLIN | POLLERR))
					flags |= EV_READ;
				if (op->poll[i].revents & POLLOUT)
					flags |= EV_WRITE;

				if (!(ev->ev_events & EV_PERSIST)) {
					poll_remove(op, ev, op->ptodat[i]);
					event_del(ev);
				} else {
					i++;
				}

				event_active(ev, flags, 1);
			} else {
				if (ev->ev_flags & (EVLIST_NONSOCK|EVLIST_DONTDEL)) {
					i++;
				} else {
					poll_remove(op, ev, op->ptodat[i]);
				}
			}
		}
	}

	return (res);
}
static inline int
do_siginfo_dispatch(struct event_base *base, struct rtsigop *op,
    siginfo_t *info)
{
	int signum;
	struct rtdata *dat, *next_dat;
	struct event *ev, *next_ev;

	if (info == NULL)
		return (-1);

	signum = info->si_signo;
	if (signum == op->signo) {
		int flags = 0;

		if (info->si_band & (POLLIN|POLLERR))
			flags |= EV_READ;
		if (info->si_band & POLLOUT)
			flags |= EV_WRITE;

		if (!flags)
			return (0);

		if (info->si_fd > op->highestfd)
			return (-1);

		dat = TAILQ_FIRST(op->fdtodat[info->si_fd]);
		while (dat != TAILQ_END(op->fdtodat[info->si_fd])) {
			next_dat = TAILQ_NEXT(dat, next);
			if (flags & dat->ev->ev_events) {
				ev = dat->ev;
				poll_add(op, ev, dat);
				activate(ev, flags & ev->ev_events);
			}
			dat = next_dat;
		}
	} else if (signum == SIGIO) {
		TAILQ_FOREACH(ev, &base->eventqueue, ev_next) {
			if (ev->ev_events & (EV_READ|EV_WRITE))
				poll_add(op, ev, NULL);
		}
		return (1); /* 1 means the caller should poll() again */
	} else if (sigismember(&op->sigs, signum)) {
		/* managed signals are queued */
		ev = TAILQ_FIRST(&signalqueue);
		while (ev != TAILQ_END(&signalqueue)) {
			next_ev = TAILQ_NEXT(ev, ev_signal_next);
			if (EVENT_SIGNAL(ev) == signum)
				activate(ev, EV_SIGNAL);
			ev = next_ev;
		}
	} else {
		/* dispatch unmanaged signals immediately */
		struct sigaction sa;

		if (sigaction(signum, NULL, &sa) == 0) {
			if ((sa.sa_flags & SA_SIGINFO) && sa.sa_sigaction) {
				(*sa.sa_sigaction)(signum, info, NULL);
			} else if (sa.sa_handler) {
				if (sa.sa_handler != SIG_IGN)
					(*sa.sa_handler)(signum);
			} else {
				if (signum != SIGCHLD) {
					/* non-blocked SIG_DFL */
					kill(gettid(), signum);
				}
			}
		}
	}
	return (0);
}
/*
 * Return 1 if we should poll again.
 * Return 0 if we are all set.
 * Return -1 on error.
 */
static inline int
do_sigwait(struct event_base *base, struct rtsigop *op, struct timespec *ts,
    sigset_t *sigs)
{
	for (;;) {
		siginfo_t info;
		int signum;

		signum = sigtimedwait(sigs, &info, ts);

		ts->tv_sec = 0;
		ts->tv_nsec = 0;

		if (signum == -1) {
			if (errno == EAGAIN || errno == EINTR)
				return (0);
			return (-1);
		} else if (1 == do_siginfo_dispatch(base, op, &info)) {
			return (1);
		}
	}

	/* NOTREACHED */
}
static inline int
do_signals_from_socket(struct event_base *base, struct rtsigop *op,
    struct timespec *ts)
{
	int fd = op->signal_recv_fd;
	siginfo_t info;
	int res;

	for (;;) {
		res = recv(fd, &info, sizeof(info), MSG_NOSIGNAL);
		if (res == -1) {
			if (errno == EAGAIN)
				return (0);
			if (errno == EINTR)
				continue;
			return (-1);
		} else {
			ts->tv_sec = 0;
			ts->tv_nsec = 0;
			if (1 == do_siginfo_dispatch(base, op, &info))
				return (1);
		}
	}

	/* NOTREACHED */
}
int
rtsig_dispatch(struct event_base *base, void *arg, struct timeval *tv)
{
	struct rtsigop *op = (struct rtsigop *) arg;
	struct timespec ts;
	int res;
	sigset_t sigs;

	ts.tv_sec = tv->tv_sec;
	ts.tv_nsec = tv->tv_usec * 1000;

poll_for_level:
	res = do_poll(op, &ts); /* ts can be modified in the do_XXX() calls */
	if (res == -1)
		return (-1);

	res = do_signals_from_socket(base, op, &ts);
	if (res == 1)
		goto poll_for_level;
	else if (res == -1)
		return (-1);

	/*
	 * the mask = managed_signals | unblocked-signals
	 * MM - if this is not blocking, do we need to cast the net this wide?
	 */
	sigemptyset(&sigs);
	sigprocmask(SIG_BLOCK, &sigs, &sigs);
	signotset(&sigs);
	sigorset(&sigs, &sigs, &op->sigs);

	res = do_sigwait(base, op, &ts, &sigs);

	if (res == 1)
		goto poll_for_level;
	else if (res == -1)
		return (-1);

	return (0);
}