Very Secure

TheFleet - A Systematic Exploration of the IRC Space

TheFleet is a set of bots that will sail across the digital ocean and dock at the ports of the 260 or so IRC networks listed on netsplit.de. The goal of the project is a systematic exploration of what is there. The first step is to collect the hostname, port, and list of all active channels for each IRC network. The next step is to write a script that parks a logging bot in each channel found. A separate script needs to make sure the fleet of bots remains connected. TheFleet must have an interface for viewing the data collected.

To obtain a list of all channels for all networks, I wrote a script that scrapes the information off netsplit.de. This morning diana_coman informed me of a simpler method: the /list command returns the channel list for a network. I still need a way to get the output of /list as a structured piece of data. My heathen client Limechat just opens a popup that I can scroll through. I will look into whether I can use the backbone of logbot, :cl-irc, to write a script that grabs all the channels from every IRC network using the /list command. I will format my list of all networks with all their channels as an sexpr structured as a list of plists, where each plist looks like this (see footnote 1):

(:network "freenode" :host "irc.freenode.net"
:port 6667 :channels ("#ossasepia" "#trilema" "#lisp"))
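
A minimal sketch of how raw /list replies could be turned into such a plist, assuming the replies are available as raw protocol lines. An IRC server answers LIST with one 322 (RPL_LIST) numeric per channel, of the form ":server 322 nick #channel user-count :topic", and finishes with 323 (RPL_LISTEND). The function names and the sample lines are hypothetical, every line is assumed to start with a ":server" prefix, and the sketch does not go through :cl-irc.

```lisp
(defun split-words (line)
  "Splits LINE on spaces, dropping empty tokens."
  (loop with start = 0
        for end = (position #\Space line :start start)
        for word = (subseq line start end)
        unless (string= word "") collect word
        while end
        do (setf start (1+ end))))

(defun parse-rpl-list (line)
  "Returns the channel name from a raw 322 (RPL_LIST) line, or NIL for any other line."
  (let ((words (split-words line)))
    (when (and (>= (length words) 4)
               (string= (second words) "322"))
      (fourth words))))

(defun make-network-plist (name host port raw-lines)
  "Builds one network plist, keeping only the channels found in RAW-LINES."
  (list :network name :host host :port port
        :channels (remove nil (mapcar #'parse-rpl-list raw-lines))))

;; Hypothetical sample of what a server might send back for /list.
(defparameter *sample-list-reply*
  '(":irc.example.net 321 fleetbot Channel :Users  Name"
    ":irc.example.net 322 fleetbot #lisp 42 :Common Lisp"
    ":irc.example.net 322 fleetbot #trilema 7 :"
    ":irc.example.net 323 fleetbot :End of /LIST"))

(defparameter *sample-network*
  (make-network-plist "freenode" "irc.freenode.net" 6667 *sample-list-reply*))
```

With the sample above, (getf *sample-network* :channels) evaluates to ("#lisp" "#trilema"); the 321 and 323 lines are skipped because their numeric is not 322.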

Once I have the networks mapped to their channels as a list of plists, the next step is to iterate through the plists and park a logging bot in every channel. I plan to use a modification of trinque's logbot pressed to ben_vulpes's logbot-multiple-channels-corrected vpatch. The first modification is to change the db schema to include a column, "network", for the log table (see footnote 2). Then I need to create a function connect-fleet. connect-fleet iterates through all the networks and, for each network, divides its channels into lists the length of the network's max number of channels allowed per nick. For every channel group, a new thread is created with a bot connected to all channels in the group. The code is sketched out below. make-fleetbot and fleetbot-connect-thread should be small modifications of the corresponding make-logbot and logbot-connect-thread. max-channels-per-nick may wind up as a hard-coded lookup table for the 260 or so IRC networks.

(defun connect-fleet (networks)
  "Logs every channel in each irc network in networks"
  (mapcar #'connect-to-all-channels networks))

(defun connect-to-all-channels (network)
  "Connects logging bots to all channels in the network. Each bot runs in its own thread and connects to as many channels as possible."
  (mapcar
   (lambda (channel-group) (connect-bot-to-channel-group network channel-group))
   (get-network-channel-groups network)))

(defun connect-bot-to-channel-group (network channel-group)
  "Connects a bot to the channels in channel-group from network network and returns a plist containing the bot and the connection thread."
  (let ((bot (make-fleetbot :host (getf network :host)
                            :port (getf network :port)
                            :channels channel-group)))
    (list :bot bot :thread (fleetbot-connect-thread bot))))

(defun get-network-channel-groups (network)
  "Takes a network and returns a list of lists of channels. The channels are grouped into lists the length of the max number of channels allowed per nick."
  (make-groups (getf network :channels) (max-channels-per-nick network)))

(defun make-groups (lst group-size)
  "Breaks lst into sublists of length group-size. If the length of lst is not a multiple of group-size, the last sublist has length (mod (length lst) group-size)."
  (cond ((null lst) nil)
        ((<= (length lst) group-size) (list lst))
        (t (cons (subseq lst 0 group-size)
                 (make-groups (subseq lst group-size) group-size)))))

(defun max-channels-per-nick (network)
  "TODO: Create a way to find out the max channels allowed per nick for every network. The limit for freenode is 120."
  120)

(defun make-fleetbot (&key host port channels)
  "TODO: Implement")

(defun fleetbot-connect-thread (bot)
  "TODO: Implement")

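The grouping step can also be written without recursion. The following is a sketch of an iterative counterpart to make-groups, together with the matching bot count; chunk-list and bots-needed are hypothetical names and are not part of logbot.

```lisp
(defun chunk-list (lst group-size)
  "Returns LST split into consecutive sublists of at most GROUP-SIZE elements."
  (check-type group-size (integer 1))
  (loop while lst
        collect (loop repeat group-size
                      while lst
                      collect (pop lst))))

(defun bots-needed (num-channels max-channels)
  "Number of bots (and so threads) needed to cover NUM-CHANNELS channels."
  (values (ceiling num-channels max-channels)))
```

For example, (chunk-list '("#a" "#b" "#c" "#d" "#e") 2) returns (("#a" "#b") ("#c" "#d") ("#e")), and (bots-needed 5 2) returns 3, which matches the number of groups. pop only rebinds the local variable, so the caller's list is left untouched.
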
My plan is to create one process for each IRC network and then have each process create num-channels / max-channels-per-nick (rounded up) subthreads. Each subthread has a logbot connected to at most max-channels-per-nick channels (see footnote 3). One question is whether the total number of processes and subthreads will surpass some limit allowed per VM. There should be about 260 processes, each with a maximum of roughly 550 subthreads (see footnote 4).
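
Footnote 3 raises the question of starting a separate top-level process from within sbcl. SBCL ships sb-ext:run-program for this, so a bash script may not be strictly required. The following is only a sketch, assuming an sbcl binary on the PATH; the file fleet.lisp and the entry point connect-network-by-name are hypothetical, and that entry point is assumed to block for as long as its bots are running.

```lisp
(defun network-process-args (network &key (fleet-file "fleet.lisp"))
  "Builds the sbcl command-line arguments for one process dedicated to NETWORK.
fleet.lisp and connect-network-by-name are hypothetical placeholders."
  (list "--non-interactive"
        "--load" fleet-file
        "--eval" (format nil "(connect-network-by-name ~S)"
                         (getf network :network))))

#+sbcl
(defun launch-network-process (network)
  "Starts a child sbcl for NETWORK without waiting for it and returns the process object."
  (sb-ext:run-program "sbcl" (network-process-args network)
                      :search t   ; look sbcl up on the PATH
                      :wait nil)) ; return immediately, child keeps running
```

The returned process object can later be passed to sb-ext:process-alive-p, which is one way a supervisor could notice that the process for a network has died.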

The next step is to create a process on another server that makes sure that TheFleet is up and running, resurfacing ships whenever they sink. We can monitor the processes for each IRC network, restarting them if they get killed. It will be trickier to find partial deaths: we need to make sure that each subthread within each live process is running with a connected bot. That is where the swank server may come in handy, since each process should allow another process to connect and issue commands in its sbcl environment. The script that checks the individual subthreads can run on the same VM as TheFleet.
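
A sketch of the partial-death check, assuming SBCL with thread support and a fleet stored as the list of (:bot ... :thread ...) plists returned by connect-bot-to-channel-group. sunken-ships and resurface-fleet are hypothetical names, and the restart function is left to the caller. Note that a live thread does not prove the bot is still connected, so a real check would also have to look at the connection itself.

```lisp
(defun sunken-ships (fleet)
  "Returns the entries of FLEET whose thread is no longer alive."
  (remove-if (lambda (ship)
               (sb-thread:thread-alive-p (getf ship :thread)))
             fleet))

(defun resurface-fleet (fleet restart-fn)
  "Returns a new fleet in which every dead entry is replaced by
the result of calling RESTART-FN on it. Live entries are kept as they are."
  (mapcar (lambda (ship)
            (if (sb-thread:thread-alive-p (getf ship :thread))
                ship
                (funcall restart-fn ship)))
          fleet))
```

A supervisor connected over swank could call resurface-fleet periodically, passing a function that rebuilds the bot and its thread for the same channel group.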

Once I have a fleet of bots connected, I plan to make a simple web interface to visualize the db. The homepage will be a list of networks, where each network links to a long unpaginated list of all its channels sorted by activity (see footnote 5). Channels can be clicked to see their logs displayed in the same daily format as on, e.g., logs.ossasepia.com.
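
A sketch of the "sorted by activity" step, taking activity to mean the number of logged messages per channel. That definition is an assumption, as is the representation of log rows as plists with :network and :target keys; in practice the counting would more likely happen in the database.

```lisp
(defun channels-by-activity (rows network)
  "ROWS is a list of plists with :network and :target keys.
Returns a list of (channel . message-count) pairs for NETWORK, busiest first."
  (let ((counts (make-hash-table :test #'equal)))
    (dolist (row rows)
      (when (string= (getf row :network) network)
        (incf (gethash (getf row :target) counts 0))))
    (sort (loop for channel being the hash-keys of counts
                  using (hash-value n)
                collect (cons channel n))
          #'> :key #'cdr)))
```

Given two rows for "#b" and one row for "#a" on the same network, the function returns (("#b" . 2) ("#a" . 1)). Rows from other networks are ignored.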

To be continued.

  1. It may be worth including the key :linknames which points to a list of all the servers that comprise the network. We may also want to keep the channel list constantly updated as TheFleet is running, connecting to new channels as they appear for the first time in subsequent calls to /list.
  2. Currently the log table has the columns id, target, message, host, source, user, received_at:

                      id                  |  target   | message |        host         | source |  user   |        received_at
    --------------------------------------+-----------+---------+---------------------+--------+---------+----------------------------
     1a1aaa4e-41a0-4971-96e3-fe5d62d63649 | #whaacked | yo      | unaffiliated/whaack | whaack | ~whaack | 2019-12-24 19:24:04.979817

    The column names (which I believe are chosen to match :cl-irc's naming system) are a bit confusing; for example, target is the word used for channel. I am not sure of the distinction between the columns "source" and "user", both of which imo should be "nick".

  3. This means that I may move the logic of the connect-fleet function to a bash script. I don't know how to create a different top level process from within sbcl and I don't think it is possible unless I import a library to issue system commands.
  4. Only freenode should take 550 or so bots to connect to all the channels. Most of the networks should require between 1 and 30 bots and consequently 1 to 30 subthreads.
  5. The question of how to figure out which channel's logs to read will need some pondering.

3 Responses to “TheFleet - A Systematic Exploration of the IRC Space”

  1. whaack says:

    It may be easier to read the source code here

    http://paste.deedbot.org/?id=Sh9t

    as I figure out how to make long widthed lines of code display nicely on my blog.

  2. Diana Coman says:

    "TheFleet must have an interface for viewing the data collected." - why? I don't think there's really any need for that, nobody is going to *want* to sink the time to fully read all that spew and moreover, most of it will be join/parts and the like. Don't confuse the run of the mill chan with republican ones really, there's so little similarity that they are for all intents and purposes different entities entirely. Did you ever join & check out of curiosity some "active" chans? Because I did and it's such sadness you can't imagine. Iirc MP also documented a few attempts, not like it's news.

    Just let the logs be as text files, they are inputs for automated processes not as much conversation, no. Basically you'll get with this small project a practical introduction to data/text mining if you hadn't had any so far. Sure, look through them to get some idea what is in there but don't waste time on web interface. You'll need to make instead some scripts for initial data cleaning and basic stats. Once that is done, we see what it says and therefore what else makes most sense as next steps. For scripting awk and/or R will probably do great and you should gain some familiarity with both anyway.

