Monday, April 22, 2013

Making TDD Fun with Auto-running tests

I attended a great talk by Toby Ho (@airportyh) this weekend at CoderFaire Atlanta about making TDD fun. He's written a nifty little terminal app called testem that auto-runs your tests whenever you save code files. He does his coding split-screen, and every time he saves a source file the tests instantly re-run.


I won't get into the details of his talk, but will say that I've found it a much more fun and productive way to do TDD. It catches a lot of errors quickly in both tests and underlying code, and makes it much less of a chore to do TDD or write tests in general. Here's how to get it working for PHP:
# put this in your testem.json; adjust paths as appropriate
{
    "src_files": [
        "classes/**/*.php",
        "test/**/*.php"
    ],
    "launchers": {
        "phpunit": {
            "command": "phpunit --tap ${PHPUNIT_ARGS} ",
            "protocol": "tap"
        }
    }
}

# Configure PHPUNIT_ARGS to only look at tests you're working on so that running tests will be fast
$ export PHPUNIT_ARGS="--filter testNamedFoo"

# run testem
$ testem -l phpunit
If you aren't seeing errors in the testem window, it's probably because you've got PHP configured to print errors to STDOUT, while testem expects them on STDERR when using the TAP format. Check your display_errors setting and change it to "stderr" as needed.
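
For the CLI this is a one-line change. A minimal sketch, assuming you're editing the php.ini that the CLI uses (you could also pass it per run with php -d display_errors=stderr):

# put this in your CLI php.ini
display_errors = stderr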

Monday, February 20, 2012

Debugging Server Performance / Quick & Dirty Server Performance Monitoring

Trying to figure out what's causing server performance issues is really complicated, especially when the problems are flaky and intermittent. We've been debugging this type of situation lately, and I wanted to share a few tricks we used to figure out what was going on.

One thing that's really important is having granular information logged so that you can look back after an incident occurs to try to figure out what led to the issue. Sometimes it's easy and it's just one thing; however, many times there are multiple minor issues that lead to an aggregate major one.

While it's easy to see major issues like load average with tools like munin, it's almost impossible to tell from munin graphs what the server is actually doing when it's loaded.

The data you really need to debug this situation further are highly granular snapshots of the entire machine, taken at ~5s increments. This allows you to see the load as it builds up and to tease apart the factor(s) contributing to the poor performance.

To achieve this, we add some transient logging when we're experiencing issues to help us find the culprits. Basically we want to sample the server every 5 seconds so we can see what's using CPU, what's using IO, and look for other outliers. All of these tools are easy to install on CentOS.

This dstat call records some generic information about the most costly thing going on, along with overall load averages.

$ dstat --time --cpu --net --load --top-cpu-adv --top-io-adv --nocolor 5 > dstat.log &

----system---- ----total-cpu-usage---- -net/total- ---load-avg--- -------most-expensive-cpu-process------- -------most-expensive-i/o-process-------
     time     |usr sys idl wai hiq siq| recv  send|  1m   5m  15m|process   pid    cpu   read write|process   pid    read  write  cpu
20-02 10:56:27| 20   4  74   1   0   0|   0     0 |0.53 0.84 0.82|monit    21757  1.3%   594k    8B|init [3]     1   1175k  316k 0.0%
20-02 10:56:32| 18   5  76   1   0   1| 385k  334k|0.65 0.86 0.82|httpd    16229  2.9%    16k 3937B|httpd    28471   1227k 1040k   0%
20-02 10:56:37|  8   2  90   1   0   0| 119k  213k|0.60 0.85 0.82|httpd    15588  0.8%  5902B 3256B|httpd    28471    220k  195k   0%
20-02 10:56:42| 16   3  81   0   0   0| 154k  218k|0.55 0.83 0.81|convert  16293  4.1%   675k    0 |httpd    28471    528k  401k   0%
20-02 10:56:47| 91   4   5   0   0   1| 148k  581k|0.99 0.92 0.84|convert  16293   74%  1148k    0 |convert  16293   1148k    0   74%

This iotop invocation records all processes doing IO, along with their exact IO usage:

$ iotop --processes --only --time --batch --delay 5 > iotop.log &

Total DISK READ: 15.89 K/s | Total DISK WRITE: 285.27 K/s
    TIME    PID  PRIO  USER      DISK READ   DISK WRITE  SWAPIN      IO    COMMAND
13:23:51      1  be/4  root       0.00 B/s     0.00 B/s  99.99 %  99.99 %  init [3]
13:23:51      4  rt/3  root       0.00 B/s     0.00 B/s  -2.46 %  99.99 %  [watchdog/0]
13:23:51      3  be/7  root       0.00 B/s     0.00 B/s  99.99 %  99.99 %  [ksoftirqd/0]
13:23:51  25146  be/4  apache     7.95 K/s     2.38 K/s   0.00 %   2.32 %  httpd
13:23:51   1870  be/4  root       0.00 B/s     0.00 B/s   0.00 %   1.35 %  mingetty tty2
13:23:51  25244  be/5  root       0.00 B/s   813.68 B/s  99.99 %   1.35 %  python /usr/bin/iotop --processes --only --time --batch --delay 5
This little shell/ps loop records the top 5 CPU consumers every 5 seconds, including the full command line being run.

while true; do date && ps aux --sort -pcpu | head -6 &&  sleep 5; done > top.log &

Mon Feb 20 13:14:57 CST 2012
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
apache   22285  2.3  1.1  71404 20920 ?        S    13:14   0:00 /usr/sbin/httpd
501      22279  1.9  0.6  23688 11124 ?        RN   13:14   0:00 /usr/bin/php refreshStats.php
root     16258  0.9  0.3  11660  5700 pts/3    SN   10:56   1:17 /usr/bin/python /usr/bin/iotop --processes --only --batch --delay 5
apache   22301  0.7  1.0  71456 18992 ?        S    13:14   0:00 /usr/sbin/httpd
apache   22029  0.8  1.0  71176 19364 ?        S    13:14   0:00 /usr/sbin/httpd
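
If you find yourself starting and stopping these three loggers a lot, a small wrapper script keeps things tidy. This is just a sketch using the same flags as above; the monitor.sh name, log paths, and interval are whatever you like:

#!/bin/sh
# monitor.sh - start all three samplers and stop them cleanly on Ctrl-C.
# Sketch only: adjust flags, log paths, and sample interval for your box.
# Run as root (iotop requires it).

dstat --time --cpu --net --load --top-cpu-adv --top-io-adv --nocolor 5 > dstat.log &
DSTAT_PID=$!

iotop --processes --only --time --batch --delay 5 > iotop.log &
IOTOP_PID=$!

( while true; do date && ps aux --sort -pcpu | head -6 && sleep 5; done ) > top.log &
PS_PID=$!

# kill the samplers when we're interrupted or terminated
trap 'kill $DSTAT_PID $IOTOP_PID $PS_PID' INT TERM
wait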
With this logging in place (which, fair warning, consumes non-trivial additional resources on the box), we can go back after a performance incident and see, with high time granularity, exactly what the box was doing during the period of poor performance.

Wednesday, December 28, 2011

Cheap, easy way to make your dev server available on the public internet.


While solutions like localtunnel and showoff.io allow you to do this, they have some limitations in terms of both cost and functionality. The biggest problems are that they get expensive once you have lots of developers, and that you can't use your own hostnames.

We've developed an alternative using DNS, a reverse proxy, and an SSH tunnel that makes it trivial to allow public access to any number of dev servers on demand.

Here's how it works:
  • Set up a reverse proxy (you can use Apache or nginx).
# assume public IP of 1.2.3.4
# need name-based virtual hosting
# so we can support many dev boxes with a single IP
NameVirtualHost 1.2.3.4:80

# Need a VirtualHost container for each developer
<VirtualHost 1.2.3.4:80>
    ServerName jason.dev.domain.com
    ProxyRequests Off
    ProxyPreserveHost On
    ProxyPass / http://localhost:2000/ retry=0
    ProxyPassReverse / http://localhost:2000/
</VirtualHost>

<VirtualHost 1.2.3.4:80>
    ServerName tim.dev.domain.com
    ProxyRequests Off
    ProxyPreserveHost On
    ProxyPass / http://localhost:2001/ retry=0
    ProxyPassReverse / http://localhost:2001/
</VirtualHost>
  • Configure a wildcard CNAME record for *.dev.domain.com that points to your proxy server. Using the wildcard avoids having to munge DNS for every new developer.
  • Set up a proxy user account on the box and add all developers' ssh keys to the account. All this user needs to do is to log in and forward non-privileged ports, so it can be locked down substantially.
  • Edit your /etc/hosts so that the canonical name for your server points to your local dev IP.
# /etc/hosts entry
33.33.33.11 jason.dev.domain.com
  • To make your dev server publicly available, create an SSH tunnel. Remember that each developer gets a particular remote port number assigned exclusively to them, matching the port in their VirtualHost entry above. (If you get tired of retyping the command, there's an ~/.ssh/config sketch after this list.)
ssh proxy@proxy.dev.domain.com -R 2000:jason.dev.domain.com:80
  • This setup lets you use the exact same hostname everywhere: on the dev box it resolves locally via /etc/hosts, while everywhere else it resolves to the public proxy for anything that requires public access.
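
If you don't want to retype the tunnel command, the forwarding can live in the developer's SSH config instead. A minimal sketch, assuming the hostnames and port number from the example above:

# ~/.ssh/config on the developer's machine
Host devproxy
    HostName proxy.dev.domain.com
    User proxy
    # expose this dev box's port 80 as port 2000 on the proxy server
    RemoteForward 2000 jason.dev.domain.com:80

Then "ssh -N devproxy" brings the tunnel up whenever you need it.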
While this does require you to have a publicly-reachable server somewhere to configure the proxy, this probably isn't a huge problem for most companies. In return you get a near-foolproof setup for debugging webhooks, mobile apps, etc., at no cost, and without jumping through any hoops or relying on any third-party systems.

Monday, November 28, 2011

Time Machine backups to a drive on another Mac over the network

I've got a huge external drive on my desktop Mac which I use for a Time Machine backup. However, my MacBook has gone un-backed-up for a long time. I had heard over the years that you could hack your way to a network Time Machine backup, but never bothered trying (I try to avoid hacks because they usually blow up in my face and take a ton of time to fix). However, now that I'm using Aperture heavily on my laptop, I don't want to go without a backup of all of my pics.

Turns out, it's very, very easy to set up an external drive on one Mac as a Time Machine volume over the network.
  1. Share the external hard drive. To do this, do a Get Info on the external drive and click the Shared Folder checkbox.
  2. Open up System Preferences > Sharing.
    1. Make sure File Sharing is enabled.
    2. You should see the external hard drive listed in the Shared Folders column. Click it.
    3. Click the + icon in the Users column and add the user you connect with from the networked computer.
    4. Set up Read & Write access for that user.
    5. Ideally you should remove/lower privileges for other users here. In my case the shared external drive had all kinds of unreasonable default permissions for Unknown User, Guest, and Everyone.
  3. On the networked machine, go to the terminal and enter:
    defaults write com.apple.systempreferences TMShowUnsupportedNetworkVolumes 1
  4. Use the Finder to connect to the machine hosting the external drive with the user you configured in step #2 above.
  5. Open System Preferences > Time Machine.
  6. Click Select Disk.
  7. You should see the external drive listed, pick it.
  8. Profit!
So next you're probably thinking: dang, this initial backup is going to take forever. Well, there's a trick for that, too. Apple File Sharing works over a local non-routed network (i.e., Bonjour). So just hook your two computers together with an Ethernet cable and disable WiFi on the machine that doesn't have the external drive attached. You should still be able to see the shared external drive in the Finder, only now it's running over Gigabit Ethernet (or the fastest link your two machines can manage with each other). This is a great way to do the initial backup; subsequent backups will be much smaller and not as big of a deal.

Enjoy!

Saturday, November 19, 2011

Mouse support for Terminal.app: scrolling, vi/vim, and more!

Goofing around on the internet today just got real. I was reading about ncurses and noticed that the API supports mouse events. So wait, if terminals can support mouse events, then why doesn't vim in Terminal.app work like gVim and support mouse scrolling, clicking, and selection? It can, and it's AMAZING!

Someone wrote a SIMBL plugin called MouseTerm that passes through all mouse events to the terminal. After that, all you have to do is enable mouse support in vim, and Boom goes the Dynamite!

Steps:
1. Install SIMBL
2. Install MouseTerm
3. Edit your .vimrc:

" mouse support
if has("mouse")
    set mouse=a
    set ttymouse=xterm2
endif
4. That's it! Now you can use the mouse for clicking, selection, and scrolling!

One thing I did notice is that you can no longer copy text to the Mac clipboard from Vim in this mode. It's easy to toggle mouse event passing with Cmd-Shift-M (or the menu item Shell > Send Mouse Events); you can also toggle it from the vim side, as sketched below.
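
If you'd rather flip it from inside vim, a mapping along these lines also works. Just a sketch; the <Leader>m binding is an arbitrary choice of mine, not something MouseTerm provides:

" toggle vim's mouse capture so Terminal.app can handle selection/copy again
nnoremap <Leader>m :exe 'set mouse=' . (&mouse ==# 'a' ? '' : 'a')<CR>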

I used ttymouse=xterm2 since it's the one recommended for use with GNU screen; ttymouse=xterm didn't behave properly for me when vim was running inside screen.


Thanks to Ayaz for this article on "Using mouse inside Vim on Terminal.app" which got me pointed in the right direction.

Monday, November 07, 2011

Security Caveats with S3: it's easy to grant dangerous permissions with Bucket and User policies.


We've been exploring ways to use S3, and one of the ideas was to dynamically create buckets on-the-fly for individual customers to store and manage their objects.

Since we take security very seriously, we took a deep-dive into S3's permissions model to be sure that we could eliminate the risk of data loss due to application logic error and/or security breach of the web server.

As with any security setup, partitioning of access rights is key to risk mitigation. Thus we created an unprivileged webapp user that could not permanently delete S3 objects. Of course, we were trying to create buckets on-the-fly, so this same user would need the ability to create buckets and set bucket policies.

As it turns out, it's not possible to do this securely. This was a big shock to me, since what it means in practice is that UserA can grant rights to UserB even if UserA doesn't have those rights himself.

Steps:
1. Bucket Owner grants *only* PutBucketPolicy to UserA
2. UserA grants DeleteObject / DeleteObjectVersion to UserB.
3. UserB can now permanently delete every object in the bucket.
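
To make step 2 concrete, here's the kind of bucket policy UserA could attach using nothing but PutBucketPolicy. This is purely illustrative; the account ID, user name, and bucket name are made up:

{
    "Version": "2008-10-17",
    "Statement": [
        {
            "Sid": "EscalationExample",
            "Effect": "Allow",
            "Principal": { "AWS": "arn:aws:iam::111122223333:user/UserB" },
            "Action": [ "s3:DeleteObject", "s3:DeleteObjectVersion" ],
            "Resource": "arn:aws:s3:::example-bucket/*"
        }
    ]
}

Once that policy is in place, UserB can permanently delete any object in example-bucket, even though UserA was never allowed to delete anything himself.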

Even more shocking, this is also true if you've gone the extra step of enabling MFA-delete on the bucket. This means that you need an MFA device to successfully permanently delete objects. However, you don't have to have the MFA device originally used to enable MFA-delete. In the above scenario, even with MFA-delete enabled, UserB can buy his own MFA device, attach it to his account, and permanently delete any object in the bucket.

Furthermore, there is no place to attach deny rules that couldn't be similarly overturned, since the only place to attach such a rule is a Bucket Policy, and of course that deny can be easily overwritten by the webapp user holding the PutBucketPolicy permission.

The bottom line is that giving PutBucketPolicy to a user is equivalent to giving them root access to the bucket. They can do anything. Thus, the AWS/S3 permission model is not currently suitable if you'd like your web application to be able to create and configure buckets on-the-fly.

I do not have enough of a security background to know if this is considered bad security architecture or just a normal caveat of complex permissions models, but I was personally very concerned that a user could grant permission he didn't have himself.

All of this has been confirmed through multiple experiments as well as with AWS Support. It is true. So be careful!

Saturday, April 09, 2011

Pearfarm: making it trivially easy to create and share php packages with PEAR

Jason Ardell and I had the honor of speaking at this month's Atlanta PHP user group. We gave a talk about the Pearfarm project, which I started in 2009 with a handful of other amazing PHP developers that were interested in bringing some community collaboration tools found in other language communities to the PHP universe.

The result was Pearfarm, which has 2 major features:
  1. Make it really easy to create PEAR packages
  2. Make it really easy to share php packages
That's it. Amazingly, this is not easy to do in the PHP community, even today, almost 2 years after we built Pearfarm. Yes, pyrus makes it easier, but there are caveats, like having to arrange your code in a certain directory structure. And even though they've made a PEAR channel server that's easy to run, you still have to learn about PEAR channels and find a place to host a server for your channel. That seems like a lot of work to share a small package.

Pearfarm has been out for 18 months now, and while it is awesome at what it does and some people have started using it, it hasn't taken off. Mostly, I think, because we haven't pushed it very hard. But I think that we (the PHP community) should be pushing it harder. After all, it's for our own good.

One thing I learned from talking about pearfarm over the last couple of years that really shocked me is how few people are using PEAR at all. If you ask php devs how they find packages, the answer is Google; it's not even the PEAR repository! To the average php developer on the street, PEAR is a dead project. I think this is largely due to the perception that the PEAR repository is so devoid of activity.

Many people have argued to me that having the PEAR repo is a good thing for the community since it provides packages that have been vetted at some level. I can accept that there is value in having that trust, but it comes at the expense of a rich, active developer ecosystem. It's too hard to participate in PEAR as a developer. I know the PEAR people will argue with me that I'm wrong and it's so easy, but I don't agree. It's hard to participate in PEAR as a contributor.

However, PEAR (the installer) is actually great! It makes it really easy to create local sandboxes of code libraries for each application. It handles dependencies. It has the tools built-in to create PEAR packages (though it's not trivial unless you use pearfarm or pyrus).

So I think we all owe it to the community to start using other people's code, and start sharing code. Pearfarm makes it seriously easy to share php packages. In no time at all we could completely re-invigorate the PHP community. Some of the bigger PHP projects are already starting to spin off a lot of quality php code. This is good, this is very good. EVERYONE INTO THE POOL!