Coding Horror details
Listing ID: 394
Title: Coding Horror
Description: Programming and human factors by Jeff Atwood.
Category: Computers : Programming
listed on: May 04, 2008 10:35:09 PM
Number Hits: 2 times
Recent Posts:
We Are Typists First, Programmers Second - Mon, 17 Nov 2008 23:59:59 -0800
Remember last week when I said coding was just writing? I was wrong. As one commenter noted, it's even simpler than that. [This] reminds me of a true "Dilbert moment" a few years ago, when my (obviously non-technical) boss commented that he never understood why it took months to develop software. "After all", he said, "it's just typing." Like broken clocks, even pointy-haired managers are right once a day. Coding is just typing.
So if you want to become a great programmer, start by becoming a great typist. Just ask Steve Yegge:

"I can't understand why professional programmers out there allow themselves to have a career without teaching themselves to type. It doesn't make any sense. It's like being, I dunno, an actor without knowing how to put your clothes on. It's showing up to the game unprepared. It's coming to a meeting without your slides. Going to class without your homework. Swimming in the Olympics wearing a pair of Eddie Bauer Adventurer Shorts."

I had a brief email exchange with Steve back in March 2007, after I wrote Put Down The Mouse, where he laid that very same Reservoir Dogs quote on me. Steve's followup blog post was a very long time in coming. I hope Steve doesn't mind, but I'd like to pull two choice quotes directly from his email responses:

"I was trying to figure out which is the most important computer science course a CS student could ever take, and eventually realized it's Typing 101."

Strong statements indeed. I concur. We are typists first, and programmers second. It's very difficult for me to take another programmer seriously when I see them using the hunt and peck typing techniques. Like Steve, I've seen this far too often.

First, a bit of honesty is in order. Unlike Steve, I am a completely self-taught typist. I didn't take any typing classes in high school. Before I wrote this blog post, I realized I should check to make sure I'm not a total hypocrite. So I went to the first search result for typing test and gave it a shot.

I am by no means the world's fastest typist, though I do play a mean game of Typing of the Dead. Let me emphasize that this isn't a typing contest. I just wanted to make sure I wasn't full of crap before I posted this. I know, there's a first time for everything. Maybe this'll be the start of a trend. Doubtful, but you never know.
Steve and I believe there is nothing more fundamental in programming than the ability to efficiently express yourself through typing. Note that I said "efficiently" not "perfectly". This is about reasonable competency at a core programming discipline. Maybe you're not convinced that typing is a core programming discipline. I don't blame you, although I do reserve the right to wonder how you manage to program without using your keyboard.

Instead of answering directly, let me share one of my (many) personal foibles with you. At least four times a day, I walk into a room having no idea why I entered that room. I mean no idea whatsoever. It's as if I have somehow been teleported into that room by an alien civilization. Sadly, the truth is much less thrilling. Here's what happened: in the brief time it took for me to get up and move from point A to point B, I have totally forgotten whatever it was that motivated me to get up at all. Oh sure, I'll rack my brain for a bit, trying to remember what I needed to do in that room. Sometimes I remember, sometimes I don't. In the end, I usually end up making multiple trips back and forth, remembering something else I should have done while I was in that room after I've already left it. It's all quite sad.

Hopefully your brain has a more efficient task stack than mine. But I don't fault my brain -- I fault my body. It can't keep up. If I had arrived faster, I wouldn't have had time to forget. What I'm trying to say is this: speed matters. When you're a fast, efficient typist, you spend less time between thinking that thought and expressing it in code. Which means, if you're me at least, that you might actually get some of your ideas committed to screen before you completely lose your train of thought. Again.

Yes, you should think about what you're doing, obviously. Don't just type random gibberish as fast as you can on the screen, unless you're a Perl programmer.
But all other things being equal -- and they never are -- the touch typist will have an advantage. The best way to become a touch typist is through typing, and lots of it. A little research and structured practice couldn't hurt either. Here are some links that might be of interest to the aspiring touch typist:
(But this is a meager and incomplete list. What tools do you recommend for becoming a better typist?) There's precious little a programmer can do without touching the keyboard; it is the primary tool of our trade. I believe in practicing the fundamentals, and typing skills are as fundamental as it gets for programmers. Hail to the typists!
Your Favorite NP-Complete Cheat - Sat, 15 Nov 2008 05:05:36 -0800
Have you ever heard a software engineer refer to a problem as "NP-complete"? That's fancy computer science jargon shorthand for "incredibly hard":

"The most notable characteristic of NP-complete problems is that no fast solution to them is known; that is, the time required to solve the problem using any currently known algorithm increases very quickly as the size of the problem grows. As a result, the time required to solve even moderately large versions of many of these problems easily reaches into the billions or trillions of years, using any amount of computing power available today. As a consequence, determining whether or not it is possible to solve these problems quickly is one of the principal unsolved problems in Computer Science today."

You do want to be an expert programmer, don't you? Of course you do!
(Update: I was shooting for a poetic allusion to the P=NP problem here but based on the comments this is confusing and arguably incorrect. So I'll redact this sentence. Instead, I point you to this P=NP poll (pdf); read the comments from CS professors (including Knuth) to get an idea of how realistic this might be.)

Instead, I'll recommend a book Anthony Scian recommended to me: Computers and Intractability: A Guide to the Theory of NP-Completeness. Like all the software engineering books I recommend, this book has a timeless quality. It was originally published in 1979, a shining testament to smart people attacking truly difficult problems in computer science: "I can't find an efficient algorithm, but neither can all these famous people."

So how many problems are NP-complete? Lots. Even if you're a layman, you might have experienced NP-completeness in the form of Minesweeper, as Ian Stewart explains. But for programmers, I'd argue the most well-known NP-complete problem is the travelling salesman problem:

"Given a number of cities and the costs of travelling from any city to any other city, what is the least-cost round-trip route that visits each city exactly once and then returns to the starting city?"

The brute-force solution -- trying every possible permutation between the cities -- might work for a very small network of cities, but this quickly becomes untenable, even if we were to use theoretical CPUs our children might own, or our children's children. What's worse, every other algorithm we come up with to find an optimal path for the salesman has the same problem. That's the common characteristic of NP-complete problems: they are exercises in heuristics and approximation, as illustrated by this xkcd cartoon:

What do expert programmers do when faced by an intractable problem? They cheat. And so should you! Indeed, some of the modern approximations for the Travelling Salesman Problem are remarkably effective.
"Various approximation algorithms, which quickly yield good solutions with high probability, have been devised. Modern methods can find solutions for extremely large problems (millions of cities) within a reasonable time, with a high probability of being just 2-3% away from the optimal solution."

Unfortunately, not all NP-complete problems have good approximations. But for those that do, I have to wonder: if we can get so close to an optimal solution by cheating, does it really matter if there's no known algorithm to produce the optimal solution? If I've learned nothing else from NP-complete problems, I've learned this: sometimes coming up with clever cheats can be more interesting than searching in vain for the perfect solution.

Consider the First Fit Decreasing algorithm for the NP-complete Bin Packing problem. It's not perfect, but it's incredibly simple and fast. The algorithm is so simple, in fact, it is regularly demonstrated at time management seminars. Oh, and it guarantees that you will get within 22% of the perfect solution every time. Not bad for a lousy cheat.

So what's your favorite NP-complete cheat?
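To make the First Fit Decreasing idea concrete, here is a minimal Python sketch: sort the items from largest to smallest, then drop each one into the first bin that still has room, opening a new bin only when none fits. The function name, the sample item sizes, and the bin capacity are all made up for illustration, and the sketch makes no attempt to verify the approximation bound mentioned above.

```python
def first_fit_decreasing(sizes, capacity):
    """Pack item sizes into bins of a fixed capacity using First Fit Decreasing.

    Returns a list of bins, each bin being a list of the item sizes placed in it.
    """
    bins = []  # contents of each open bin
    free = []  # remaining room in each bin, parallel to `bins`

    # "Decreasing": consider the largest items first.
    for size in sorted(sizes, reverse=True):
        if size > capacity:
            raise ValueError(f"item of size {size} cannot fit in any bin")

        # "First fit": scan bins in the order they were opened.
        for i, room in enumerate(free):
            if size <= room:
                bins[i].append(size)
                free[i] -= size
                break
        else:
            # No existing bin had room, so open a new one.
            bins.append([size])
            free.append(capacity - size)

    return bins


# Hypothetical example: six items, bins that hold 10 units each.
items = [4, 8, 1, 4, 2, 1]
print(first_fit_decreasing(items, capacity=10))  # [[8, 2], [4, 4, 1, 1]]
```

The items here add up to 20, so two bins of 10 is the best any algorithm could do for this particular input; on other inputs the heuristic can use more bins than the optimum.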
Stop Me If You Think You've Seen This Word Before - Wed, 12 Nov 2008 23:59:59 -0800
If you've ever searched for anything, you've probably run into stop words. Stop words are words so common they are typically ignored for search purposes. That is, if you type in a stop word as one of your search terms, the search engine will ignore that word (if it can). If you attempt to search using nothing but stop words, the search engine will throw up its hands and tell you to try again. Seems straightforward enough. But there can be issues with stop words. Imagine, for example, you wanted to search for information on this band. "The" is one of the most common words in the English language, so a naive search for "The The" rarely ends well. Let's consider some typical English stopword lists.
You'd think a pure count of frequency, how often the word occurs, would be enough to make a common group of words "stop words", but apparently not everyone agrees. The default SQL Server stop word list is much larger than the Oracle stop word list. What makes "many" a stop word to Microsoft, but not to Oracle? Who knows. And I'm not even going to show the MySQL full text search stop word list here, because it's enormous, easily double the size of the SQL Server stop word list. These are just the default stop word lists; that doesn't mean you're stuck with them. You can edit the stop word list for any of these databases. Depending on what you're searching, you might decide to have different stop words entirely, or maybe no stop words at all. Way back in 2004, I ran a little experiment with Google -- over a period of a week, I searched for an entire dictionary of ~110k individual English words and recorded how many hits Google returned for each. Yes, this is probably a massive violation of the Google terms of service, but I tried to keep it polite and low impact -- I used Gzip compressed HTTP requests, specified only 10 search results should be returned per query (as all I needed was the count of hits), and I added a healthy delay between queries so I wasn't querying too rapidly. I'm not sure this kind of experiment would fly against today's Google, but it worked in 2004. At any rate, I ended up with a MySQL database of 110,000 English words and their frequency in Google as of late summer 2004. Here are the top results:
Again, a very different list than what we saw from SQL Server or Oracle. I'm not sure why the results are so strikingly different. Also, the web (or at least Google's index of the web) is much bigger now than it was in 2004; a search for "the" returns 13.4 billion results -- that's 25 times larger than my 2004 result of 522 million.

On Stack Overflow, we warn users via an AJAX callback when they enter a title composed entirely of stop words. It's hard to imagine a good title consisting solely of stopwords, but maybe that's just because our technology stack isn't sufficiently advanced yet.

Google doesn't seem to use stop words any more, as you can see from this search for "to be or not to be". Indeed, I wonder if classic search stop words are relevant in modern computing; perhaps they're a relic of early 90's computing that we haven't quite left behind yet. We have server farms and computers perfectly capable of handling the extremely large result sets from querying common English words. A Google patent filed in 2004 and granted in 2008 seems to argue against the use of stop words:

"Sometimes words and phrases that might be considered stopwords or stop-phrases may actually be meaningful or important. For example, the word "the" in the phrase "the matrix" could be considered a stopword, but someone searching for the term may be looking for information about the movie "The Matrix" instead of trying to find information about mathematical information contained in a table of rows and columns (a matrix)."

Apparently, at least to Google, stop word warnings are a thing of the past.
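For readers who have not seen stop word handling in code, here is a minimal Python sketch of the two behaviors discussed in this post: dropping stop words from a query, and detecting a title made up of nothing but stop words. The tiny `STOP_WORDS` set and the function names are assumptions for illustration only; this is not the default list of any database, and it is not how Stack Overflow's check is actually implemented.

```python
import re

# Hypothetical, deliberately tiny stop word list (not any database's default).
STOP_WORDS = {"a", "an", "and", "be", "is", "not", "of", "or", "the", "to"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Return the tokens of `text` that are not stop words."""
    return [word for word in tokenize(text) if word not in stop_words]


def is_all_stop_words(text, stop_words=STOP_WORDS):
    """True when the text has at least one word and every word is a stop word."""
    words = tokenize(text)
    return bool(words) and all(word in stop_words for word in words)


print(remove_stop_words("The Matrix"))          # ['matrix']
print(is_all_stop_words("to be or not to be"))  # True
print(is_all_stop_words("The The"))             # True
```

Note how "The Matrix" collapses to just "matrix" once "the" is dropped, which is exactly the loss of meaning the patent text describes.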





