An old post mentioned how algorithms are being used to help with diagnoses, but they’re also being used to write articles. Wired just published an article about Narrative Science, a company that creates programs which can generate articles. They mainly publish articles on finance and sports, since those are well-structured topics and very repetitive. You don’t normally have too many things happening in a baseball game or stock chart, so I could see why a computer could write such articles. They also have begun generating articles with other data:
Once Narrative Science had mastered the art of telling sports and finance stories, the company realized that it could produce much more than journalism. Indeed, anyone who needed to translate and explain large sets of data could benefit from its services. Requests poured in from people who were buried in spreadsheets and charts. It turned out that those people would pay to convert all that confusing information into a couple of readable paragraphs that hit the key points.
This is obviously quite impressive, though I wonder if all that information is necessarily better in paragraph form. Why can’t some of the data just be turned into some nice charts that people can view? Articles can sometimes “textualize” the content and make it take longer for the reader/viewer to get the necessary information. They can turn tables into paragraphs, but will they soon be able to write more general articles?
You know how annoying it is when you paste something into Gmail and it keeps the weird formatting? I finally pulled out the Google and discovered that in Chrome you can just press Ctrl-Shift-V to paste something just as text.
If you already pasted the text into Gmail and you want to remove formatting, click the “remove formatting” button on the top right (though it won’t change the font). (Hat-tip: Lifehacker)
Google’s been in trouble lately for being too successful, so I thought it would be worth examining how Google improved search and what quality issues have arisen.
Historical background
In the 1990s, the Internet changed the world by providing a way for every type of information to be accessed from anywhere. Yet this plethora of information would have been impossible to use without technology capable of sifting through it to find what the user needed. Many search engines tried different techniques to rank search results, but none of them did a satisfactory job. In 1998, Google entered the scene. While Sergey Brin and Larry Page were graduate students at Stanford, they developed a new way to rank search results and created Google.com. However, Google soon became a distraction from their important work pursuing PhDs, so they tried selling it for $1 million, but no one wanted to buy it. Sadly, they were forced to drop out of school to focus on their company. By 2007, their company was worth more than $200 billion, but they still had no PhDs.
How Google Works
Brin and Page realized that there was something important on the web that let them rank pages: hyperlinks. The links were simple ways for users to get from one page to another, but they realized the links could also be considered votes. The more incoming links a page had, the higher it would score in their algorithm, PageRank. They also factored in the importance of the linking pages, and the number of outgoing links each page had. This algorithm was finally able to rank pages in a meaningful way, and so Google became quite popular. Google also took into account the frequency and placement of different words, and has added many other factors into its analysis, but PageRank still plays an important role in their rankings.
Issues with Google’s Algorithm
An issue immediately arose with Google’s algorithm. A site could accrue many incoming links for two reasons:
1) It was a high-quality site, so many websites linked to it.
2) It put a lot of effort and money into getting people to link to it so as to boost its PageRank.
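To make the links-as-votes idea (and the second reason above) concrete, here is a minimal Python sketch of a simplified PageRank. It is not Google’s actual code. The function name `pagerank`, the four-page toy web, the `farm` pages, and the damping value of 0.85 are all assumptions chosen for illustration. Each page splits its score evenly among the pages it links to, so a vote counts for more when it comes from an important page with few outgoing links.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Simplified PageRank by repeated averaging (an illustrative sketch).

    links maps each page to the list of pages it links to.
    damping=0.85 is an assumed value, not something stated in this post.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with every page equally important.
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets a small baseline score regardless of links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if not targets:
                # A page with no outgoing links spreads its vote over all pages.
                targets = pages
            # The page's vote is split evenly among the pages it links to.
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy web: "quality" earns its links, "spam" has none.
honest_web = {
    "quality": ["news"],
    "news": ["quality", "blog"],
    "blog": ["quality"],
    "spam": [],
}
honest_ranks = pagerank(honest_web)

# The same web after "spam" sets up five pages that exist only to link to it.
farm = [f"farm{i}" for i in range(5)]
gamed_web = dict(honest_web)
gamed_web["spam"] = list(farm)
for farm_page in farm:
    gamed_web[farm_page] = ["spam"]
gamed_ranks = pagerank(gamed_web)

for name, ranks in (("honest", honest_ranks), ("gamed", gamed_ranks)):
    ordered = sorted(ranks, key=ranks.get, reverse=True)
    print(name, [(page, round(ranks[page], 3)) for page in ordered])
```

In the honest web, “quality” comes out ahead of “spam”. Once the farm pages are added, “spam” overtakes it without having improved its content at all.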
The problem with PageRank was that it was too easy to game. As Google became more popular, this became an ever-increasing problem. If only a few people used Google, it wouldn’t be worth the website owners’ efforts to try to game it. But once Google became so big, a site’s entire existence was defined by its rank on Google. Websites had to put a lot of effort into using the right keywords and getting lots of links, and many low-quality sites succeeded at this. Google tried to counter such moves with more advanced algorithms, but there was a constant cat-and-mouse game between the websites and Google.
In the past year, this issue has been raised in many blogs and newspapers. The New York Times published three articles on websites gaming Google searches (The Eyeglasses Crook, JCPenney and Online Flower Stores). Many topics on Google were returning low-quality sites derided as “content farms”, like eHow.com and Associated Content, as the top results. While these sites were sometimes useful, they often just stated the obvious without adding any real insight to the topic at hand. The content on the sites is created entirely so as to appear in Google results. Algorithms generate topics and titles based on Google searches, and low-paid content writers quickly pump out articles to match the titles. Google’s algorithms were being beaten by content algorithms. As in quantum mechanics, Google was unable to remain an external observer; their very act of looking changed the nature of the web.
Google responded to these criticisms with some algorithm updates, but with mixed results. Overall, the updates seem to have improved quality, but low-quality sites still frequently come up high in the results. The problem has been especially acute in certain categories where advertising is expensive. Many sites that are mostly filled with affiliate links (where they get paid for each referral they send) have managed to reach the top of Google’s results with their “reviews” of the topics. For example, try searching for informative reviews of web hosts, online colleges or penny auctions. Many sites will pretend to give honest reviews without any disclaimer, but really just contain empty positive reviews so as to get you to click on their affiliate links. On occasion, they will hide an honest disclaimer, such as this top-ranking site for the search “penny auction scam”: “The website www.pennyauctionsreview.org receives compensation for the reviews posted on the site. All the reviews on this site are meant for entertainment purposes only and shouldn’t be taken as fact. 
Many of the reviews are in fact sent in via email from other people. PennyAuctionsReview.org does not own any of the trademarks used on this website. The typical person does not win anything by using penny auction sites.”
Few visitors will end up reading the disclaimer, and I’m not sure this is the type of content someone investigating penny auctions is looking for. There are also searches where the top Google results have returned incoherent combinations of keywords without any meaning in English: “With Purchase in every click advertising You could get Suited internet websites visitors into your Company net site The time and effort Your online Marketing is active.”
Other Issues with Google
Google still does not understand many types of queries, so even with a good algorithm for ranking, it won’t know what to search for. For example, if you type in a slightly complicated question, Google will often return irrelevant results. Often you will want to focus on a specific category or group of sites, but that is difficult to do on Google. There are times when the link-based algorithmic approach may not be able to tell which site will be most useful to an actual user. Sometimes certain sites with good content overall always make it to the top of Google, even when some of their pages do not deserve to be there or are not the best for the user. For example, Wikipedia is at the top of Google for many types of searches. Yet Wikipedia is often not written clearly enough for the average user who is not familiar with the topic.
Alternatives & the Future of Search
Google is constantly testing and improving their search, but there are also alternatives to Google. Bing is the only big competitor to Google, since it also powers Yahoo’s search. Many of its results are similar to Google’s, but it has more features for certain categories. Blekko is a recent entrant in search that uses human editors along with its algorithms, and tries to only display results from high-quality websites. It also allows users to search specific categories of user-created topics. 
Google and Bing both recently added the ability to see friends’ recommendations in search results, and they might eventually allow user votes to affect global search results.
There are also other tools a person can use to find information and websites besides a general search engine. Delicious.com and Xmarks let you search through other people’s bookmarks. An old-fashioned directory, such as the wiki-style Open Directory, can occasionally be useful to explore a certain category. If you have a question, Ask.com’s search engine is now focused on finding out whether anyone has asked it before. You can also try going directly to high-quality Q&A sites that cover your area, such as Quora or, for some topics, StackExchange.
One of Google’s weakest areas also happens to be the source of much of their revenue. Google makes 99% of their revenue from advertising, and most of those billions come from commercial categories such as insurance, loans, and law. Yet the search results in these categories are frequently low-quality. People looking for help with major purchases need a site that will either solve their problem for them or consistently point them to the right site. If a company is able to crack this problem, it may be able to grab a piece of the billions in advertising money that is available.
In the meantime, users can find other sites to help them with their purchases. There are many sites that help the user find the best deal within a specific area, such as the shopping comparison search Nextag, the compare-and-sort site FindtheBest, the review site ConsumerSearch or the bill-analysis site BillShrink. With sites such as these, the user can narrow in on the best deal at the best price.