Using only PHP to save Google Reader Starred items to Pocket!

So, following on from my last post, I wanted to see if I could do everything with PHP alone. After a bit of Google-fu and some reading of the php.net manual, I've managed this beauty. Use at your own risk! It works with my Google Reader Starred items, but you still have to get the starred.json file from the Google Takeout service (see my last post for more information).

If you have any tips on how to improve this, drop me a comment!

<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Instapaper: Export</title>
</head>
<body>
<h1>Unread</h1>
<ol>
<?php

// First off, we start by opening the file required (starred.json).
// Then, for every line, we reset the $galbool flag. This is used where
// sites have given a "Gallery" URL: to make it more cosmetic, the
// script appends the text [Gallery] to the end of the description.

$file_handle = fopen("starred.json", "r")
    or die("Could not open starred.json");
while (($line = fgets($file_handle)) !== false) {
    $galbool = FALSE;

// This is our first check. We run through the JSON file and look for
// lines that contain the text "href". If a line does not contain that
// text, we are not interested, so we skip straight to the next line.

    if (preg_match('/"href"/', $line) !== 1) {
        continue;
    }

// A cheeky little bit of coding. Whilst we are in the loop, and
// I know that this is a URL we are interested in, I'll have a look
// at the last character. If it's a ",", I also want to skip that
// line. Looking at the JSON file, if a line contains a URL and
// ends with a ",", it means it's not the *ACTUAL* URL we want, so
// we continue our ruthless streak and skip that line too!
// (This was included to deal with hackaday.com URLs, which for
// some reason doubled up, and this was a quick and easy way to
// get rid of them!)

    if (preg_match('/,\s*$/', $line) === 1) {
        continue;
    }

// Now we trim the whitespace (including the newline) from the line,
// then remove the leading "href" : " part (10 characters) and the
// closing quote. This takes us right to the http:// part of the
// link, which is what we need! Finally, we remove any trailing
// slashes from the link as well.

        $line = trim($line);
        $line = substr($line, 10);
        $line = substr($line, 0, -1);
        $string = rtrim($line, "/");

// This is just a quick check to see if the URL passed is a gallery
// URL (one ending in /gallery). If so, we set the $galbool value to
// true and strip the /gallery part from the URL. This is personal
// preference, and I've not had any adverse effects from either
// leaving it in or removing it. It has to be removed just now to
// make figuring out the link text easier, though. We can add it
// back in later if required.

        $stripped = preg_replace('#/gallery$#', '', $string, 1, $count);
        if ($count === 1) {
            $galbool = TRUE;
            $string = $stripped;
        }

// And now for the (almost) finale! We take everything after the last
// forward slash in the URL, remove that forward slash, then we run
// through and replace every "-" with a space. This makes the end HTML
// page look nice, and it keeps with Instapaper's Export option. If the
// $galbool value is true, we add a [Gallery] tag. The URL and link
// text go through htmlspecialchars() so that characters such as &
// don't break the HTML.

        $desc = strrchr($string, "/");
        $desc = str_replace("/", "", $desc);
        $desc = str_replace("-", " ", $desc);
        $desc = ucwords($desc);
        if ($string != "") {
            $url = htmlspecialchars($string);
            $text = htmlspecialchars($desc);
            if ($galbool == TRUE) {

// You can add the /gallery link back in here if you need it!
// Just uncomment the relevant line and comment out the other!
//        -------------------------------------------------------------------------------------------------

//            $formatted = '            <li><a href="' . $url . '/gallery">' . $text . ' [Gallery]</a></li>';

//        ----------------------------------***OR THIS LINE***---------------------------------------------

            $formatted = '            <li><a href="' . $url . '">' . $text . ' [Gallery]</a></li>';

//        -------------------------------------------------------------------------------------------------

            } else {
                $formatted = '            <li><a href="' . $url . '">' . $text . '</a></li>';
            }
            echo $formatted . "\n";
        }
    }

// We now close the file (good housekeeping), and finish up
// the script.

fclose($file_handle);
?>
</ol>
</body>
</html>
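Since starred.json is a JSON file, another option (a sketch, not what the script above does) is to decode it with json_decode() and read the links out of the resulting array, rather than matching individual lines. This assumes the starred entries sit in a top-level "items" array and that each entry carries a "canonical" (or, failing that, "alternate") list of {"href": ...} objects, the same structure the line matching above leans on. The "items" key is an assumption, and starred_urls() is a hypothetical helper name.

```php
<?php
// Sketch: pull article URLs out of a Google Reader starred.json export
// by decoding the JSON rather than matching lines of text.
// ASSUMPTION: entries live in a top-level "items" array, and each entry has
// a "canonical" (or "alternate") array of {"href": "..."} objects.

function starred_urls(string $json): array
{
    $data = json_decode($json, true);
    if (!is_array($data) || !isset($data["items"]) || !is_array($data["items"])) {
        return []; // not valid JSON, or not the layout assumed above
    }

    $urls = [];
    foreach ($data["items"] as $item) {
        // Prefer the canonical link; fall back to the alternate link.
        $links = $item["canonical"] ?? $item["alternate"] ?? [];
        if (isset($links[0]["href"]) && $links[0]["href"] !== "") {
            $urls[] = $links[0]["href"];
        }
    }
    return $urls;
}

// Usage with a tiny hand-written sample (not real export data).
// For the real file: starred_urls(file_get_contents("starred.json"))
$sample = '{"items": [
    {"canonical": [{"href": "http://example.com/some-post/"}]},
    {"alternate": [{"href": "http://example.org/other-post", "type": "text/html"}]},
    {"title": "no links here"}
]}';
foreach (starred_urls($sample) as $url) {
    echo $url, "\n";
}
```

Each URL can then go through the same trailing-slash, gallery and link-text steps as the script above. Decoding the whole file does mean holding it in memory, which should be fine for an export of a few hundred starred items.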

 

Using the terminal & PHP to save Google Reader Starred items!

*MAJOR UPDATE* – USING PHP ONLY WITH NEW CODE! Please view this next post for more information!

So. Google Reader is closing down. I'm not going to get all high and mighty – it's Google's product, and they can do with it as they wish! There are several places that let you save your feeds from Google Reader, but I wanted to add all my Starred Items from Reader to Pocket. It turns out that wasn't the easiest thing to do! After a few Terminal commands and some PHP, this is what I came up with!

Firstly, head to Google Reader and use the Takeout feature that Google provides to save your Reader data only. The resulting ZIP file should contain a folder named “Reader”, and within that there should be a file named “starred.json”.

*UPDATE* – I have saved these 2 files to GitHub! Go, Grab!
https://github.com/nickwebcouk/pocketimport

Now the fun begins! To make it easy (and quick), I used the Terminal on my Mac and ran the following commands from within the Reader folder. I used two temporary files (new.txt and newnew.txt), passing the output back and forth between them, just to keep track of what was happening. There are easier ways of doing this!

grep -a1 "canonical" starred.json > new.txt
grep -v "^\--" new.txt > newnew.txt
grep -v "} ]," newnew.txt > new.txt
grep -v "\"canonical\" : \[ {" new.txt > newnew.txt
grep -v "\"updated" newnew.txt > new.txt
grep -v "} \]," new.txt > newnew.txt
cat newnew.txt | rev | cut -c 2- | rev > new.txt
cut -c 17- new.txt > newnew.txt
rm new.txt
mv newnew.txt url.txt

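Briefly: “grep” searches a text file for lines matching a pattern (-v inverts this, keeping only the lines that don't match, and -a1 treats the file as text and also prints one line of context before and after each match); “cat” prints a whole file; “rev” reverses each line character by character; “cut -c” keeps only the given character positions; “rm” removes files; and “mv” moves (here, renames) files.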
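The last two trimming steps are plain character arithmetic, so here is a PHP sketch of what they do to a single link line: rev | cut -c 2- | rev drops the final character (the closing quote), and cut -c 17- keeps everything from position 17 on, dropping the 16 characters in front of the URL. That those 16 characters are six spaces of indentation plus "href" : " (10 characters) is an assumption worked out from the numbers in the commands, and drop_last_char() and drop_first_chars() are hypothetical helper names.

```php
<?php
// Sketch of the character arithmetic done by the last two pipeline steps.
// ASSUMPTION: link lines look like  <6 spaces>"href" : "http://..."
// so 6 + 10 = 16 characters come before the URL itself.

function drop_last_char(string $s): string
{
    // Same effect as: rev | cut -c 2- | rev
    return (string) substr($s, 0, -1);
}

function drop_first_chars(string $s, int $n = 16): string
{
    // Same effect as: cut -c 17-  (cut numbers positions from 1)
    return (string) substr($s, $n);
}

$line = '      "href" : "http://example.com/some-post/"';
echo drop_first_chars(drop_last_char($line)), "\n";
```

Because both offsets are fixed, a file indented differently would need different numbers, which is one reason decoding the JSON properly is sturdier.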

To make this easier, I created a shell script (tested on OS X 10.8.2 only):

#!/bin/sh

clear
grep -a1 "canonical" starred.json > new.txt
grep -v "^\--" new.txt > newnew.txt
grep -v "} ]," newnew.txt > new.txt
grep -v "\"canonical\" : \[ {" new.txt > newnew.txt
grep -v "\"updated" newnew.txt > new.txt
grep -v "} \]," new.txt > newnew.txt
cat newnew.txt | rev | cut -c 2- | rev > new.txt
cut -c 17- new.txt > newnew.txt
rm new.txt
mv newnew.txt url.txt

You can run it by saving it as script.sh in the same folder as starred.json (the Reader folder) and running the following in Terminal:

chmod 755 script.sh
./script.sh

The first line gives script.sh execute permission (755 means the owner can read, write and execute it, and everyone else can read and execute it), and the second line runs the script.

I then moved the url.txt file that had just been created to my web server's root folder (for me it's /Users/name/Sites/), and created the following PHP/HTML page alongside it:

<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Instapaper: Export</title>
</head>
<body>
<h1>Unread</h1>
<ol>
<?php
    // Read url.txt one line at a time until fgets() returns false (end of file).
    $file_handle = fopen("url.txt", "r")
        or die("Could not open url.txt");
    while (($line = fgets($file_handle)) !== false) {
        $galbool = FALSE;
        // Drop the newline and any trailing slashes; skip blank lines.
        $string = rtrim(trim($line), "/");
        if ($string == "") {
            continue;
        }
        // Gallery URLs end in /gallery: flag them and strip that suffix.
        $stripped = preg_replace('#/gallery$#', '', $string, 1, $count);
        if ($count === 1) {
            $galbool = TRUE;
            $string = $stripped;
        }
        // Link text: the part after the last slash, dashes become spaces.
        $desc = strrchr($string, "/");
        $desc = str_replace("/", "", $desc);
        $desc = str_replace("-", " ", $desc);
        $desc = ucwords($desc);
        // Escape the URL and text so characters such as & don't break the HTML.
        $url = htmlspecialchars($string);
        $text = htmlspecialchars($desc);
        if ($galbool == TRUE){
            $formatted = '			<li><a href="' . $url . '">' . $text . ' [Gallery]</a></li>';
        } else {
            $formatted = '			<li><a href="' . $url . '">' . $text . '</a></li>';
        }
        echo $formatted . "\n";
    }
    fclose($file_handle);
?>
</ol>
</body>
</html>
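One pitfall to watch for in the gallery handling: rtrim()'s second argument is a set of characters to strip, not a suffix, so a call like rtrim($string, "/gallery") keeps removing any trailing /, g, a, l, e, r or y. Below is a sketch that removes only the literal /gallery suffix and builds the link text the same way the script does; strip_gallery() and link_text() are hypothetical helper names.

```php
<?php
// Sketch: remove a literal "/gallery" suffix and build the link text.
// strip_gallery() and link_text() are hypothetical helpers for illustration.

function strip_gallery(string $url): array
{
    $url = rtrim($url, "/");
    $suffix = "/gallery";
    $isGallery = substr($url, -strlen($suffix)) === $suffix;
    if ($isGallery) {
        $url = (string) substr($url, 0, -strlen($suffix));
    }
    return [$url, $isGallery];
}

function link_text(string $url): string
{
    // Everything after the last "/", with dashes turned into spaces.
    $pos = strrpos($url, "/");
    $slug = $pos === false ? $url : substr($url, $pos + 1);
    return ucwords(str_replace("-", " ", (string) $slug));
}

$example = "http://example.com/photo-gallery/gallery";

// rtrim() strips any of the characters / g a l e r y from the end:
echo rtrim($example, "/gallery"), "\n"; // http://example.com/photo-

[$url, $isGallery] = strip_gallery($example);
echo $url, "\n"; // http://example.com/photo-gallery
echo link_text($url) . ($isGallery ? " [Gallery]" : ""), "\n"; // Photo Gallery [Gallery]
```

Checking the whole suffix also means a URL that merely contains /gallery somewhere in the middle is left alone.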

Loading the PHP/HTML page through my local web server gave me an HTML page, and saving that as instapaper-export.html let me head to getpocket.com and use the Instapaper import option.

584 starred articles and 2 seconds later, I received this wonderful little message!

[Screenshot: getpocket import]
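Rather than saving the page from the browser, the export file could also be written straight to disk. A minimal sketch, assuming you already have the URLs and link texts in an array: export_html() (a hypothetical helper) builds the same page layout as the scripts above, escaping each URL and title with htmlspecialchars(), and file_put_contents() writes the result. The demo writes to the system temp directory; point $target at wherever you want instapaper-export.html to end up.

```php
<?php
// Sketch: build the Instapaper-style export page and write it to a file.
// export_html() is a hypothetical helper; the layout copies the scripts above.

function export_html(array $links): string
{
    $html = "<!DOCTYPE html>\n<html>\n<head>\n"
        . "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\" />\n"
        . "<title>Instapaper: Export</title>\n</head>\n<body>\n"
        . "<h1>Unread</h1>\n<ol>\n";
    foreach ($links as $url => $text) {
        // Escape &, <, > and double quotes so odd URLs or titles can't break the markup.
        $html .= '<li><a href="' . htmlspecialchars($url) . '">'
            . htmlspecialchars($text) . "</a></li>\n";
    }
    return $html . "</ol>\n</body>\n</html>\n";
}

$links = [
    "http://example.com/some-post" => "Some Post",
    "http://example.com/search?a=1&b=2" => "Search Results",
];
$page = export_html($links);

// Change this to wherever you want the file to go.
$target = sys_get_temp_dir() . "/instapaper-export.html";
if (file_put_contents($target, $page) === false) {
    exit("Could not write $target\n");
}
echo "Wrote ", count($links), " links to $target\n";
```

Keying the array by URL means a link that appears twice is only written once.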