15 December 2018 / INFOSEC

HOW TO Tell a Story Tweet by Tweet using Python

I got curious the other day about whether it would be tricky to parse a long text into tweet-form and automatically publish it to Twitter every hour. The answer turned out to be ‘no, not particularly’, but this write-up is for those who want to do something similar, or who are merely curious about how an afternoon project like this takes form. The source code for the project presented here is also available on GitHub.

The end result of this afternoon project is the program tweetBooks, which consists of these 5 Python files:

- downloadBookByURL.py  (downloads text)
- fitTextInTweets.py    (parses text) 
- publishFromCron.py    (publishes tweet using sys.argv)
- publishToTwitter.py   (publishes tweet interactively)
- tweetBooks.py         (a command-line interface for tweetBooks)

Step 1: Download a text

The first step is to download a book (or other text). I chose to make a function that can do this. The downloadByURL function in downloadBookByURL.py takes two arguments: a one-word project name and the URL that it should download the text from. It then uses the Requests package to download this file. This downloaded text will be given a name that is based on the project name and saved to disk. Note that ‘unparsed’ will be part of the file name, because the text that was downloaded must be parsed before it can be tweeted.

The debug function isn’t used here, but is present because I use a Python script to… create other Python scripts based on a template, and the ever-useful debug function is part of that template.

#!/usr/bin/python

####
##
## downloadBookByURL.py: downloads a text and saves it to file
## Created by: Fredrik Walloe
## Creation date: 14-12-2018
## Version: 1.0
## Status: works as expected with no known errors, but there are some static values that should be set by sys.argv 
##
## Usage: use tweetBooks.py -d to download a file 
##
####

#### IMPORTS ####
import requests
import os

#### VARIABLES ####
debugMode = False

#### FUNCTIONS ####

# prints text passed to it when debugMode is set to True
def debug(text):
   if debugMode is True:
       print(text)

#### MAIN ####
def downloadByURL(projectName, URL):
    response = requests.get(URL)

    # the unparsed text is saved to projectName/unparsedProjectName.txt, which is where fitTextToTweet looks for it
    fileName = projectName + "/unparsed" + projectName + ".txt"

    # create the path if necessary
    os.makedirs(os.path.dirname(fileName), exist_ok=True)

    # place the project in a separate folder
    with open(fileName, "w+") as f:
        f.write(response.text)

Step 2: Parse the text

Each tweet should be a maximum of 140 characters long. Tweets should also be numbered (to make it easier to read the text in order) and end with a hashtag.

The fitTextToTweet function takes care of this. The user must supply two arguments: the same project name that was used for the downloadByURL function and a hashtag. The project name will be automatically converted to file paths as needed. Say we downloaded Moby Dick using the downloadByURL function and gave it the project name MobyDick. The text will then have been saved to MobyDick/unparsedMobyDick.txt. When the user passes the same project name to fitTextToTweet through tweetBooks.py, the project name will be used to recreate the file path. This happens in tweetBooks.py, which means fitTextToTweet receives a file path.
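
The naming convention described above can be sketched in a few lines. This is only an illustration: the helper name project_paths is hypothetical and does not exist in tweetBooks, which builds these paths inline.

```python
def project_paths(project_name):
    """Return (unparsed_path, parsed_path) for a one-word project name.

    Hypothetical helper for illustration; tweetBooks builds these strings inline.
    """
    unparsed = project_name + "/unparsed" + project_name + ".txt"
    parsed = project_name + "/" + project_name + ".txt"
    return unparsed, parsed


print(project_paths("MobyDick"))
# ('MobyDick/unparsedMobyDick.txt', 'MobyDick/MobyDick.txt')
```

Keeping both files in one folder per project means a single project name is enough to find everything else.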

A try/except block at the start of the function will ensure that the program exits if the unparsed text file cannot be found; if this happens it likely means that the user either hasn’t downloaded a file yet, or provided the wrong project name to tweetBooks.py.

Processed tweets will be saved in a list and then written to a file once the whole text has been parsed. For very long files it may make more sense to append each tweet to the file as soon as it has been processed rather than keep everything stored in memory, but that should not be a problem for books of a normal length.

In order to parse the tweets, the function goes through the text line by line. Some minor parsing is done to each line to get rid of multiple consecutive spaces. But the central task of this function is to check whether a line can fit into a tweet. It first counts the number of characters in the line, and then sees whether that number is larger than the number of characters that can fit in a tweet. Note that this number is less than 140, because the tweet will also contain a hashtag, a tweet count, the formatting around the tweet count (3 characters) and two spaces (2 characters).
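
For very long texts, the append-as-you-go alternative could look roughly like the sketch below. The function name write_tweets_incrementally is an assumption made for this example and is not part of tweetBooks; the parsing itself is left out, so the function simply takes tweets that are already finished.

```python
def write_tweets_incrementally(tweets, out_path):
    """Append each finished tweet to out_path as soon as it is available.

    Hypothetical helper: 'tweets' can be any iterable, including a generator,
    so the whole book never has to sit in memory at once.
    """
    count = 0
    with open(out_path, "a") as f:
        for tweet in tweets:
            tweet = tweet.strip()
            if tweet:                      # skip empty entries
                f.write(tweet + "\n")
                count += 1
    return count
```

Because the file is opened in append mode, running this twice on the same project would duplicate the tweets, so the output file should be removed before a fresh parse.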

As an example, after being processed by this function the first line of Alice in Wonderland would look something like this:

Alice was beginning to get very tired of sitting by her sister on the bank (1/) #Wonderland
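
To make the arithmetic concrete, here is a minimal sketch of the character budget, assuming the suffix format shown in the example above and a hashtag passed without the leading #. The names MAX_TWEET_LENGTH and text_budget are made up for this illustration.

```python
MAX_TWEET_LENGTH = 140  # the limit used throughout this write-up


def text_budget(hashtag, tweet_number):
    """Return how many characters are left for the text itself (hypothetical helper)."""
    # the suffix looks like " (1/) #Wonderland"
    suffix = " (" + str(tweet_number) + "/) #" + hashtag
    return MAX_TWEET_LENGTH - len(suffix)


print(text_budget("Wonderland", 1))     # 123
print(text_budget("Wonderland", 1000))  # 120
```

The budget shrinks as the tweet number gains digits, which is why it has to be recalculated for every tweet rather than once per text.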

If the line can fit in a tweet without modifications the function checks whether it is less than 60 characters (short lines can be annoying too).

Lines that are neither too long nor too short to fit in a tweet will be added to the list of tweets after the tweet number and hashtag have been added to the end of the line.

If a line is too long, the last word will be chopped off and stored in the lastLine variable. The function will then check if the line minus the last word can fit in the tweet. If it still cannot fit, another word is chopped off the end… Likewise, if a line is too short, the whole line is added to lastLine. The function then proceeds to the next line. Before it begins to parse that line, the words in lastLine are added to the start of the line. And then the function continues, checking whether the line fits into a tweet and chopping words off as necessary.
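
The word-chopping step on its own can be sketched like this. It is a standalone illustration under the assumption that words are separated by single spaces; split_to_fit is a hypothetical name, not a function in tweetBooks.

```python
def split_to_fit(line, max_length):
    """Chop words off the end of line until it fits in max_length characters.

    Returns (head, carried): head is the part that fits, carried holds the
    chopped words in their original order. Hypothetical helper for illustration.
    """
    carried = ""
    # a single word without spaces cannot be shortened this way, so stop there
    while len(line) > max_length and " " in line:
        line, last_word = line.rsplit(" ", 1)
        carried = (last_word + " " + carried).strip()
    return line, carried


head, carried = split_to_fit("the quick brown fox jumps over the lazy dog", 20)
print(head)     # the quick brown fox
print(carried)  # jumps over the lazy dog
```

The carried words play the role of lastLine: they would be placed in front of the next line before that line is checked.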

When all the lines have been processed, the parsed tweets are saved to a separate file. This file has the same name as the unparsed file, only without ‘unparsed’ as part of the filename.

#!/usr/bin/python

####
##
## fitTextInTweets.py: parse a text into tweets
## Created by: Fredrik Walloe
## Creation date: 14-12-2018
## Version: 1.0
## Status: works as expected with no known errors
##
## Usage: run tweetBooks.py --fit and follow the instructions
##
####

#### IMPORTS ####
import sys

#### VARIABLES ####
debugMode = False

#### FUNCTIONS ####

# prints text passed to it when debugMode is set to True
def debug(text):
   if debugMode is True:
       print(text)

def fitTextToTweet(filePath, hashtag):

    try:
        # the unparsed text was downloaded to a file located in projectName/unparsedProjectName
        text = open(filePath.replace("/", "/unparsed"), "r")
    except OSError:
        print("Unable to open project file. Make sure you specified an existing project "
              "or use the --download option to create a new project. Project names are case-sensitive")
        sys.exit(1)

    tweets = []             # save tweets to a list
    i = 1                   # the tweet number
    lastLine = ""           # when a line must be cut down to fit into a tweet, the cut words are stored in this variable
        
    # loop through the text and make each line fit into a tweet
    for line in text:
   
        # if the line is not empty
        if str(line).strip():

            line = lastLine + " " + line # text that didn't fit in the last tweet should be part of the next tweet
     
            lastLine = ""
            # weird spacing doesn't work in tweet-form: drop the newline and collapse any run of
            # whitespace (including extra spaces introduced by mashing lastLine and line together) into one space
            line = " ".join(line.split())

            # to figure out how many characters we can fit into a tweet we must take the max tweet length of 140 characters,
            # minus the hashtag, the tweet number and the six characters around them: " (" before the number and "/) #" after it
            maxLength = 140 - len(hashtag) - len(str(i)) - 6

            # if it's too long it's necessary to chop words off the end to make it fit a tweet
            if len(line) > maxLength:

                # a single word without spaces cannot be chopped any further, so stop there
                while len(line) > maxLength and " " in line:

                    # Save the last word
                    lastLine = line.rsplit(' ', 1)[1] + " " + lastLine

                    # keep the line minus the last word
                    line = line.rsplit(' ', 1)[0]

            # very short sentences can be annoying too, unless they end the story
            elif len(line) < 60 and "the end" not in line.lower():
                lastLine = line
                continue

            # the line now fits in a tweet
            debug("Tweet ready: " + line + " (" + str(i) + "/) #" + hashtag)
            tweets.append(line + " (" + str(i) + "/) #" + hashtag + "\n")
            i += 1
                   

    for tweet in tweets:
        print(tweet.replace("\n", ""))


    with open(filePath, "w+") as f:
        for tweet in tweets:
            f.write(tweet)

Step 3: Tweet

Once the text has been processed, it’s time to post the tweets on Twitter. This can be done with a cron job, manually from the command line or through the tweetBooks.py command-line interface.

For this to work, it is necessary to create a Buffer account and then to create an application in Buffer. Next, all the API values must be hard-coded into the correct variables in this function. Alternatively, the Twitter API can be used, but this may become problematic if you want to create a separate Twitter account for your project as the Twitter API requires a phone number that you may have already registered to another account. This may be a non-issue, but I opted for Buffer as it also makes it possible to post to other social media platforms.

The publishToTwitter function takes a single argument, which is the project name that we’ve used before. As previously mentioned each tweet was assigned a tweet number during the parsing stage. If we want to publish the story in the correct order, we should start with tweet number 1 and then proceed with number 2 and so on. We keep track of this information by writing the number of the next tweet that should be published to a file named lineReached in the project folder. This file will be created automatically the first time the publishToTwitter function runs. The first thing the function does is read the tweet number from this file and save it to a variable.

It then begins to loop through the file that contains the parsed tweets. For each line it will use a regular expression to find the tweet number and then compare it to the tweet number we saved to the variable. When these numbers match we’ve found the next tweet that should be published. The function saves the tweet to a variable.
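
A minimal sketch of this kind of progress file, assuming it holds the number of the next tweet to publish and that a missing or empty file means the story starts at tweet 1. Both function names are hypothetical and only serve this example.

```python
import os


def read_next_tweet_number(path):
    """Return the tweet number stored in path, or 1 if there is no progress yet."""
    if not os.path.exists(path):
        return 1
    with open(path) as f:
        content = f.read().strip()
    return int(content) if content else 1


def save_next_tweet_number(path, number):
    """Overwrite the progress file with the number of the next tweet to publish."""
    with open(path, "w") as f:
        f.write(str(number))
```

Writing the file only after a successful publication keeps the counter from moving past a tweet that never went out.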
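
Extracting the tweet number with a regular expression can be sketched as follows, assuming the (number/) format produced in Step 2. The helper name tweet_number is an assumption for this example; it takes the last match on the line, in case the text itself happens to contain something that looks like a tweet number.

```python
import re

# matches "(12/)" and captures the digits
TWEET_NUMBER = re.compile(r"\((\d+)/\)")


def tweet_number(line):
    """Return the tweet number found in line, or None if there is none (hypothetical helper)."""
    matches = TWEET_NUMBER.findall(line)
    return int(matches[-1]) if matches else None


print(tweet_number("Alice was beginning to get very tired (1/) #Wonderland"))  # 1
```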

Now that it has the tweet, it sends the tweet to Buffer (using its API), which in turn publishes it.

Because the ‘now’ attribute is set in the POST request to Buffer, the tweet will be published immediately, but it is also possible to schedule the tweet for publication at a later time.

After the tweet has been published, we update the lineReached file so that it points at the tweet that follows the one we just published. However, this only happens if the attempt to send the tweet to Buffer gets the response code 200, which indicates it was successful. Otherwise, no information is written to the file. This ensures that when the publishToTwitter function runs again, it picks up at the first tweet that has not yet been published.

Lastly, we write a message to a log: if our attempt to publish the tweet got the response code 200, the message ‘published tweet successfully’ will be written to the file, prepended by the date and time. A successful publication should look something like 15/12/2018 14:53:24 -> published tweet successfully in the log. The log is a quick way to check when a problem occurred if you discover that a cron job has failed.
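
One way to build such a log line is with strftime, which zero-pads each field so that entries line up. This is a sketch: log_line is a hypothetical helper, and the optional now parameter exists only to make the example reproducible.

```python
import datetime


def log_line(status_code, now=None):
    """Return one log entry in the format 15/12/2018 14:53:24 -> message (hypothetical helper)."""
    now = now or datetime.datetime.now()
    if status_code == 200:
        outcome = "published tweet successfully"
    else:
        outcome = "failed to publish tweet"
    return now.strftime("%d/%m/%Y %H:%M:%S") + " -> " + outcome + "\n"


print(log_line(200, datetime.datetime(2018, 12, 15, 14, 53, 24)), end="")
# 15/12/2018 14:53:24 -> published tweet successfully
```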

#!/usr/bin/python

####
##
## publishToTwitter.py: gets a prepared tweet from file, logs into a Twitter account, publishes the tweet and then writes the tweet number to file
## Created by: Fredrik Walloe
## Creation date: 14-12-2018
## Version: 1.0
## Status: works as expected with no known errors
##
## Usage: after running downloadBookByURL.py and fitTextInTweets.py, publishToTwitter.py can be used to publish the tweets to twitter
##
####

#### IMPORTS ####
import requests         # used to send tweets to Buffer, which then forwards the tweets to Twitter 
import re               # used to find the tweet number
import datetime         # used when writing to a log that keeps track of whether publications have been successful. 
import sys              # used to quit if the file that should be read from cannot be found

#### VARIABLES ####
debugMode = False
lineReached = 0
tweet = ""

clientID = ""                       # Buffer client ID, which can be found under 'registered apps' after you create an application
clientSecret = ""                   # Buffer client Secret, which will be sent by email after you create an application
redirectURI = ""                    # Buffer redirect, which can be found under 'registered apps' after you create an application; a default redirect was used here
bufferToken = ""                    # Buffer Token, which can be found under 'registered apps' after you create an application         
bufferProfileID = ""                # Can be found in the URL when you navigate to your Buffer profile

#### FUNCTIONS ####

# prints text passed to it when debugMode is set to True
def debug(text):
   if debugMode is True:
       print(text)

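def publishToTwitter(projectName):
    # As the script is intended to be run hourly / daily it makes sense to save the progress to file.
    # The file holds the number of the next tweet to publish; start at tweet 1 if it does not exist yet.
    lineReached = 1
    try:
        with open(projectName + "/lineReached", "r") as getLineReached:
            for line in getLineReached:
                if line.strip():
                    lineReached = int(line.strip())
    except FileNotFoundError:
        pass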

    # Tweets are retrieved from this file
    try:
        getTweet = open(projectName + "/" + projectName + ".txt", "r")
    except OSError:
        print("Failed to open file that should contain the parsed tweets for this project. If you have not yet used the "
              "--download and --fit options, use those first and then try this option again. If you've already run those "
              "options, verify that you entered the correct project name (note that project names are case-sensitive).")
        sys.exit(1)

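    # loop through each of the prepared tweets
    tweet = ""
    for line in getTweet:
        if line.strip():

            # Each tweet is numbered in the format (somenumber/); the last match on the line is the tweet number
            tweetNumbers = re.findall(r'\((\d+)/\)', line)

            # find the tweet whose number matches the one read from lineReached
            if tweetNumbers and int(tweetNumbers[-1]) == int(lineReached):
                debug(line)
                tweet = line
                break

    # nothing left to publish, or the progress file points at a tweet that does not exist
    if not tweet:
        print("Found no tweet numbered " + str(lineReached) + " for this project.")
        sys.exit(1)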

    """
        Buffer requires you to a) create an account, b) create an application, c) get a token (requires a separate
        GET request if you need more than one) and d) supply the information below as data in the POST request:
        - a profile ID (found in the URL on your Buffer profile)
        - the text that should be shared; a tweet in this case
        - 'now' is optional and means that the tweet will be shared immediately
        - client ID, which can also be found after creating an application
    """
    data = {"profile_ids": bufferProfileID, "text": tweet, "now": 'now', "client_id": clientID,
                "client_secret": clientSecret,
                "redirect_uri": redirectURI,
                "access_token": bufferToken
    }

    # buffer also requires that the content-type is set like this
    headers = {"Content-Type": "application/x-www-form-urlencoded"}



    # connect to Buffer and publish the tweet
    response = requests.post('https://api.bufferapp.com/1/updates/create.json', headers = headers, data = data)

    # change debugMode to True to print these
    debug(response)
    debug(response.text)
    debug(response.reason)
    debug(response.status_code)
    debug(response.cookies)

    # only update lineReached if the attempt to publish the tweet succeeded;
    # the file is opened after the check so that a failed attempt does not wipe the saved progress
    if response.status_code == 200:
        # because the tweet was published the last reached tweet value should increment by one
        lineReached = int(lineReached) + 1
        with open(projectName + "/lineReached", "w") as f:
            f.write(str(lineReached))


    # keep a log of successful / unsuccessful attempts to tweet; can be used to figure out when a problem occurred.
    with open("publicationLog.txt", "a") as f:
        # zero-padded timestamp in the format 15/12/2018 14:53:24
        timestamp = datetime.datetime.now().strftime("%d/%m/%Y %H:%M:%S")
        print(response.status_code)
        if response.status_code == 200:
            f.write(timestamp + " -> published tweet successfully\n")
        else:
            f.write(timestamp + " -> failed to publish tweet\n")

Step 4: Providing a command-line interface

Argparse lets us create a command-line interface quickly, which can be useful for giving users a simple way to interact with a program. Rather than look for usage instructions in each file, they can get the information they need – and access each feature – in one place.

To keep things simple, I chose to use input to let the user provide project names. There is no error-checking here, but if the user enters the wrong project name the function that is called will exit with a more or less appropriate error message. Input from users is saved to variables and then passed to functions.

#!/usr/bin/python

####
##
## tweetBooks.py: a command-line interface for downloading a text, parsing it into tweet-form and publishing the tweets
## Created by: Fredrik Walloe
## Creation date: 15-12-2018
## Version: 1.0
## Status: works fine with no known errors
##
## Usage: run tweetBooks.py -h for an overview of the available features 
##
####

#### IMPORTS ####
import argparse
import sys
from downloadBookByURL import downloadByURL
from fitTextInTweets import fitTextToTweet
from publishToTwitter import  publishToTwitter

#### VARIABLES ####
debugMode = False

#### FUNCTIONS ####

# prints text passed to it when debugMode is set to True
def debug(text):
   if debugMode is True:
       print(text)

#### MAIN ####

parser = argparse.ArgumentParser(description='Publish a Text Tweet-by-Tweet')

parser.add_argument('-d', '--download', action='store_true', help = "Download a text")

parser.add_argument('-f', '--fit', action='store_true', help = "Parse a text to make it fit a tweet")

parser.add_argument('-p', '--publish', action='store_true', help = "Publish the next tweet")

args = parser.parse_args()



if (args.download):
    projectName = input("Choose a project name: ")
    URL = input("Provide URL that the text should be downloaded from: ")

    downloadByURL(projectName.strip(), URL.strip())

if (args.fit):
    projectName = input("Specify an existing project that should be parsed: ")
    hashtag = input("Choose a hashtag (without the #): ")

    filePath = projectName + "/" + projectName + ".txt"
    print(filePath)
    fitTextToTweet(filePath, hashtag)

if (args.publish):
    areCredentialsSet = input("Have you created a Buffer account and manually "
                        "set Buffer credentials in publishToTwitter.py?\n1. Yes\n2. No\n")
    if areCredentialsSet.strip() == "1":
        projectName = input("Specify the name of an existing project to publish the next pending tweet for that project: ")
        publishToTwitter(projectName)

    else: 
        print("Sort that out first and then come back here")
        sys.exit(1)

Step 5: Why not use cron?

The downside of a command-line interface that relies on interactive user input is that this makes it trickier to run the program as a cron job.

As a simple fix to this, publishFromCron.py will run the publishToTwitter function with a project name passed as an argument through sys.argv.

This means that a Moby Dick project named MobyDick can be called from the command line like this:

./publishFromCron.py MobyDick

A cron job that runs every hour would look something like this:

0 * * * * fredrik /home/fredrik/letspretendthisiswhereikeepmyscripts/tweetBooks/publishFromCron.py MobyDick

Only 4 lines of this script are relevant:

#!/usr/bin/python
import sys
from publishToTwitter import publishToTwitter
publishToTwitter(sys.argv[1])

But I publish it in full here in case this is read by someone who is unfamiliar with Python and wants to see the whole thing; partial code examples can be annoying when you’re just starting out.

#!/usr/bin/python

####
##
## publishFromCron.py: publish a project by specifying the project name as a sys.argv argument; cron-friendly
## Created by: Fredrik Walloe
## Creation date: 15-12-2018
## Version: 1
## Status: works fine without known errors
##
## Usage: ./publishFromCron.py someProjectNameInOneWord
##
####

#### IMPORTS ####
import sys
from publishToTwitter import  publishToTwitter

#### VARIABLES ####
debugMode = False

#### FUNCTIONS ####

# prints text passed to it when debugMode is set to True
def debug(text):
   if debugMode is True:
       print(text)


publishToTwitter(sys.argv[1])
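
One thing to watch with cron: jobs are typically started in the user’s home directory rather than in the folder that holds the script, so relative paths such as MobyDick/lineReached may not resolve to the project folder. A minimal sketch of one way around this, assuming the project folders live next to the scripts; project_file is a hypothetical helper and is not part of tweetBooks.

```python
import os


def project_file(script_path, project_name, file_name):
    """Return an absolute path to file_name inside the project folder next to the script.

    Hypothetical helper: inside a script, pass __file__ as script_path.
    """
    base = os.path.dirname(os.path.abspath(script_path))
    return os.path.join(base, project_name, file_name)


# example with a made-up location
print(project_file("/opt/tweetBooks/publishFromCron.py", "MobyDick", "lineReached"))
```

An alternative that needs no code changes is to have the cron entry change into the tweetBooks folder before it runs the script.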

Conclusion

Although tweetBooks is not a particularly complicated project, it was nonetheless fun to put together. I may return to it if I need to publish a long text on Twitter for some reason, but until then I hope this write-up can serve as a useful jumping-off point for folks who want to use the Buffer API, or who are curious about Python and want a simple example of how it can be used to do practical things.