DIY Doom Index – How it’s made!

DIY Doom Index is a project I made while at The Flatiron School’s web development immersive. It lets users sift through daily political, economic, and environmental datasets to build a personalized Doom Index. Similar to how the S&P 500 or Dow Jones stock indices aggregate multiple companies’ stock performance into one number, DIY Doom Index aggregates a number of different “pro-doom” and “anti-doom” metrics into one overall doom number. Users build a single index that tells them how much better or worse the world is every day based on their political, economic, and environmental sensitivities. You can use the app here (it’s hosted for free on Heroku and may take a few minutes to load; try refreshing if you get an error). The code is on my GitHub, and I made a video demo you can check out. In this series of posts I’m going to go over how I made DIY Doom Index and suggest some areas for improvement. In this post I’ll cover how I calculated the index to give each user a personalized experience.

Calculating the index

All of the API calls and index calculations happen in the Ruby on Rails backend. This helps avoid CORS-related errors and lets me quickly load updates as the user adjusts their doom preferences.

Model

My ActiveRecord / Postgres model consists of Users, User-datasets (a join class), and Datasets.

Users (diy_doomsday_backend/app/models/user.rb):

class User < ApplicationRecord

  # Adds methods to set and authenticate against a Bcrypt password
  # requires "password_digest" in user schema/migration
  has_secure_password

  # sets up relationships
  has_many :user_datasets
  has_many :datasets, through: :user_datasets

  # TODO: add additional validation

end

This class is fairly simple: it sets up the relationships with the other database tables and invokes has_secure_password, which uses the bcrypt gem and allows my backend to store user authentication details without saving plain-text passwords to my database (more on authentication later). The other classes are even simpler. I just set up the relationships and didn’t do any real validation in the backend, which is certainly an area where the app could be improved. When building the front end, I structured the various requests and forms to validate most of the data before sending it to the backend. These classes could be reworked to make the API more resilient, secure, and usable with other front-end apps.

User-Datasets (diy_doomsday_backend/app/models/user_dataset.rb):

class UserDataset < ApplicationRecord

  # sets up relationships
  # (each join row belongs to one user and one dataset)
  belongs_to :user
  belongs_to :dataset
end

Datasets (diy_doomsday_backend/app/models/dataset.rb):

class Dataset < ApplicationRecord

  # sets up relationships
  has_many :user_datasets
  has_many :users, through: :user_datasets

end

Controller

I have controllers for each of my model classes, as well as a sessions controller for logged-in users and a data_requests_controller where most of the data is accessed. The model-class controllers are mostly there for debugging purposes and expanded functionality (some user_dataset requests are made as users adjust their index). Since I wanted to access all of the models simultaneously in order to graph the user’s Doom Index, I created a new route and controller, data_requests_controller, to keep my concerns separated. When a user loads the DIY Doom Index page, they expect to see the current index based on their preferences as well as a 30-day historical chart, and this needs to be responsive to changes the user might make from within the app. That’s why the data requests controller has a lot of work to do: when a user makes a request, the controller action needs to identify the user, find their preferences, update all the datasets from their individual APIs, calculate a new 30-day index, and then render that index as JSON. I’ll go over how that works step by step.

    # find the current user by ID
    @user = User.find(params["id"])

    # find the current user's data preferences
    @userDatasets = UserDataset.where(user_id: @user.id)

    # find all datasets that the user has interacted with
    @datasets = @userDatasets.map { |user_data| Dataset.find(user_data.dataset_id) }

This block selects the current user based on the request parameters (the user ID) sent from the frontend. Once it knows whose data we’re working with, it finds all of the dataset relationships they have. These include weights and whether the user rates each dataset positively or negatively in terms of its correlation with doom. Then we find all of the datasets that the user has enabled. Now that we’ve identified the current user, found all of their preferences, and selected the datasets we need to process, we calculate the index.

The first step in this process is to get an array of hashes filled with all the data we’re going to process.

dataPackage = getDataSets

 
  def getDataSets
    # routes each dataset to the appropriate request and formatting function
    @datasets.map{ |dataset|

      if Time.now - dataset.updated_at > 21600 || !dataset.notes
        puts "REFRESHING DATA"
        if dataset.name == "Presidential Approval Rating"
          data = proccesPresApproval(dataset.srcAddress)
        elsif dataset.name == "Voter Preference for Democratic Candidates"
          data = proccesGenericBallot(dataset.srcAddress)
        elsif dataset.name == "Stock Market Volatility"
          data = proccesFRED(dataset.srcAddress)
        elsif dataset.name == "Polar Ice Growth"
          data = proccessSeaIce(dataset.srcAddress)
        elsif dataset.name == "Yeild Curve"
          data = proccesFRED(dataset.srcAddress)
        elsif dataset.name == "Inflation Rate"
          data = proccesFRED(dataset.srcAddress)
        else
          puts "INCORRECT DATASET NAME"
        end
        dataForPackage = {data:data, mean:getMean(data), stdDev:getStdVar(data), name:dataset.name, id:dataset.id}
        dataset.notes = dataForPackage.to_json
        dataset.save
        dataset.touch
      else
        dataForPackage = JSON.parse(dataset.notes)
      end
      dataForPackage.symbolize_keys
     }
  end

We run the getDataSets function and store its output as “dataPackage”; we’ll use this later. For now, let’s go through what getDataSets does. First it uses Ruby’s built-in .map to go through each dataset the user has interacted with. Before doing anything else it checks whether the data stored in the backend is older than 6 hours (or nonexistent): if Time.now - dataset.updated_at > 21600 || !dataset.notes. If the data is old, it runs through a series of if/else statements until it finds the dataset that needs to be updated. Once the dataset is identified, it sets the “data” variable based on what a dataset-specific method returns. Since each dataset has a unique source and format, a separate method is needed for each so the data can be cleaned and standardized for further use. Each of the dataset functions calls a third-party API, parses the response, and stores the data as a hash. We can go through the proccesFRED function as an example.

 

  def proccesFRED(dataset)
    # requests a JSON dataset from FRED given a source URL and formats it for doom index processing
    uri = URI.parse(dataset)
    response = Net::HTTP.get_response(uri)
    valueArr = JSON.parse(response.body)
    data = {}
    valueArr["observations"].each{ |dataInstance|
      # FRED marks missing values with "." -- skip those dates
      if dataInstance["value"] != "."
        data[DateTime.parse(dataInstance["date"]).strftime("%d/%m/%Y")] = dataInstance["value"]
      end
    }
    data
  end

The proccesFRED function takes in a URL for the dataset we’re looking for; this was stored in our model and seeded when the app was launched. It makes the request to the API URL and then loops through the response. As it loops through, it parses the date format from our data source and reformats it into a standard time format of my choosing. It sets the reformatted date as a key in a hash called “data”, then selects the individual piece of data that corresponds with that timestamp and sets it as the hash value. In this case, due to FRED’s particular formatting, we want to eliminate all dates with a null value. This is represented in FRED as a “.” so we check for those in an if statement and only save dates that have a data value associated with them.
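Here’s that same cleaning loop run in isolation on a hand-written FRED-style response (the “observations”, “date”, and “value” field names follow FRED’s JSON; the numbers are made up):

```ruby
require "date"

# A standalone run of the cleaning loop from proccesFRED, using a
# hand-written FRED-style response (the values are invented).
observations = [
  { "date" => "1992-05-24", "value" => "123" },
  { "date" => "1992-05-25", "value" => "." },   # FRED's null marker, dropped
  { "date" => "1992-05-26", "value" => "124" },
]

data = {}
observations.each do |obs|
  next if obs["value"] == "."
  data[DateTime.parse(obs["date"]).strftime("%d/%m/%Y")] = obs["value"]
end

data # => {"24/05/1992"=>"123", "26/05/1992"=>"124"}
```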

Once we’ve gone through all the dates that FRED sent us, we return the data hash, looking something like this: data = { "24/05/1992" => "123", "25/05/1992" => "124", ... }. All of the various API request functions I wrote return data in the same format but use different methods to make sure it is “clean” before returning it. In this case, removing null values marked with “.” was our only concern.

Now that we have our updated dataset from FRED, we return to the getDataSets function (see above). getDataSets takes the “data” hash from our API request function and packages it into another hash under the unsurprisingly titled key “data”: dataForPackage = {data:data, mean:getMean(data), stdDev:getStdVar(data), name:dataset.name, id:dataset.id}. In addition, this “dataForPackage” hash gets the average of all the data points in the dataset via getMean(data) and the standard deviation via getStdVar(data). These values will be used later and come from methods I wrote using basic math operations (see GitHub for these methods). Once we have this nice package with clean data, the average, and the standard deviation, we save it in our database so that we don’t have to make this request and recalculate averages every time someone wants to get their index.
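The actual helper methods are on GitHub; a minimal sketch of what they could look like, assuming a population standard deviation over the hash’s string values:

```ruby
# Sketch of the averaging helpers (the real ones are in the repo).
# Values arrive from the APIs as strings, so each is cast to a float.
def getMean(data)
  values = data.values.map(&:to_f)
  values.sum / values.length
end

def getStdVar(data)
  mean = getMean(data)
  squared = data.values.map { |v| (v.to_f - mean)**2 }
  Math.sqrt(squared.sum / squared.length)
end

sample = { "24/05/1992" => "2", "25/05/1992" => "4", "26/05/1992" => "6" }
getMean(sample)   # => 4.0
getStdVar(sample) # population standard deviation of [2, 4, 6]
```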

dataset.notes = dataForPackage.to_json

dataset.save

dataset.touch

These lines update the database to include the latest data in JSON format and make sure the dataset.updated_at value is set using dataset.touch (I had some issues when the data was old but still unchanged; “touch” solved this). If we hadn’t needed to update the dataset, it would instead have been recalled from our database and returned in the same format as updated data.
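The staleness test that decides between these two paths can be run in isolation. Here stale? is a hypothetical helper name, but the comparison is the same one getDataSets performs:

```ruby
require "time"

# Illustration of the six-hour freshness check; "stale?" is a
# hypothetical helper name, but the comparison matches getDataSets.
SIX_HOURS = 21_600 # seconds

def stale?(updated_at, notes)
  # refresh if the cached copy is over six hours old or was never saved
  Time.now - updated_at > SIX_HOURS || !notes
end

stale?(Time.now - 7 * 3600, "{}")  # => true  (older than six hours)
stale?(Time.now - 3600, nil)       # => true  (nothing cached yet)
stale?(Time.now - 3600, "{}")      # => false (fresh cache, reuse it)
```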

Now that we’ve gone through all of that, we have a variable dataPackage that contains an array of all the datasets the user wants, in a nice format. Now we build our index values.
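For reference, one element of dataPackage looks roughly like this (all values invented):

```ruby
# One element of dataPackage, with invented values, showing the shape
# the index calculation relies on.
package = {
  data:   { "24/05/1992" => "123", "25/05/1992" => "124" },
  mean:   123.5,
  stdDev: 0.5,
  name:   "Stock Market Volatility",
  id:     3,
}

package[:data]["24/05/1992"] # => "123"
```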

 

    index = get31days.map{ |day|
      indexValueSet = dataPackage.map{ |dataset|
        dataRelationship = @userDatasets.find{ |userdataset| userdataset.dataset_id == dataset[:id]}

        relationshipVector = (dataRelationship.positive_corral ? 1 : -1)
        indexValuePartial = 0

        # if this day has a value in the dataset, calculate the # of standard deviations from the mean it is, multiply by the weight, and multiply by -1 if the indicator is negatively related to doom --- else seek a value from the previous 10 days
        if dataset[:data][day]
          indexValuePartial = (((dataset[:data][day].to_f - dataset[:mean]) / dataset[:stdDev]) * dataRelationship.weight) * relationshipVector
        else
          (1..10).each { |index|
             if dataset[:data][(DateTime.parse(day) - index).strftime("%d/%m/%Y")]
               indexValuePartial =(((dataset[:data][(DateTime.parse(day) - index).strftime("%d/%m/%Y")].to_f - dataset[:mean]) / dataset[:stdDev]) * dataRelationship.weight) * relationshipVector
               break
            end
          }
        end
        indexValuePartial
       }
       # calculate the weighted average of processed values and add baseDoom to determine this day's index value
       {day=>(((indexValueSet.inject(0, :+))/getTotalWeights) + baseDoom)}
     }

     # send back 31 days of calculated doom values
    render json: index

We start off building our index by looping over an array of the past 31 days. This array is generated by a method I wrote that simply creates an array of the past 31 dates. For each date in that array, we loop through all of the datasets in dataPackage. For each dataset we get the weight the user has set for it, dataRelationship = @userDatasets.find{ |userdataset| userdataset.dataset_id == dataset[:id]}, and the polarity of the dataset (whether it’s considered a positive or negative indicator), relationshipVector = (dataRelationship.positive_corral ? 1 : -1). Then we check whether the day from our outer loop is available in the dataset: if dataset[:data][day]. Assuming we have that date in the current dataset, we calculate a z-score for that date/datapoint. This value represents how far from average the data point is. The philosophy here is that doom represents both bad and unusual circumstances, so a datapoint very far from average in a pro-doom direction will greatly increase the doom index, whereas a datapoint close to average will have little effect. Once we’ve gotten the z-score, we multiply it by the weight and polarity the user has selected: indexValuePartial = (((dataset[:data][day].to_f - dataset[:mean]) / dataset[:stdDev]) * dataRelationship.weight) * relationshipVector. If we didn’t have the date in our dataset, we look back up to 10 days and simply use the most recent value (the (1..10).each loop in the code above). This ensures that our doom index has no holes, even on weekends or holidays, even if that means a day is just the same as the previous one.
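For completeness, get31days could look something like this (a sketch; the real method is in the repo):

```ruby
require "date"

# Sketch of get31days: the past 31 dates, newest first, formatted the
# same way the dataset hashes are keyed.
def get31days
  (0..30).map { |offset| (Date.today - offset).strftime("%d/%m/%Y") }
end

get31days.length # => 31
```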

Once we’ve calculated the weighted z-score for this specific day and this specific dataset, we add it together with all the other datasets’ values for that day, average them, and add a “baseDoom” value for UI reasons I’ll explain later (it basically lifts the index off zero): {day=>(((indexValueSet.inject(0, :+))/getTotalWeights) + baseDoom)}. Once we have 31 days of values, we render them all as JSON and the frontend takes over.
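A worked example of that per-day arithmetic with made-up numbers: two datasets, one weighted 2 and positively correlated with doom, one weighted 1 and negatively correlated, and a hypothetical baseDoom of 50:

```ruby
# Made-up numbers for one day: each partial is
# ((value - mean) / stdDev) * weight * polarity, as in the code above.
base_doom     = 50 # hypothetical value; lifts the index off zero
total_weights = 3  # sum of the two weights below

partials = [
  ((120.0 - 100.0) / 10.0) * 2 *  1, # z =  2.0, weight 2, pro-doom  => 4.0
  ((  8.0 -  10.0) /  2.0) * 1 * -1, # z = -1.0, weight 1, anti-doom => 1.0
]

index_value = (partials.sum / total_weights) + base_doom
index_value # => 51.666...
```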

That’s it! I hope you liked this post. Send me an email if you have questions or comments.
