Friday 10 June 2016

Preparations for the LightCurve Refactor Project

Before starting the LightCurve Refactor project I did a bit of research on the problem, both to make my application stand out and to give me a bit of a head start.
It seems only sensible that I present what I found.

Issues To Fix

The old LightCurve class suffers from five major functionality issues:

Redundant Code

The class was written before UniDown, and support for a wide variety of file formats from multiple instruments was added over time, so it now contains a lot of redundant code that should be removed.

Inconsistent Interface

Each instrument gets its data from different sources, often in different filetypes that aren’t always distinguishable. Likewise, querying some of these data sources requires different parameters. This has led to a rather inconsistent interface, where you have to manually call a different constructor for each instrument, e.g.:
  >>> lc = LYRALightCurve.create(LYRA_filepath)
  >>> lc = RHESSISummaryLightCurve.create(RHESSI_filepath)
Whereas all maps are created using the same Map factory:
  >>> map = Map(any_filepath)
This approach should be replicated in the new datatype.
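
To make the idea concrete, here is a minimal sketch of what a single factory could look like. It is not SunPy's actual API: the function make_lightcurve, the can_read methods and the file-name rules are all hypothetical and exist only to illustrate dispatching to the right instrument class.

```python
# Hypothetical sketch: none of these names are SunPy's real API.
from pathlib import Path


class LightCurve:
    """Minimal stand-in for a generic light curve container."""

    def __init__(self, filepath):
        self.filepath = Path(filepath)


class LYRALightCurve(LightCurve):
    @staticmethod
    def can_read(filepath):
        # Assumed rule for illustration: LYRA file names start with "lyra_".
        return Path(filepath).name.lower().startswith("lyra_")


class RHESSISummaryLightCurve(LightCurve):
    @staticmethod
    def can_read(filepath):
        # Assumed rule for illustration: RHESSI file names start with "hsi_".
        return Path(filepath).name.lower().startswith("hsi_")


# Registry of instrument classes the factory can dispatch to.
_REGISTRY = [LYRALightCurve, RHESSISummaryLightCurve]


def make_lightcurve(filepath):
    """Return an instance of the first registered class that accepts the file."""
    for cls in _REGISTRY:
        if cls.can_read(filepath):
            return cls(filepath)
    raise ValueError(f"No registered light curve class for {filepath!r}")


lc = make_lightcurve("lyra_example.fits")
print(type(lc).__name__)
```

Since the filetypes aren't always distinguishable, a real factory would probably also need a way for the user to name the instrument explicitly when automatic detection fails.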

Inability to Concatenate

Users often need to evaluate data over a time range spanning multiple files, which the LightCurve class can’t handle.
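
As a rough illustration of what concatenation could mean in practice, the sketch below joins two Pandas DataFrames standing in for two consecutive files. The column name flux and all the values are made up, and dropping repeated timestamps is just one possible way to handle overlap between files.

```python
import pandas as pd

# Two made-up "files" of data that share one timestamp at the boundary.
day1 = pd.DataFrame(
    {"flux": [1.0, 2.0, 3.0]},
    index=pd.to_datetime(
        ["2016-06-09 23:58", "2016-06-09 23:59", "2016-06-10 00:00"]
    ),
)
day2 = pd.DataFrame(
    {"flux": [3.0, 4.0]},
    index=pd.to_datetime(["2016-06-10 00:00", "2016-06-10 00:01"]),
)

# Stack the two tables, keep the first copy of any repeated timestamp,
# and make sure the result is in time order.
combined = pd.concat([day1, day2])
combined = combined[~combined.index.duplicated(keep="first")].sort_index()

print(combined)
```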

Astropy Unit Incompatible

While the rest of the SunPy library has moved to support Astropy Units, which enable easy comparison of data with different units, the LightCurve class has no such support.

Lack of Functionality

At the moment, despite the underlying Pandas DataFrames supporting robust data selection and manipulation functionality (like filtering times, resampling and such), the LightCurve class exposes none of this.
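
The snippet below sketches the sort of DataFrame functionality meant here, using plain Pandas on made-up data (the flux column and its values are invented for the example). How a refactored LightCurve would expose this is a separate design question.

```python
import pandas as pd

# Six made-up samples at a one minute cadence.
index = pd.date_range("2016-06-10 00:00", periods=6, freq="1min")
data = pd.DataFrame({"flux": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}, index=index)

# Filter to a time window; label slicing includes both end points.
window = data.loc["2016-06-10 00:01":"2016-06-10 00:03"]

# Resample to three minute bins, taking the mean of each bin.
binned = data.resample("3min").mean()

print(window)
print(binned)
```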


The Debate - Implementing on Astropy Tables

The LightCurve class is based on the Python Pandas DataFrame class, which is designed to store, manipulate and visualise time series data. But Pandas is another dependency for SunPy (it isn’t used elsewhere) and, not being specifically science based, it is very unlikely to ever support anything similar to Astropy Units.
The suggestion was made to use Astropy Tables for storing the data. This seemed ideal, as they support Units natively and can support more advanced Astropy Time data structures and WCS coordinates.
But upon investigation there was a major flaw with Astropy Tables. To support Astropy Time you need a Table with mixin columns, and because Time objects are immutable (simply an implementation decision) you can’t re-order or append to a column containing Times. This basically meant that re-arranging or adding to a table requires completely recreating it, which is memory and processor intensive at best and simply impractical at worst.
There is an outstanding request for a TimeSeries datatype based on Astropy Tables:
    github.com/Cadair/astropy-APEs/blob/master/APE9.rst
But for now the decision was made to keep the implementation using Pandas.

This gets you up to speed on the problem and the early work I did.

1 comment:

  1. Really nice described the current issues with lightcurve - and this only doesn't happen in sunpy lightcurves, but in many code around the world... That's where design patterns come so useful!! :)