DataFrame class
#include <DataFrame.h>

The DataFrame is a matrix in which each row corresponds to a data entry and each column represents a characteristic of that entry.

Constructors, destructors, conversion operators

DataFrame()
Constructs an empty DataFrame with no column names.
DataFrame(std::vector<std::string> colNames)
Constructs an empty DataFrame with the specified column names.
DataFrame(std::string filename)
Constructs a DataFrame from a file. The file must contain comma-separated values, with each line corresponding to a data entry, or DataFrame row. The first line contains the column names.
DataFrame(std::vector<std::vector<Generic*>>* data)
Constructs a DataFrame with no column names containing the passed-in data in a 2D vector.
~DataFrame()
Destructor. Deletes the outer vector of the DataFrame.

Public functions

void sort(int sortindex)
Sorts this DataFrame on the specified column index.
auto slice(int startIndex, int endIndex) const -> DataFrame*
Returns a new DataFrame* containing the rows of this DataFrame from the start index up to, but not including, the end index.
auto filter(std::string condition) const -> DataFrame*
Returns a new DataFrame* containing all rows where the specified condition is met. The format is "COLUMN_NAME < VALUE", where < can be any comparison operator (<, <=, >, >=, ==, !=).
void appendRow(std::vector<Generic*> row)
Appends a row to the DataFrame.
void appendRows(DataFrame* other)
Appends all rows of the passed-in DataFrame* to the end of this DataFrame.
void set(Generic* generic, int row, int col)
Sets the value at the passed-in row and column.
auto get(int row, int col) const -> Generic*
Returns the value at the passed-in row and column.
auto getRow(int row) const -> std::vector<Generic*>
Returns the row at the specified index.
auto getColName(int colIndex) const -> std::string
Returns the column name at the specified column index.
auto getColNames() const -> std::vector<std::string>
Returns a vector containing all column names.
auto getColType(int colIndex) const -> GenericType
Returns the GenericType of data on the specified column.
auto getColTypes() const -> std::vector<GenericType>
Returns a vector containing the GenericType of data in each column.
auto rows() const -> int
Counts the number of rows of this DataFrame.
auto cols() const -> int
Counts the number of columns of this DataFrame.
auto average(int col) const -> double
Returns the average of all values in the specified column.

Function documentation

DataFrame::DataFrame(std::vector<std::string> colNames)

Constructs an empty DataFrame with the specified column names.

Parameters
colNames a vector containing the names of the DataFrame's columns

DataFrame::DataFrame(std::string filename)

Constructs a DataFrame from a file. The file must contain comma-separated values, with each line corresponding to a data entry, or DataFrame row. The first line contains the column names.

Parameters
filename the name of the file to read
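
The file layout described above can be illustrated with a small stand-alone sketch. It is not the DataFrame implementation: `ParsedCsv`, `splitLine`, and `parseCsv` are hypothetical names, cells are kept as plain strings rather than `Generic*`, and quoted fields or embedded commas are assumed not to occur.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the parsed file: header names plus raw string cells.
struct ParsedCsv {
    std::vector<std::string> colNames;
    std::vector<std::vector<std::string>> rows;
};

// Splits one line on commas. Quoted fields are not handled in this sketch.
std::vector<std::string> splitLine(const std::string& line) {
    std::vector<std::string> cells;
    std::stringstream ss(line);
    std::string cell;
    while (std::getline(ss, cell, ',')) {
        cells.push_back(cell);
    }
    return cells;
}

// Reads comma-separated text: the first line is the header, every
// following non-empty line becomes one data row.
ParsedCsv parseCsv(std::istream& in) {
    ParsedCsv result;
    std::string line;
    if (std::getline(in, line)) {
        result.colNames = splitLine(line);
    }
    while (std::getline(in, line)) {
        if (!line.empty()) {
            result.rows.push_back(splitLine(line));
        }
    }
    return result;
}
```

Passing a `std::ifstream` opened on the file, or a `std::istringstream` holding the text, to `parseCsv` yields the column names from the first line and one vector of cells per remaining line.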

DataFrame::DataFrame(std::vector<std::vector<Generic*>>* data)

Constructs a DataFrame with no column names containing the passed-in data in a 2D vector.

Parameters
data the DataFrame data
Exceptions
invalid_argument if passed a nullptr

void DataFrame::sort(int sortindex)

Sorts this DataFrame on the specified column index.

Parameters
sortindex the column index to sort on
Exceptions
out_of_range if the passed-in column is negative or greater than the maximum column index
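
As a rough illustration of sorting rows on one column with the bounds check described above, the sketch below uses `std::vector<double>` rows as a stand-in for rows of `Generic*`. The name `sortByColumn`, the ascending order, and the use of a stable sort are assumptions of this sketch; the reference does not state which order or algorithm `DataFrame::sort` uses.

```cpp
#include <algorithm>
#include <stdexcept>
#include <vector>

using Row = std::vector<double>;  // stand-in for a row of Generic* values

// Sorts rows in ascending order of the value in column sortIndex.
// Assumes every row has the same number of columns as the first row.
void sortByColumn(std::vector<Row>& rows, int sortIndex) {
    const int colCount = rows.empty() ? 0 : static_cast<int>(rows.front().size());
    if (sortIndex < 0 || sortIndex >= colCount) {
        throw std::out_of_range("column index out of range");
    }
    // stable_sort keeps rows with equal keys in their original relative order.
    std::stable_sort(rows.begin(), rows.end(),
                     [sortIndex](const Row& a, const Row& b) {
                         return a[sortIndex] < b[sortIndex];
                     });
}
```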

DataFrame* DataFrame::slice(int startIndex, int endIndex) const

Returns a new DataFrame* containing the rows of this DataFrame from the start index up to, but not including, the end index.

Parameters
startIndex the starting row index
endIndex the ending row index, exclusive
Returns DataFrame* containing the specified rows of this DataFrame
Exceptions
out_of_range if the start index is greater than the end index, or if either index is negative or greater than the maximum row index
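
The half-open range used by slice (start included, end excluded) can be sketched on a plain vector of rows. `sliceRows` is a hypothetical helper, not part of DataFrame, and rows are `std::vector<double>` stand-ins. One assumption to note: this sketch accepts an end index equal to the row count so that the last row can be included; whether `DataFrame::slice` permits that is not stated in this reference.

```cpp
#include <stdexcept>
#include <vector>

using Row = std::vector<double>;  // stand-in for a row of Generic* values

// Copies rows [startIndex, endIndex): startIndex is included, endIndex is not.
std::vector<Row> sliceRows(const std::vector<Row>& rows, int startIndex, int endIndex) {
    const int rowCount = static_cast<int>(rows.size());
    // Assumption of this sketch: endIndex == rowCount is allowed.
    if (startIndex < 0 || endIndex < 0 || startIndex > endIndex || endIndex > rowCount) {
        throw std::out_of_range("slice indices out of range");
    }
    return std::vector<Row>(rows.begin() + startIndex, rows.begin() + endIndex);
}
```

With four rows, `sliceRows(rows, 1, 3)` copies the rows at indices 1 and 2, and equal start and end indices give an empty result.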

DataFrame* DataFrame::filter(std::string condition) const

Returns a new DataFrame* containing all rows where the specified condition is met. The format is "COLUMN_NAME < VALUE", where < can be any comparison operator (<, <=, >, >=, ==, !=).

Parameters
condition the condition to filter on
Returns DataFrame* a new DataFrame containing the filtered rows
Exceptions
domain_error if the DataFrame is empty
invalid_argument if an invalid condition is given
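
To make the condition format concrete, here is a minimal sketch of how a string such as "age >= 30" might be split into its three parts and evaluated. `Condition`, `parseCondition`, and `evaluate` are hypothetical names. The sketch assumes the three parts are separated by whitespace, that column names and values contain no spaces, and that values are compared as doubles; the parsing rules of `DataFrame::filter` itself are not documented here.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical holder for the three parts of "COLUMN_NAME OP VALUE".
struct Condition {
    std::string column;
    std::string op;
    std::string value;
};

// Splits the condition on whitespace and validates the operator.
Condition parseCondition(const std::string& condition) {
    std::istringstream in(condition);
    Condition c;
    std::string extra;
    // Exactly three tokens are expected; a fourth token is an error.
    if (!(in >> c.column >> c.op >> c.value) || (in >> extra)) {
        throw std::invalid_argument("expected \"COLUMN_NAME OP VALUE\"");
    }
    static const char* const ops[] = {"<", "<=", ">", ">=", "==", "!="};
    for (const char* op : ops) {
        if (c.op == op) {
            return c;
        }
    }
    throw std::invalid_argument("unknown comparison operator: " + c.op);
}

// Applies the operator to two numeric operands.
bool evaluate(double lhs, const std::string& op, double rhs) {
    if (op == "<") return lhs < rhs;
    if (op == "<=") return lhs <= rhs;
    if (op == ">") return lhs > rhs;
    if (op == ">=") return lhs >= rhs;
    if (op == "==") return lhs == rhs;
    if (op == "!=") return lhs != rhs;
    throw std::invalid_argument("unknown comparison operator: " + op);
}
```

A filter would then keep each row for which `evaluate(cell, c.op, std::stod(c.value))` is true, where `cell` is the row's value in the named column.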

void DataFrame::appendRow(std::vector<Generic*> row)

Appends a row to the DataFrame.

Parameters
row a vector to append as a row to the DataFrame data.
Exceptions
invalid_argument if the column count for this DataFrame and the row to append are not the same
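
The column-count check can be sketched as follows. `appendChecked` is a hypothetical free function operating on `std::vector<double>` rows, with the expected column count passed in explicitly; it only illustrates validating before appending, so that a rejected row leaves the data unchanged.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

using Row = std::vector<double>;  // stand-in for a row of Generic* values

// Appends row only if it has exactly colCount cells; otherwise throws
// and leaves rows untouched.
void appendChecked(std::vector<Row>& rows, std::size_t colCount, const Row& row) {
    if (row.size() != colCount) {
        throw std::invalid_argument("row has the wrong number of columns");
    }
    rows.push_back(row);
}
```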

void DataFrame::appendRows(DataFrame* other)

Appends all rows of the passed-in DataFrame* to the end of this DataFrame.

Parameters
other the DataFrame to append to this one
Exceptions
invalid_argument if passed a nullptr or if the DataFrames do not contain the same number of columns

void DataFrame::set(Generic* generic, int row, int col)

Sets the value at the passed-in row and column.

Parameters
generic the value to set
row the row index of the value to set
col the column index of the value to set
Exceptions
out_of_range if the row or column is out of bounds

Generic* DataFrame::get(int row, int col) const

Returns the value at the passed-in row and column.

Parameters
row the row
col the column
Returns Generic* the value at the passed-in row and column
Exceptions
out_of_range if the row or column is out of bounds

std::vector<Generic*> DataFrame::getRow(int row) const

Returns the row at the specified index.

Parameters
row the index of the row
Returns std::vector<Generic*> the row
Exceptions
out_of_range if the row is out of bounds

std::string DataFrame::getColName(int colIndex) const

Returns the column name at the specified column index.

Parameters
colIndex the index of the column
Returns std::string the column's name
Exceptions
domain_error if no column names are stored for this DataFrame
out_of_range if the column index is out of bounds

std::vector<std::string> DataFrame::getColNames() const

Returns a vector containing all column names.

Returns std::vector<std::string> the column names

GenericType DataFrame::getColType(int colIndex) const

Returns the GenericType of data on the specified column.

Parameters
colIndex the column index
Returns GenericType the type stored at the column
Exceptions
range_error if DataFrame is empty
out_of_range if column index is out of bounds

std::vector<GenericType> DataFrame::getColTypes() const

Returns a vector containing the GenericType of data in each column.

Returns std::vector<GenericType> the column types
Exceptions
range_error if DataFrame is empty

int DataFrame::rows() const

Counts the number of rows of this DataFrame.

Returns int the row count

int DataFrame::cols() const

Counts the number of columns of this DataFrame.

Returns int the column count

double DataFrame::average(int col) const

Returns the average of all values in the specified column.

Parameters
col the index of the column
Returns double the average
Exceptions
domain_error if DataFrame is empty
out_of_range if column index is out of bounds
domain_error if the column does not contain double values
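
A column average with the empty and out-of-bounds checks listed above might look like the sketch below. `columnAverage` is a hypothetical helper over `std::vector<double>` rows, so the "column does not contain double values" case cannot arise here; it also assumes every row has as many columns as the first row.

```cpp
#include <stdexcept>
#include <vector>

using Row = std::vector<double>;  // stand-in for a row of Generic* values

// Returns the arithmetic mean of column col over all rows.
double columnAverage(const std::vector<Row>& rows, int col) {
    if (rows.empty()) {
        throw std::domain_error("cannot average an empty table");
    }
    if (col < 0 || col >= static_cast<int>(rows.front().size())) {
        throw std::out_of_range("column index out of range");
    }
    double sum = 0.0;
    for (const Row& row : rows) {
        sum += row[col];
    }
    return sum / static_cast<double>(rows.size());
}
```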