CME
CloudWatch-event-based async data processing that handled 100 GB of data a night,
with some individual files upwards of 30 GB (which introduced unique challenges
due to the sheer size of the files). The solution I implemented for processing
such large files was:
S3 put event -> Lambda -> Fargate container that shells out and pipes the AWS CLI
into sed to drastically reduce the raw file size -> S3 put event -> Lambda uses
the requests library to post the S3 location and schema to a third-party
normalization tool -> S3 put event -> Lambda submits the normalized data location
and schema to Druid for indexing
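A file in the 30 GB range is too large to stage inside a Lambda, which is why the size-reduction step lives in Fargate and streams the object through the shell rather than downloading it. A minimal sketch of that step, assuming hypothetical bucket names and a placeholder sed expression (the real filter depended on the raw file format):

```python
# A minimal sketch of the Fargate step, run inside the container. Bucket names
# and the sed expression are placeholders; the real filter depended on the raw
# file format. The point is that nothing here ever holds the 30 GB file whole.
import subprocess
import sys

def shrink_file(src_bucket: str, key: str, dest_bucket: str) -> None:
    """Stream s3://src_bucket/key through sed and back to s3://dest_bucket/key."""
    # Equivalent shell: aws s3 cp s3://src/key - | sed -E '<filter>' | aws s3 cp - s3://dest/key
    download = subprocess.Popen(
        ["aws", "s3", "cp", f"s3://{src_bucket}/{key}", "-"],
        stdout=subprocess.PIPE,
    )
    shrink = subprocess.Popen(
        ["sed", "-E", "s/,[^,]*$//"],   # placeholder: strip the last column of each row
        stdin=download.stdout,
        stdout=subprocess.PIPE,
    )
    upload = subprocess.Popen(
        ["aws", "s3", "cp", "-", f"s3://{dest_bucket}/{key}"],
        stdin=shrink.stdout,
    )
    # Let upstream processes receive SIGPIPE if a downstream one exits early.
    download.stdout.close()
    shrink.stdout.close()
    for proc in (upload, shrink, download):
        if proc.wait() != 0:
            sys.exit(f"pipeline step failed: {proc.args[0]}")

if __name__ == "__main__":
    shrink_file(*sys.argv[1:4])
```

Because the AWS CLI, sed, and the upload all work on the pipe a chunk at a time, memory and disk usage stay flat regardless of file size.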
Google BigQuery
Worked directly with the head of product; architected and built a fleet of
CloudWatch-scheduled Lambdas (all with CI/CD stacks and a full SDLC) that would
intermittently query Google BigQuery to ingest Ethereum and Bitcoin chain data.
This required learning how BigQuery partitions its data in order to optimize the
queries and balance cost against latency. Data was then transformed in the Lambda
using pandas over several steps (raw -> flattened -> normalized) and written to
S3 in a Hive-style path to be ingested by internal tools (Athena, Druid, Kafka).
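A minimal sketch of one such scheduled Lambda, assuming the public bigquery-public-data.crypto_ethereum dataset and a hypothetical bucket name; the real jobs had their own queries, flattening steps, and schedules:

```python
# A minimal sketch of one scheduled ingestion Lambda. The public
# bigquery-public-data.crypto_ethereum dataset and the bucket name are
# assumptions; the real jobs had their own queries, schemas, and schedules.
import datetime as dt
import io

import boto3
from google.cloud import bigquery

BUCKET = "example-chain-data"   # hypothetical

QUERY = """
SELECT block_number, block_timestamp, from_address, to_address, value
FROM `bigquery-public-data.crypto_ethereum.transactions`
WHERE block_timestamp >= @start AND block_timestamp < @end
"""

def handler(event, context):
    # The table is partitioned on block_timestamp, so constraining it bounds
    # the bytes scanned (i.e. the cost) of every run.
    end = dt.datetime.now(dt.timezone.utc).replace(minute=0, second=0, microsecond=0)
    start = end - dt.timedelta(hours=1)

    client = bigquery.Client()
    job = client.query(
        QUERY,
        job_config=bigquery.QueryJobConfig(
            query_parameters=[
                bigquery.ScalarQueryParameter("start", "TIMESTAMP", start),
                bigquery.ScalarQueryParameter("end", "TIMESTAMP", end),
            ]
        ),
    )

    raw = job.to_dataframe()                                  # raw
    flat = raw.copy()                                         # flattened (nested fields, when present)
    flat["value_eth"] = flat["value"].astype(float) / 1e18    # normalized units

    # Hive-style path so Athena/Druid can partition-prune on dt and hour.
    key = f"ethereum/transactions/dt={start:%Y-%m-%d}/hour={start:%H}/part-0000.csv"
    buf = io.StringIO()
    flat.to_csv(buf, index=False)
    boto3.client("s3").put_object(Bucket=BUCKET, Key=key, Body=buf.getvalue())
```

Constraining the query to the partition column is what keeps the bytes scanned, and therefore the BigQuery bill, predictable for each run.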
Node scraper
Using "serverless framework" architected and built a fleet of cloudwatch
scheduled lambdas (all with CI/CD stacks and SDLC) that provided the Research
Analysts (who could write some python but were not full fledge developers) with
a platform where they could query a given ethereum node for arbitrary data,
perform their arbitrary transformations and then would accept a well formatted
CSV. The platform would then take the CSV and write it to S3 in a hive style
path to then be ingested be internal tools (athena, druid, kafka)
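A minimal sketch of the platform half of the scraper, assuming web3.py for node access and hypothetical bucket / node-URL values; in practice the analyst transforms were separate, dynamically loaded jobs rather than the inline example here:

```python
# A minimal sketch of the scraper platform. The bucket, node URL, and the
# example analyst job are all placeholders; real analyst transforms were
# separate modules loaded per schedule.
import csv
import datetime as dt
import io

import boto3
from web3 import Web3

BUCKET = "example-research-data"              # hypothetical
NODE_URL = "http://eth-node.internal:8545"    # hypothetical

def example_analyst_job() -> str:
    """Stand-in for an analyst-written transform: query the node for arbitrary
    data and return a well-formatted CSV string with a header row."""
    w3 = Web3(Web3.HTTPProvider(NODE_URL))
    block = w3.eth.get_block("latest")
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["block_number", "timestamp", "gas_used"])
    writer.writerow([block["number"], block["timestamp"], block["gasUsed"]])
    return buf.getvalue()

def handler(event, context):
    csv_text = example_analyst_job()

    # Light validation before accepting the CSV: every row matches the header width.
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows or any(len(r) != len(rows[0]) for r in rows[1:]):
        raise ValueError("analyst job returned a malformed CSV")

    # Hive-style path so Athena/Druid can partition on dt and hour.
    now = dt.datetime.now(dt.timezone.utc)
    key = f"research/example_job/dt={now:%Y-%m-%d}/hour={now:%H}/part-0000.csv"
    boto3.client("s3").put_object(Bucket=BUCKET, Key=key, Body=csv_text)
```

From an analyst's point of view, a job is just a function that talks to the node and returns CSV text; the platform owns validation, pathing, and the S3 write.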
Celery
Built out a Celery/RabbitMQ system tied into the Django admin for running
scheduled tasks.
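A minimal sketch of the wiring, assuming django-celery-beat (a common way to expose schedules in the Django admin) and placeholder project, broker, and task names:

```python
# A minimal sketch of the Celery wiring, assuming RabbitMQ as the broker and
# django-celery-beat for admin-editable schedules; project and task names are
# placeholders.
import os

from celery import Celery

os.environ.setdefault("DJANGO_SETTINGS_MODULE", "myproject.settings")  # hypothetical project

app = Celery(
    "myproject",
    broker="amqp://guest:guest@localhost:5672//",   # RabbitMQ
)
app.config_from_object("django.conf:settings", namespace="CELERY")

# With django-celery-beat installed, beat reads crontab/interval schedules from
# the database, so they can be created and edited in the Django admin.
app.conf.beat_scheduler = "django_celery_beat.schedulers:DatabaseScheduler"
app.autodiscover_tasks()

@app.task
def nightly_report():
    """Placeholder scheduled task; real tasks did the actual work."""
    print("running nightly report")
```

With the DatabaseScheduler in place, `celery -A myproject beat` pulls its schedule from the database, so non-developers can add or pause tasks from the admin without a deploy.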