Our client develops the legal profession’s most innovative products for legal analytics and research, using the latest in visualization and text mining. Lawyers use these products to forecast how judges will rule, find critical cases, and make data-driven decisions. We're working with great technology, and the challenges that get us fired up involve large-scale heterogeneous text and data mining, beautiful UI, and machine learning.
This position will assist with performance enhancements to the client's production system. The current ingestion process runs through an involved, multi-step pipeline that takes about one week of processing to complete, and the client needs this to take significantly less time.
Some of the challenges are:
- The pipeline does a large amount of work, and that work is serialized; there may be an opportunity for streaming or parallel processing (a sketch of the idea appears after this list)
- AWS cost/spend is always a concern
- There may be some event processing in the system already, but the belief is that more event-driven processing would be better; SQS or Kafka would be potential fits here
- There is a lightweight graphing API that also needs to be optimized, mostly around its processing; it is a containerized application written in Scala
What you will be doing:
- Build and scale data infrastructure that powers real-time data processing of billions of records in a streaming architecture (see the sketch after this list)
- Build scalable data ingestion and machine learning inference pipelines
- Build a general-purpose API to deliver data science outputs to multiple business units
- Provide visibility into the health of our data platform (a comprehensive view of data flow, resource usage, data lineage, etc.) and optimize cloud costs
- Automate and manage the lifecycle of the systems and platforms that process our data
QUALIFICATIONS: