APS Accelerator and Q4R4 Chunking with Claude

Vacation time, accelerator time, and chunking TBC for LLM RAG:

APS Accelerator Barcelona in September

Time to plan not just your summer vacation, but what follows after as well.

Registration is open for the next European face-to-face APS accelerator, scheduled to take place in the Autodesk offices on the beach in Barcelona on September 23-27.

All of the above are attractive and popular: APS itself, the accelerators, the office, the beach, and Barcelona as a location. Address and directions: Autodesk SA, Carrer de Josep Pla 2, Torre B2, Planta 6, 08019 Barcelona, Spain

In the accelerator, you get dedicated time to develop your own chosen Autodesk Platform Services application, with direct live help and training from my expert APS engineering colleagues. They are there to help creative developers leverage the Autodesk Platform Services cloud APIs, i.e., your choice of the following APS APIs:

Last but not least, of course, we cover the desktop .NET Revit API as well as the APS Design Automation APIs for 3ds Max, AutoCAD, Inventor and Revit.

The deadline for submitting your proposal is Friday, September 13, 2024:

APS accelerator Barcelona

Q4R4 with LLM and RAG

I spent some time in 2017 pondering Q4R4, short for Question Answering for Revit API.

That was before the advent and rapid advancement of LLMs in more recent years.

Now, it is probably much simpler to achieve a better solution making use of the new technologies.

Some useful sources for priming an LLM with Revit API knowledge might be:

Some of that material could be fed in directly from the sources; other parts might need scraping from the web.

One useful approach to integrate this Revit API domain-specific data with a base LLM is RAG, retrieval-augmented generation.
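To make the idea concrete, here is a minimal sketch of the retrieval step in RAG: score each stored text chunk against the question, pick the best matches, and prepend them to the prompt sent to the LLM. It is an illustration under simplifying assumptions, not a production setup: real systems normally use embedding vectors, whereas this sketch swaps in plain bag-of-words cosine similarity to stay dependency-free. The function names `tokenize`, `cosine`, `retrieve` and `build_prompt` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question (hypothetical helper)."""
    q = Counter(tokenize(question))
    ranked = sorted(
        chunks,
        key=lambda c: cosine(q, Counter(tokenize(c))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, chunks, k=2):
    """Assemble an augmented prompt: retrieved context first, then the question."""
    context = "\n\n".join(retrieve(question, chunks, k))
    return f"Answer using only this context:\n\n{context}\n\nQuestion: {question}"
```

The string returned by `build_prompt` is what would be passed to the base LLM; the domain-specific knowledge lives in the chunk store rather than in the model itself.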

So, for instance, I would like to prepare The Building Coder blog post sources for RAG, cf.:

Claude.ai Helped Chunk TBC Blog Posts

I asked Claude to chunk The Building Coder blog posts for LLM RAG with the following series of prompts:

The script generated 696 JSON files, one for each blog post from number 1351 to today's number 2046.

The result looks perfect. I corrected nothing whatsoever, didn't even look at the code generated. All I did was type in the input and output folder paths.
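For readers who want a feel for what such a chunking script might look like, here is a rough sketch. It is not the code Claude generated (that lives in the tbcchunk repository); it assumes the newer posts are Markdown text files, splits each one at headings, caps the chunk size on paragraph boundaries, and writes one JSON file per post. The names `chunk_markdown` and `write_post_chunks`, the `max_chars` limit and the JSON record layout are all assumptions made for illustration.

```python
import json
import re
from pathlib import Path


def chunk_markdown(text, max_chars=1000):
    """Split Markdown text at headings, then cap chunk size on paragraph breaks."""
    # Zero-width split before every heading line ('#' to '######').
    sections = re.split(r"(?m)^(?=#{1,6}\s)", text)
    chunks = []
    for section in sections:
        section = section.strip()
        if not section:
            continue
        current = ""
        for para in section.split("\n\n"):
            # Start a new chunk when adding this paragraph would exceed the cap.
            if current and len(current) + len(para) + 2 > max_chars:
                chunks.append(current)
                current = para
            else:
                current = f"{current}\n\n{para}" if current else para
        if current:
            chunks.append(current)
    return chunks


def write_post_chunks(md_path, out_dir):
    """Chunk one post and write its chunks to <out_dir>/<post name>.json."""
    md_path = Path(md_path)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks = chunk_markdown(md_path.read_text(encoding="utf-8"))
    # Hypothetical record layout: post name, chunk index, chunk text.
    records = [
        {"post": md_path.stem, "chunk": i, "text": c}
        for i, c in enumerate(chunks)
    ]
    out_path = out_dir / f"{md_path.stem}.json"
    out_path.write_text(
        json.dumps(records, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return out_path
```

Calling `write_post_chunks` once per post file in the input folder yields one JSON file per post in the output folder, which matches the one-file-per-post result described above.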

The earlier blog posts until number 1350 were written in HTML, so they require a different script for chunking. I went on to ask how to process those using the following prompts:

After that, all was well, with all 2046 blog posts processed and chunked.
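The HTML posts need one extra step before chunking: stripping the markup to get at the text. As a hedged sketch of how that could be done with the Python standard library only, the class below collects the visible text of each block-level element. Again, this is not the script Claude produced; `TextExtractor`, `html_to_blocks` and the chosen tag sets are assumptions for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, one entry per block-level element."""

    # Assumed tag sets; adjust them to the markup actually used in the posts.
    BLOCK_TAGS = {"p", "h1", "h2", "h3", "h4", "li", "pre", "div"}
    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._parts = []
        self._skip_depth = 0

    def _flush(self):
        # Collapse whitespace runs (this also flattens line breaks in <pre>).
        text = " ".join("".join(self._parts).split())
        if text:
            self.blocks.append(text)
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in self.BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        # Ignore text inside <script> and <style>.
        if self._skip_depth == 0:
            self._parts.append(data)

    def close(self):
        super().close()
        self._flush()


def html_to_blocks(html):
    """Return the text blocks of an HTML document in document order."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks
```

The resulting text blocks can then be grouped into size-capped chunks and written to JSON in the same way as for the newer posts.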

If you are interested in seeing the code produced by Claude and the blog post chunks generated, you can check it out in my tbcchunk GitHub repository.

Vacation

I am on vacation next week, on a bike tour (my first) in the Massif Central in France.

So, you and the Revit API discussion forum will be left to your own devices for a while.

Rags