Advisory Published

GHSA-h9j7-5xvc-qhg5: SSRF

First published: Mon Feb 26 2024(Updated: )

With the following crawler configuration: ```python from bs4 import BeautifulSoup as Soup url = "https://example.com" loader = RecursiveUrlLoader( url=url, max_depth=2, extractor=lambda x: Soup(x, "html.parser").text ) docs = loader.load() ``` An attacker in control of the contents of `https://example.com` could place a malicious HTML file in there with links like "https://example.completely.different/my_file.html" and the crawler would proceed to download that file as well even though `prevent_outside=True`. https://github.com/langchain-ai/langchain/blob/bf0b3cc0b5ade1fb95a5b1b6fa260e99064c2e22/libs/community/langchain_community/document_loaders/recursive_url_loader.py#L51-L51 Resolved in https://github.com/langchain-ai/langchain/pull/15559

Affected SoftwareAffected VersionHow to fix

Never miss a vulnerability like this again

Sign up to SecAlerts for real-time vulnerability data matched to your software, aggregated from hundreds of sources.


SecAlerts Pty Ltd.
132 Wickham Terrace
Fortitude Valley,
QLD 4006, Australia
By using SecAlerts services, you agree to our services end-user license agreement. This website is safeguarded by reCAPTCHA and governed by the Google Privacy Policy and Terms of Service. All names, logos, and brands of products are owned by their respective owners, and any usage of these names, logos, and brands for identification purposes only does not imply endorsement. If you possess any content that requires removal, please get in touch with us.
© 2024 SecAlerts Pty Ltd.
ABN: 70 645 966 203, ACN: 645 966 203